BioC2024

Engineering Foundation Models of Single-cell Transcriptomics Data
07-26, 11:15–11:23 (US/Eastern), Tomatis Auditorium

Foundation models of single-cell transcriptomics data promise unprecedented capability to reason about new information in the context of vast knowledge of gene co-expression and regulation patterns. While current work in this space largely uses transformer- or GPT-based architectures, we are building multimodal generative architectures that rely on mixture-of-experts variational autoencoders with generative adversarial feedback. Our approaches permit scalable integration of transcriptomes not only across supervised metadata labels (e.g. organ, disease, assay, dataset ID) and modalities (e.g. RNA, ADT, ATAC), but also across species, without making any a priori assumptions about gene homology. Here we present our work to date on these new architectures, involving models trained on ~40 million single-cell transcriptomes from human and mouse in the Chan Zuckerberg Initiative CellCensus, as well as zebrafish data from several other datasets. These foundation models open new opportunities for highly powered prediction tasks, even on small sample sizes, by leveraging a vast knowledge of transcriptomics.
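The mixture-of-experts idea mentioned above can be illustrated with a minimal sketch: a gating network assigns each cell a soft weighting over expert encoders, and the cell's latent representation is the gate-weighted combination of the experts' outputs. This is a generic MoE illustration in NumPy with made-up toy dimensions, not the authors' actual architecture (which also involves variational and adversarial components not shown here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only, not the authors' model sizes)
n_cells, n_genes, n_latent, n_experts = 4, 10, 3, 2

# Toy count matrix standing in for single-cell expression data
X = rng.poisson(2.0, size=(n_cells, n_genes)).astype(float)

# Each "expert" is a linear map to a latent mean,
# a stand-in for a full VAE encoder per modality/species
W_experts = rng.normal(size=(n_experts, n_genes, n_latent))

# Gating network: linear scores, then softmax over experts per cell
W_gate = rng.normal(size=(n_genes, n_experts))
scores = X @ W_gate
gates = np.exp(scores - scores.max(axis=1, keepdims=True))
gates /= gates.sum(axis=1, keepdims=True)            # (n_cells, n_experts)

# Expert outputs, then the gate-weighted mixture latent per cell
expert_z = np.einsum("cg,egl->cel", X, W_experts)    # (n_cells, n_experts, n_latent)
z = np.einsum("ce,cel->cl", gates, expert_z)         # (n_cells, n_latent)

print(z.shape)                               # (4, 3)
print(bool(np.allclose(gates.sum(axis=1), 1.0)))  # True
```

In practice the gating lets different experts specialize (e.g. per modality or species), which is one way such a model can avoid hard-coded gene homology assumptions.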

Zach DeBruine is Assistant Professor of Computing at Grand Valley State University and Research Scientist in the GVSU Applied Computing Institute. He completed his Ph.D. and postdoctoral studies at the Van Andel Institute, first in structural biology, then bioinformatics, and finally high-performance machine learning. His lab is currently building large multimodal foundation models of genomics and biobank data.
