BioC2024

Scalable count-based models for unsupervised detection of spatially variable genes
07-25, 11:35–11:43 (US/Eastern), Tomatis Auditorium

Unsupervised feature selection methods are highly sought after in the analysis of high-dimensional genomics data. The recent development of spatially resolved technologies poses novel computational challenges, including the identification and ranking of genes whose expression varies across space, i.e. spatially variable genes (SVGs). While many SVG methods have been proposed to model continuous normalized gene expression data, they are susceptible to biases introduced by normalization strategies and vulnerable to violations of the isotropy assumption, which can lead to erroneous findings. The few available count-based SVG methods are theoretically sound, but they are computationally prohibitive and therefore less practical for real-world applications. To address these challenges, we propose a scalable approach that extends the generalized geoadditive framework to the analysis of spatially resolved transcriptomics data. Our method identifies genes whose expression exhibits spatial patterns and, when applicable, accounts for differences in effects across pre-defined spatial domains. In addition, our method offers flexibility in modeling raw gene expression counts, accommodating multiple count-based distributions including Poisson, Negative Binomial and Tweedie. In simulation studies and real-world applications, we demonstrate that our proposed count-based models outperform state-of-the-art SVG methods.
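To make the count-based idea concrete, here is a minimal sketch of testing a single gene for spatial variation on raw counts. It is an illustration under stated assumptions, not the method presented in the talk: it assumes a Poisson distribution only, replaces the penalized spatial smooth of a geoadditive model with an unpenalized quadratic surface, and uses simulated data. All function and variable names (spatial_basis, fit_poisson, svg_statistic) are hypothetical.

```python
import numpy as np

def spatial_basis(coords):
    """Quadratic surface in centered coordinates.
    Assumption: a crude stand-in for a penalized spatial smooth."""
    c = coords - coords.mean(axis=0)
    x, y = c[:, 0], c[:, 1]
    return np.column_stack([x, y, x * x, x * y, y * y])

def fit_poisson(X, y, offset, n_iter=100, tol=1e-10):
    """Fit a Poisson log-link GLM by iteratively reweighted least squares.
    Returns the coefficients and the log-likelihood up to a constant."""
    beta = np.zeros(X.shape[1])
    # Start from the intercept-only estimate (first column is the intercept).
    beta[0] = np.log(max(y.sum(), 0.5) / np.exp(offset).sum())
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = eta - offset + (y - mu) / mu      # working response
        WX = X * mu[:, None]                  # Poisson working weights are mu
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    eta = X @ beta + offset
    return beta, float(np.sum(y * eta - np.exp(eta)))

def svg_statistic(counts, coords, lib_size):
    """Likelihood-ratio statistic: spatial surface vs. intercept only."""
    offset = np.log(lib_size)                 # library size enters as an offset
    ones = np.ones((len(counts), 1))
    _, ll_null = fit_poisson(ones, counts, offset)
    X_full = np.hstack([ones, spatial_basis(coords)])
    _, ll_full = fit_poisson(X_full, counts, offset)
    return 2.0 * (ll_full - ll_null)

# Simulated example: one gene with a spatial gradient, one without.
rng = np.random.default_rng(0)
n_spots = 400
coords = rng.uniform(0.0, 1.0, size=(n_spots, 2))
lib_size = rng.uniform(0.8, 1.2, size=n_spots)
gradient_counts = rng.poisson(lib_size * np.exp(1.0 + 1.5 * (coords[:, 0] - 0.5)))
flat_counts = rng.poisson(lib_size * np.exp(1.0))

stat_gradient = svg_statistic(gradient_counts, coords, lib_size)
stat_flat = svg_statistic(flat_counts, coords, lib_size)
print(f"gradient gene: {stat_gradient:.1f}, flat gene: {stat_flat:.1f}")
```

Genes can then be ranked by this statistic, with larger values indicating stronger evidence of spatial variation. Because the counts are modeled directly with a library-size offset, no normalization step is needed. A Negative Binomial or Tweedie likelihood and a penalized smooth would change the fitting routine but not the overall structure of the comparison.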

Boyi Guo is an applied statistician and biomedical data scientist working at the intersection of machine learning, computational omics, and population health. His research concentrates on developing statistically rigorous and computationally scalable machine-learning methods, as well as open-source software, that integrate population-scale multi-omics data to uncover functional mechanisms that explain disease heterogeneity.