BioC2024

scDiagnostics: diagnostic functions to assess the quality of cell type annotations in single-cell RNA-seq data
07-26, 16:00–16:45 (US/Eastern), Tomatis Auditorium

Annotation transfer from a reference dataset for the cell type annotation of a new query single-cell RNA-sequencing (scRNA-seq) experiment has become an integral component of the typical analysis workflow. The approach provides a fast, automated, and reproducible alternative to the manual annotation of cell clusters based on marker gene expression. However, dataset imbalance and undiagnosed incompatibilities between the query and reference datasets can lead to erroneous annotations and distort downstream applications. We present scDiagnostics, an R/Bioconductor package for the systematic evaluation of cell type assignments in scRNA-seq data. scDiagnostics offers a suite of diagnostic functions to assess whether the query and reference datasets are aligned, so that annotations can be transferred reliably. scDiagnostics also provides functionality to assess annotation ambiguity, cluster heterogeneity, and marker gene alignment. The implemented functionality helps researchers determine how accurately cells from a new scRNA-seq experiment can be assigned to known cell types.

Availability: The scDiagnostics package is available on GitHub at https://github.com/ccb-hms/scDiagnostics. A Bioconductor submission is in preparation and planned for May 2024, which should leave sufficient time for package review and inclusion in Bioconductor before the conference.

See also: Package GitHub repository (a link to the pkgdown page will soon be available on the main page of the repository)

Dr. Anthony Christidis is a Computational Scientist at the Center for Computational Biomedicine at Harvard Medical School, where he is a member of multiple research teams. Originally from Canada, he earned his PhD in Statistical Machine Learning from the University of British Columbia (UBC) and an MSc in the same field from the University of Toronto. During his doctoral studies, he developed a new ensemble learning framework for modeling high-dimensional data, which resulted in multiple publications in computational statistics journals. Following his PhD, Dr. Christidis was a Postdoctoral Research Fellow in the Department of Statistics at UBC, where he developed new robust computational methods for the analysis of multi-omics data. He has also taught undergraduate and graduate courses in probability, statistics, data science, and signal processing at UBC. Dr. Christidis regularly publishes software libraries implementing the statistical and computational methods he develops, and he has held various software development positions at research institutes and in collaboration with the private sector. His research interests include machine learning, optimization, scientific computing, and the application of computational methods to single-cell and RNA-seq data.