07-24, 14:35–14:43 (US/Eastern), Tomatis Auditorium
PathSeeker is a statistical R package designed to improve pathogen identification and characterization from RNA sequencing (RNA-seq) data. To address the challenge of datasets with numerous candidate organism identifications, it combines summary statistics, topic models, regressions, rank tests, and visualization techniques for precise organism detection and identification.
PathSeeker uses the Chan Zuckerberg ID (CZID) pipeline for initial organism identification, followed by a multi-step process of filtering and statistical modeling. The modeling draws on blank samples and initial library quantity, using a regression curve to differentiate true organisms from contaminants. A further feature of PathSeeker is its delta approach, which is employed to minimize cross-contamination errors. User-defined filters supplement these steps: a sample can be called positive when it exceeds the median level in blank controls or meets statistical benchmarks set by the user.
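The blank-control filter described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not PathSeeker's R interface: the function name `flag_positive_samples`, the `fold` multiplier, and the read counts are all hypothetical, and the sketch covers only the "exceeds the median of the blanks" rule, not the regression or delta steps.

```python
from statistics import median


def flag_positive_samples(sample_counts, blank_counts, fold=1.0):
    """Return IDs of samples whose count exceeds fold * median(blank counts).

    sample_counts: dict mapping sample ID -> read count for ONE organism.
    blank_counts:  read counts for the same organism in blank controls.
    fold:          hypothetical user-set multiplier on the blank median.
    """
    if not blank_counts:
        raise ValueError("at least one blank control is required")
    threshold = fold * median(blank_counts)
    # Strictly greater than the threshold: a tie with the blanks is not a call.
    return [sid for sid, count in sample_counts.items() if count > threshold]


# Hypothetical read counts for a single organism.
blanks = [2, 0, 5, 3]
samples = {"S1": 40, "S2": 2, "S3": 3, "S4": 0}
positives = flag_positive_samples(samples, blanks)
print(positives)
```

Raising `fold` makes the filter stricter, which is one simple way a user-defined benchmark could be layered on top of the blank median.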
By incorporating topic models, PathSeeker establishes baselines of pathogen expression across samples, which allows it to discern authentic pathogenic signals in complex datasets that lack controls.
PathSeeker also includes a suite of differential abundance analysis tools that handle diverse data distributions and experimental conditions. It incorporates models based on negative binomial and log-normal assumptions, which suit data with varying levels of dispersion and normality. For pairwise comparisons between two conditions, the package employs the Wilcoxon test. For designs involving three or more conditions, it uses the Kruskal-Wallis test for multi-group comparisons. Together, these tools enable researchers to rigorously evaluate the presence and significance of pathogens under various experimental conditions, strengthening the package’s ability to deliver informative pathogen profiles from RNA-seq data.
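The choice between the two rank tests can be sketched as follows. This is a standard-library Python illustration under stated assumptions, not PathSeeker's implementation: the helper names are hypothetical, the two-group case is assumed to be the rank-sum (Mann-Whitney) form of the Wilcoxon test, the Kruskal-Wallis H statistic is computed without a tie correction, and no p-values are derived.

```python
from itertools import chain


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_statistic(groups):
    """Return (test name, statistic) for a list of per-condition abundances.

    Two groups  -> U statistic of the first group (Wilcoxon rank-sum form).
    Three plus  -> Kruskal-Wallis H (no tie correction in this sketch).
    """
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = list(chain.from_iterable(groups))
    ranks = average_ranks(pooled)
    n = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    if len(groups) == 2:
        n1 = len(groups[0])
        return "wilcoxon_rank_sum_U", rank_sums[0] - n1 * (n1 + 1) / 2
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return "kruskal_wallis_H", h


# Hypothetical abundances for one organism under two and three conditions.
two = rank_statistic([[1, 2, 3], [4, 5, 6]])
three = rank_statistic([[1, 2], [3, 4], [5, 6]])
print(two, three)
```

In R, the corresponding base functions are `wilcox.test` and `kruskal.test` from the `stats` package, which also return p-values.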
PathSeeker's accuracy was confirmed using RNA-seq datasets of neonatal hydrocephalus and maternal placental infections, covering the detection of various viral, bacterial, and parasitic organisms. It addresses the challenge of noise reduction in pathogen identification from RNA-seq data. For in-depth confirmation of PathSeeker results, polymerase chain reaction (PCR) and rapid diagnostic blood tests (RDT) were used.
In keeping with its commitment to transparency and accessibility, PathSeeker's methodologies will be made available to the scientific and health communities via open-access R packages on GitHub and Bioconductor. This approach promotes collaborative research and advances the accurate, user-friendly interpretation of RNA-seq data in pathogen research.