BioC2024

microgenomeR: an R workflow for integrating genomic metadata and bacterial phenotypes
07-24, 14:55–15:03 (US/Eastern), Tomatis Auditorium

Background: Trait-based approaches have been gaining traction in ecology and quantitative biology. Investigating the bacterial trait space at a macro scale is crucial to understanding the niches that bacteria occupy. Specifically, we seek to understand the traits that delineate pathogenic or, more broadly, host-associated bacteria. Tackling this problem requires access to robust bacterial trait data. Major issues with current data repositories (BacDive and bugphyzz) include: 1) accessibility, which sets a barrier to entry for biologists and ecologists lacking programming experience, and 2) the dispersal of different traits across multiple siloed repositories, which requires advanced data wrangling. We introduce the microgenomeR R package to tackle these issues by 1) readily presenting diverse trait data through R data files, 2) minimizing the data wrangling required of users with custom cleanup and summarization functions, and 3) enhancing trait data visualization with an interactive dashboard.

Approach: microgenomeR is an R package (with proposed installation via GitHub) that encapsulates bacterial species metadata ranging from quantitative genomic traits to phenotypic traits. The package builds upon initial data integration and harmonization workflows (by Madin [2020], bugphyzz), which incorporate 30+ different bacterial datasets and merge them at the species and strain levels. We repurposed the workflow in two steps. First, the constituent bacterial datasets were updated. Second, for each species, missing trait values were imputed using bugphyzz, an R data package with a collection of regularly updated bacterial physiology traits. Where relevant, pathogenicity and host information was added from a list of manually curated and text-mined bacterial host and pathogen data. The package exports the data as an R data file for ease of access. For advanced users, the package allows trait values to be updated using bugphyzz. Additionally, the package is accompanied by an R Shiny dashboard for quick exploratory data analysis.
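
The gap-filling step described above can be illustrated with a minimal base R sketch. It assumes one plausible reading of "imputed": a missing value in a primary species-level table is filled from a secondary source keyed on species name. The function name fill_missing_traits, the column names, and the toy values are all hypothetical; this is not the microgenomeR or bugphyzz API.

```r
# Toy primary trait table with gaps (NA = trait not recorded).
# All names and values here are illustrative assumptions.
primary <- data.frame(
  species = c("Escherichia coli", "Bacillus subtilis", "Vibrio cholerae"),
  gram_stain = c("negative", NA, "negative"),
  optimum_temp_c = c(37, 30, NA),
  stringsAsFactors = FALSE
)

# Toy secondary source, used only to fill gaps in the primary table.
secondary <- data.frame(
  species = c("Bacillus subtilis", "Vibrio cholerae"),
  gram_stain = c("positive", "negative"),
  optimum_temp_c = c(30, 37),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill NA traits in `primary` from `secondary`,
# matching rows on the `key` column. Existing values are never overwritten.
fill_missing_traits <- function(primary, secondary, key = "species") {
  trait_cols <- setdiff(intersect(names(primary), names(secondary)), key)
  # Row of `secondary` that matches each row of `primary` (NA if absent).
  idx <- match(primary[[key]], secondary[[key]])
  for (col in trait_cols) {
    gaps <- is.na(primary[[col]]) & !is.na(idx)
    primary[[col]][gaps] <- secondary[[col]][idx[gaps]]
  }
  primary
}

merged <- fill_missing_traits(primary, secondary)
print(merged)
```

Filling only the NA cells keeps the primary source authoritative wherever it already has a value, which is one reasonable design choice when sources disagree.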

Results: The microgenomeR package contains 22 traits for 15K+ bacterial species. The traits span 6 groups, ranging from abiotic environmental traits to morphological traits. Further, ~2K species were designated as pathogens. For host information, ~4K species are assigned to 1,356 unique host species spanning 12 major groups, from mammals to plants to invertebrates.

I am an Informatics Research Professional at the University of Colorado Anschutz Medical Campus, under the mentorship of Dr. Janani Ravi and Dr. Nina Wale. I graduated with a Bachelor's in Biosystems Engineering and a Master's in Computational Mathematics, Science and Engineering, both from Michigan State University. My current research focuses on understanding microbial phenotypic traits using interpretable machine learning algorithms.