bamSliceR: Estimating the Transcript Origin of Variants with a Bayesian Approach Using RNA-sequencing Data
Understanding the transcript architecture of genes is crucial for annotating variants identified from raw read counts in RNA-sequencing data. Overlooking or misrepresenting the transcript-specific context may lead to misinterpretation of the disease-associated effects of these variants. For example, Schoch et al. (Genet Med, 2020) highlighted an epilepsy patient harboring a pathogenic mutation in exon 17 of transcript NM_001323289.1, which had previously been treated as lying in an intronic region past exon 18 of transcript NM_003159. Furthermore, identifying genetic variants within specific transcripts is highly relevant for understanding their biological consequences, such as how these alternative mRNAs or their products interact with disease-related perturbations.
Cloud-based repositories such as the NCI Genomic Data Commons (GDC), which harmonize DNA/RNA sequencing data from large translational research consortia, provide the resources to estimate the variant allele frequency (VAF) of variants in different states, from the genome to the specific transcripts being expressed.
We previously developed the bamSliceR package to perform coordinate- or range-based extraction of genome-aligned DNA and RNA reads from the NCI Genomic Data Commons (GDC), followed by downstream analysis of variant effects using the Bioconductor ecosystem. We also demonstrated that this all-in-one toolset can be applied to publicly available or local sequencing data to reduce time, space, and computational burden. Here we highlight new utilities implemented in bamSliceR to facilitate transcript-aware variant annotation using transcriptome BAM files generated by aligning RNA-seq reads to reference transcript sequences (GENCODE v36). For each variant, raw read counts of the reference and alternative alleles are tallied for all compatible transcripts of the gene. The names of the reads supporting the variant on each transcript are also collected, so that evidence can be weighted by the degree to which reads map to multiple transcripts. We then use a Bayesian model comparison approach to estimate the transcript-specific VAF.
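The tallying, multi-mapping weighting, and model comparison steps described above can be illustrated with a minimal Python sketch. It is not the bamSliceR implementation (bamSliceR is an R/Bioconductor package, and the text does not specify its model). The sketch assumes that a read compatible with k transcripts contributes a weight of 1/k to each, and that the comparison is between a shared-VAF model and a per-transcript-VAF model under a Beta(a, b) prior. The function names `weighted_tally`, `posterior_mean_vaf`, and `log_bayes_factor` are hypothetical.

```python
from collections import defaultdict
from math import lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def weighted_tally(alignments):
    """Tally ref/alt support per transcript from a list of
    (read_name, transcript_id, allele) tuples, allele in {"ref", "alt"}.

    Assumption (not from the source): a read compatible with k transcripts
    adds 1/k to each of them.
    """
    hits = defaultdict(set)  # read name -> transcripts it maps to
    for read, tx, _ in alignments:
        hits[read].add(tx)

    counts = defaultdict(lambda: {"ref": 0.0, "alt": 0.0})
    seen = set()  # count each (read, transcript) pair only once
    for read, tx, allele in alignments:
        if (read, tx) in seen:
            continue
        seen.add((read, tx))
        counts[tx][allele] += 1.0 / len(hits[read])
    return dict(counts)


def posterior_mean_vaf(counts, a=1.0, b=1.0):
    """Posterior mean VAF per transcript under a Beta(a, b) prior."""
    return {
        tx: (a + c["alt"]) / (a + b + c["alt"] + c["ref"])
        for tx, c in counts.items()
    }


def log_bayes_factor(counts, a=1.0, b=1.0):
    """Log Bayes factor of 'each transcript has its own VAF' (M1) versus
    'all transcripts share one VAF' (M0), using Beta-Binomial marginal
    likelihoods. This model pair is an illustrative assumption.
    """
    prior = log_beta(a, b)
    log_m1 = sum(
        log_beta(a + c["alt"], b + c["ref"]) - prior for c in counts.values()
    )
    total_alt = sum(c["alt"] for c in counts.values())
    total_ref = sum(c["ref"] for c in counts.values())
    log_m0 = log_beta(a + total_alt, b + total_ref) - prior
    return log_m1 - log_m0
```

In this sketch, a positive log Bayes factor favors transcript-specific VAFs and a negative one favors a single shared VAF. With only one compatible transcript the two models coincide and the value is zero.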