BioC2024

Andrea Baran


Sessions

07-24
14:00
8min
miRglmm: modeling isomiR-level counts improves estimation of miRNA-level differential expression and uncovers variable differential expression between isomiRs
Andrea Baran

Andrea M. Baran, Arun H. Patil, Marc K. Halushka, Matthew N. McCall

Abstract

Background: microRNAs (miRNAs) can be characterized by small RNA sequencing, in which sequencing reads are aligned to known miRNAs. Typically, read counts of sequences that align to the same miRNA are summed to produce miRNA-level read counts. This aggregation discards information about different miRNA transcript isoforms, called isomiRs, whose use might more accurately determine biological miRNA abundance. The aggregated miRNA counts are then used for differential expression (DE) analyses with tools developed for mRNA-Seq data. Important differences between miRNA-Seq and mRNA-Seq data may invalidate key assumptions of these tools when they are applied to miRNA-Seq data, motivating a method designed specifically for miRNA-Seq data that can use the more granular isomiR-level counts.
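To make the aggregation step concrete, here is a minimal Python sketch of summing isomiR-level counts to miRNA-level counts. The miRNA names, sequence labels, and counts are hypothetical and chosen only for illustration; they are not data from this work.

```python
# Hypothetical isomiR-level counts: (miRNA, isomiR sequence) -> reads per sample.
# Sample 0 stands for one group and sample 1 for the other (illustrative only).
isomir_counts = {
    ("miR-A", "seq1"): [120, 80],
    ("miR-A", "seq2"): [30, 60],
    ("miR-B", "seq3"): [10, 12],
}


def aggregate_to_mirna(counts):
    """Sum isomiR-level counts per sample within each miRNA."""
    totals = {}
    for (mirna, _sequence), per_sample in counts.items():
        if mirna not in totals:
            totals[mirna] = [0] * len(per_sample)
        totals[mirna] = [t + c for t, c in zip(totals[mirna], per_sample)]
    return totals


mirna_counts = aggregate_to_mirna(isomir_counts)
print(mirna_counts)
```

In this toy input the two isomiRs of miR-A change in opposite directions between the samples, yet the summed miR-A counts are nearly equal, which is the kind of information that aggregation discards.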

Methods: We establish miRglmm, a DE method that fits a negative binomial mixed effects model to isomiR-level counts and accounts for dependencies due to reads coming from the same sample and/or from the same isomiR sequence. The isomiR random effect variances can be used to quantify variability in differential expression between isomiR sequences that align to the same miRNA, thereby facilitating detection of miRNAs with differential isomiR usage.
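One way such a model could be written for a single miRNA is sketched below. The notation, the normalization offset, and the exact random effects structure (a random intercept per sample, plus a random intercept and a random group slope per isomiR) are assumptions made for illustration; the abstract does not specify them.

```latex
% Sketch of a negative binomial mixed effects model for one miRNA (assumed form).
% y_{ij}: read count of isomiR i in sample j;  x_j: group indicator for sample j;
% s_j: hypothetical per-sample normalization factor entering as an offset.
\begin{align*}
  y_{ij} &\sim \mathrm{NB}(\mu_{ij}, \theta), \\
  \log \mu_{ij} &= \beta_0 + \beta_1 x_j + a_j + b_{0i} + b_{1i} x_j + \log s_j, \\
  a_j &\sim \mathcal{N}(0, \sigma_a^2), \qquad
  (b_{0i}, b_{1i})^\top \sim \mathcal{N}(\mathbf{0}, \Sigma).
\end{align*}
```

Under this sketch, \beta_1 is the miRNA-level log fold change, and the variance of b_{1i} (the corresponding diagonal entry of \Sigma) measures how much the group effect varies between isomiRs of the same miRNA.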

Results: Using synthetic benchmark data, we show that miRglmm provides the lowest mean squared error (MSE) among the DE tools compared while maintaining confidence interval coverage at the nominal level. Using biological data to simulate miRNA with random isomiR variability in the group effect, we demonstrate that miRglmm provides markedly lower MSE and much better coverage than other DE tools, especially when significant isomiR variability is present. In real biological data, miRglmm provides fold change estimates that are similar to results from commonly used DE tools (intraclass correlation coefficients ≥ 0.8) and finds significant isomiR-level variability for most miRNA in our analyses.
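The two evaluation criteria named here, MSE and coverage, can be sketched in a few lines of Python. The estimates, intervals, and true log fold change below are made-up numbers for illustration, not results from this work, and the function names are hypothetical.

```python
def mean_squared_error(estimates, truth):
    """Average squared difference between estimates and the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


def coverage(intervals, truth):
    """Proportion of confidence intervals (low, high) that contain the true value."""
    hits = sum(1 for low, high in intervals if low <= truth <= high)
    return hits / len(intervals)


# Hypothetical simulation output: four estimates of a true log fold change of 1.0,
# each paired with a confidence interval.
true_lfc = 1.0
estimates = [0.9, 1.1, 1.3, 0.7]
intervals = [(0.5, 1.3), (0.8, 1.4), (1.05, 1.55), (0.3, 1.1)]

mse_value = mean_squared_error(estimates, true_lfc)
coverage_value = coverage(intervals, true_lfc)
print(mse_value, coverage_value)
```

Coverage "at the nominal level" means that, across many simulated datasets, this proportion is close to the stated confidence level (for example, 0.95 for 95% intervals).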

Conclusions: In cases where significant isomiR variability exists, the loss of information due to aggregation of isomiR-level counts to miRNA-level counts is detrimental to the performance of commonly used DE tools. Our method, miRglmm, can account for this variability, and provides consistently high performance in estimating DE for miRNA, whether or not there is significant isomiR variability within miRNA.

Tomatis Auditorium