Fast and accurate approximate inference of transcript expression from RNA-seq data
- PMID: 26315907
- PMCID: PMC4673974
- DOI: 10.1093/bioinformatics/btv483
Abstract
Motivation: Assigning RNA-seq reads to their transcript of origin is a fundamental task in transcript expression estimation. Where ambiguities in assignments exist due to transcripts sharing sequence, e.g. alternative isoforms or alleles, the problem can be solved through probabilistic inference. Bayesian methods have been shown to provide accurate transcript abundance estimates compared with competing methods. However, exact Bayesian inference is intractable and approximate methods such as Markov chain Monte Carlo and Variational Bayes (VB) are typically used. While providing a high degree of accuracy and modelling flexibility, standard implementations can be prohibitively slow for large datasets and complex transcriptome annotations.
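To illustrate how probabilistic inference resolves reads shared between transcripts, here is a minimal sketch using plain maximum-likelihood EM. This is not the Bayesian method discussed in this abstract. It assumes a hypothetical, precomputed matrix `lik`, where `lik[n][m]` is the probability of read n given transcript m (0 where the read does not align). Real pipelines derive this matrix from alignments with length and bias corrections. The function name and the toy data are illustrative only.

```python
# Minimal EM sketch for splitting ambiguous reads between transcripts.
# Assumption: lik[n][m] is a hypothetical precomputed P(read n | transcript m),
# 0 where read n does not align to transcript m. This is plain maximum
# likelihood, not the Bayesian inference described in the abstract.


def em_transcript_abundance(lik, n_iter=1000, tol=1e-10):
    """Return estimated transcript proportions theta[m]."""
    n_tx = len(lik[0])
    theta = [1.0 / n_tx] * n_tx  # start from uniform proportions
    for _ in range(n_iter):
        counts = [0.0] * n_tx
        for row in lik:
            # E-step: responsibility of each transcript for this read
            weights = [theta[m] * row[m] for m in range(n_tx)]
            total = sum(weights)
            if total == 0.0:
                continue  # read aligns nowhere, so it carries no information
            for m in range(n_tx):
                counts[m] += weights[m] / total
        # M-step: proportions are normalized expected read counts
        n_used = sum(counts)
        if n_used == 0.0:
            return theta
        new_theta = [c / n_used for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


if __name__ == "__main__":
    # Toy data: 6 reads unique to A, 2 unique to B, 4 shared by both
    # (equal transcript lengths assumed, so shared reads have equal likelihood).
    lik = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 2 + [[1.0, 1.0]] * 4
    print(em_transcript_abundance(lik))  # converges to about [0.75, 0.25]
```

On the toy data, the fixed point splits the 4 shared reads 3:1 in proportion to the unique evidence. This gives 9 of 12 reads to A and 3 to B.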
Results: We propose a novel approximate inference scheme based on VB and apply it to an existing model of transcript expression inference from RNA-seq data. Recent advances in VB algorithmics are used to improve the convergence of the algorithm beyond the standard Variational Bayes Expectation Maximization algorithm. We apply our algorithm to simulated and biological datasets, demonstrating a significant increase in speed with only very small loss in accuracy of expression level estimation. We carry out a comparative study against seven popular alternative methods and demonstrate that our new algorithm provides excellent accuracy and inter-replicate consistency while remaining competitive in computation time.
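The Results compare the new scheme against the standard Variational Bayes Expectation Maximization (VBEM) algorithm. For reference, here is a minimal sketch of that standard baseline only, not the accelerated algorithm proposed in the paper. It makes several simplifying assumptions:

- a symmetric Dirichlet prior on transcript proportions with a single hypothetical concentration `alpha0`;
- a mean-field factorization q(θ)q(Z);
- a hypothetical precomputed matrix `lik`, where `lik[n][m]` is the probability of read n given transcript m.

The BitSeq model and implementation may differ in their details. Python's standard library has no digamma function, so a small local implementation is included.

```python
import math


def digamma(x):
    """Digamma for x > 0: shift x upward using psi(x) = psi(x + 1) - 1/x,
    then apply the standard asymptotic expansion."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def vbem_transcript_abundance(lik, alpha0=1.0, n_iter=1000, tol=1e-10):
    """Standard mean-field VBEM for a Dirichlet-categorical read-assignment model.

    Returns the Dirichlet parameters alpha of q(theta) and the posterior-mean
    transcript proportions alpha / sum(alpha).
    """
    n_tx = len(lik[0])
    alpha = [alpha0 + len(lik) / n_tx] * n_tx  # uninformative starting point
    for _ in range(n_iter):
        # E[log theta_m] under q(theta) = Dirichlet(alpha)
        psi_total = digamma(sum(alpha))
        e_log_theta = [digamma(a) - psi_total for a in alpha]
        counts = [0.0] * n_tx
        for row in lik:
            # q(z_n = m) is proportional to P(read n | m) * exp(E[log theta_m])
            w = [row[m] * math.exp(e_log_theta[m]) for m in range(n_tx)]
            s = sum(w)
            if s == 0.0:
                continue  # unaligned read: no contribution
            for m in range(n_tx):
                counts[m] += w[m] / s
        # q(theta) update: prior pseudo-counts plus expected read counts
        new_alpha = [alpha0 + c for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_alpha, alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    total = sum(alpha)
    return alpha, [a / total for a in alpha]


if __name__ == "__main__":
    # Toy data: 6 reads unique to A, 2 unique to B, 4 shared by both.
    lik = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 2 + [[1.0, 1.0]] * 4
    alpha, theta_mean = vbem_transcript_abundance(lik, alpha0=1.0)
    print(alpha, theta_mean)
```

Each iteration costs one pass over the reads, as in EM. The result, however, is a full approximate posterior Dirichlet(alpha) rather than a point estimate. On this toy data, the posterior mean for A settles at about 0.71, which sits slightly closer to the prior than the maximum-likelihood value of 0.75.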
Availability and implementation: The methods were implemented in R and C++, and are available as part of the BitSeq project at github.com/BitSeq. The method is also available through the BitSeq Bioconductor package. The source code to reproduce all simulation results can be accessed via github.com/BitSeq/BitSeqVB_benchmarking.
© The Author 2015. Published by Oxford University Press.
Similar articles
- TIGAR: transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational Bayesian inference. Bioinformatics. 2013 Sep 15;29(18):2292-9. doi: 10.1093/bioinformatics/btt381. Epub 2013 Jul 2. PMID: 23821651
- Improved variational Bayes inference for transcript expression estimation. Stat Appl Genet Mol Biol. 2014 Apr 1;13(2):203-16. doi: 10.1515/sagmb-2013-0054. PMID: 24413218
- Identifying differentially expressed transcripts from RNA-seq data with biological variation. Bioinformatics. 2012 Jul 1;28(13):1721-8. doi: 10.1093/bioinformatics/bts260. Epub 2012 May 3. PMID: 22563066. Free PMC article.
- A comparison of computational algorithms for the Bayesian analysis of clinical trials. Clin Trials. 2024 Dec;21(6):689-700. doi: 10.1177/17407745241247334. Epub 2024 May 16. PMID: 38752434. Free PMC article.
- TIGAR2: sensitive and accurate estimation of transcript isoform expression with longer RNA-Seq reads. BMC Genomics. 2014;15 Suppl 10(Suppl 10):S5. doi: 10.1186/1471-2164-15-S10-S5. Epub 2014 Dec 12. PMID: 25560536. Free PMC article.
Cited by
- A Bayesian framework for inter-cellular information sharing improves dscRNA-seq quantification. Bioinformatics. 2020 Jul 1;36(Suppl_1):i292-i299. doi: 10.1093/bioinformatics/btaa450. PMID: 32657394. Free PMC article.
- Perplexity: evaluating transcript abundance estimation in the absence of ground truth. Algorithms Mol Biol. 2022 Mar 25;17(1):6. doi: 10.1186/s13015-022-00214-y. PMID: 35331283. Free PMC article.
- Exact transcript quantification over splice graphs. Algorithms Mol Biol. 2021 May 10;16(1):5. doi: 10.1186/s13015-021-00184-7. PMID: 33971903. Free PMC article.
- Polee: RNA-Seq analysis using approximate likelihood. NAR Genom Bioinform. 2021 May 25;3(2):lqab046. doi: 10.1093/nargab/lqab046. eCollection 2021 Jun. PMID: 34056596. Free PMC article.
- Combining Multiple RNA-Seq Data Analysis Algorithms Using Machine Learning Improves Differential Isoform Expression Analysis. Methods Protoc. 2021 Sep 27;4(4):68. doi: 10.3390/mps4040068. PMID: 34698224. Free PMC article.