Nat Commun. 2018 Apr 27;9(1):1681.
doi: 10.1038/s41467-018-03402-w.

Bayesian nonparametric discovery of isoforms and individual specific quantification

Derek Aguiar et al. Nat Commun. 2018.

Abstract

Most human protein-coding genes can be transcribed into multiple distinct mRNA isoforms. These alternative splicing patterns encourage molecular diversity, and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and occur at different frequencies across tissues and samples. Here, we develop BIISQ, a Bayesian nonparametric model for isoform discovery and individual-specific quantification from short-read RNA-seq data. BIISQ does not require isoform reference sequences but instead estimates an isoform catalog shared across samples. We use stochastic variational inference for efficient posterior estimates and demonstrate superior precision and recall on simulations compared to state-of-the-art isoform reconstruction methods. BIISQ shows the largest gains for low-abundance isoforms, with 36% more isoforms correctly inferred at low coverage versus a multi-sample method and 170% more versus single-sample methods. We estimate isoforms in the GEUVADIS RNA-seq data and validate inferred isoforms by associating genetic variants with isoform ratios.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Alternative splicing mechanisms. A single gene may be transcribed into several distinct mRNA variants called isoforms through alternative splicing mechanisms. This figure shows six common types of splicing events (top to bottom): simple transcript; alternative transcription start site; alternative 5′ splice site; alternative 3′ splice site; skipped exon; and alternative polyadenylation
Fig. 2
Isoform discovery precision and recall for simulated data. Precision (red) and recall (blue) of the results from biisq, ISP, CEM, Cufflinks (CUFF), and SLIDE (SLIDE_more and SLIDE_fewer) applied to the BEERS-simulated single-end RNA-seq data. The thick center bars denote the mean precision or recall and the fill denotes three times the standard error. Transparent fill denotes partial precision and recall with a matching threshold of 0.1. Across all methods, the best (partial) precision and recall values are annotated above their respective data points
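The precision and recall summarized in this caption can be illustrated with a minimal sketch. It assumes each isoform is written as a tuple of exon labels and that a predicted isoform counts as correct only when it exactly matches a true one; the partial-matching rule behind the 0.1 threshold is not described here, so it is not implemented. The names `precision_recall` and `mean_and_se` and the example isoforms are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def precision_recall(predicted, truth):
    """Exact-match precision and recall between two sets of isoforms."""
    predicted, truth = set(predicted), set(truth)
    hits = len(predicted & truth)  # isoforms present in both sets
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def mean_and_se(values):
    """Mean and standard error (sample standard deviation / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical isoforms for one gene, each a tuple of exon labels.
truth = {("e1", "e2", "e3"), ("e1", "e3"), ("e2", "e3")}
predicted = {("e1", "e2", "e3"), ("e1", "e3"), ("e1", "e2")}
p, r = precision_recall(predicted, truth)

# Hypothetical per-replicate precision values; the figure's fill would
# span the mean plus or minus three times the standard error.
avg, se = mean_and_se([0.5, 0.7, 0.9])
band = (avg - 3 * se, avg + 3 * se)
```

Two of the three predicted isoforms match a true isoform, so precision and recall both come out at two thirds in this toy case.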
Fig. 3
Isoform quantification accuracy. Correlation between true RPKM and inferred RPKM for (a) BEERS-simulated data and (b) Iso-Seq-simulated data. Spearman correlation coefficients for results from biisq, CEM, Cufflinks, SLIDE_more, and SLIDE_fewer were (a) 0.942, 0.870, 0.918, 0.914, and 0.908, respectively, for BEERS-simulated data and (b) 0.814, −0.002, 0.835, 0.609, and 0.236, respectively, for simulated short-read data from Iso-Seq reads. A regression line represents the best linear fit for each method to the expression data
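As a rough illustration of the statistic reported in this caption, the sketch below computes a Spearman correlation as the Pearson correlation of rank-transformed values, with ties given their average rank. It uses only the standard library; the function names and the RPKM values are made up for the example and are not taken from the paper.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical true and inferred RPKM values for four isoforms.
true_rpkm = [1.0, 5.0, 10.0, 50.0]
inferred_rpkm = [2.0, 4.0, 30.0, 20.0]
rho = spearman(true_rpkm, inferred_rpkm)
```

Because only ranks enter the calculation, the coefficient is insensitive to the large dynamic range typical of expression values.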
Fig. 4
Comparison of methods on Iso-Seq simulations. Precision (red) and recall (blue) of the results from biisq, CEM, Cufflinks (CUFF), SLIDE_more, and SLIDE_fewer applied to (a) the short-read data simulated from Iso-Seq reads; (b) simulated data split by read length; and (c) simulated data split by span. Transparent fill denotes partial precision and recall with a matching threshold of 0.2. The thick center bars denote the mean precision or recall, and the fill denotes twice the standard error. The best (partial) precision and recall values are annotated above their respective points
Fig. 5
Results for isoform quantification in the GEUVADIS data. (a) The isoform quantification distribution where color denotes a unique isoform and each vertical bar is a single sample for genes PTPRN2 (top) and LGALS9B (bottom). (b) Simplex plots for gene LGALS9B factored by sex. Each point (red for female, black for male) represents a sample’s isoform composition for the two isoforms denoted on the bottom axis and the remaining isoforms at the top intersection point. (c) Matrix eQTL p value distribution for (left) cis-trQTLs and (right) cis-eQTLs. (d) The density of cis-trQTLs, LD pruned cis-trQTLs, and cis-eQTLs distances to the nearest canonical splice junctions in GENCODE. (e) LCN8 contained the most significant cis-trQTL (p ≤ 2.2 × 10^−16). Linear and logistic regressions are shown in black and red. (f) Enrichment of cis-trQTL variants in cis-regulatory annotations across cell types. Box plots show the distribution of a matched null set with Tukey whiskers (median ± 1.5 times interquartile range) and red points denoting significant enrichment (VSE test, Bonferroni-corrected p ≤ 0.01)
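The Bonferroni correction mentioned in this caption can be sketched in a few lines. This is a generic illustration, not the VSE test itself: each raw p value is multiplied by the number of tests and capped at 1, then compared with the 0.01 threshold. The function name and the p values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p values: p times the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p values for three annotations.
raw = [0.0004, 0.02, 0.3]
adjusted = bonferroni(raw)
significant = [p <= 0.01 for p in adjusted]
```

Only the first hypothetical annotation stays below the threshold after adjustment.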
