Algorithms Mol Biol. 2011 Apr 19;6(1):9.
doi: 10.1186/1748-7188-6-9.

Estimation of alternative splicing isoform frequencies from RNA-Seq data

Marius Nicolae et al. Algorithms Mol Biol.

Abstract

Background: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging.

Results: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/.
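The general shape of an expectation-maximization estimate of isoform frequencies can be sketched as follows. This is a minimal illustration, not the IsoEM implementation: the function name, the input layout, and the toy data are assumptions made for the example, and the per-read compatibility weights are simply given as numbers, whereas the abstract describes deriving such information from insert sizes, base quality scores, strand, and read pairing.

```python
def estimate_isoform_frequencies(reads, lengths, iterations=200):
    """Toy EM for isoform frequencies (hypothetical helper, not IsoEM itself).

    reads   -- list of dicts mapping isoform name -> compatibility weight
               (assumed to be precomputed; 1.0 means "fully compatible")
    lengths -- dict mapping isoform name -> isoform length in bases
    """
    isoforms = list(lengths)
    # Start from uniform frequencies.
    freq = {j: 1.0 / len(isoforms) for j in isoforms}
    for _ in range(iterations):
        # E-step: split each read fractionally among its compatible isoforms,
        # in proportion to weight * current frequency.
        expected = {j: 0.0 for j in isoforms}
        for weights in reads:
            total = sum(w * freq[j] for j, w in weights.items())
            if total == 0.0:
                continue  # read is incompatible with every expressed isoform
            for j, w in weights.items():
                expected[j] += w * freq[j] / total
        # M-step: convert expected read counts to length-normalized frequencies.
        density = {j: expected[j] / lengths[j] for j in isoforms}
        norm = sum(density.values())
        if norm == 0.0:
            break  # no usable reads; keep the previous estimate
        freq = {j: density[j] / norm for j in isoforms}
    return freq


# Toy data: two isoforms of equal length; 30 reads match only A,
# 10 match only B, and 20 are ambiguous between the two.
toy_reads = [{"A": 1.0}] * 30 + [{"B": 1.0}] * 10 + [{"A": 1.0, "B": 1.0}] * 20
toy_lengths = {"A": 1000, "B": 1000}
toy_freq = estimate_isoform_frequencies(toy_reads, toy_lengths)
print({name: round(value, 3) for name, value in toy_freq.items()})
```

In this toy case the ambiguous reads end up shared in the same 3:1 ratio as the unambiguous ones, which is the behavior one expects from an EM scheme of this kind when no other disambiguating information is supplied.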

Conclusions: Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.

Figures

Figure 1
The algorithm for identifying isoforms compatible with reads.
Figure 2
The expectation-maximization algorithm used by IsoEM.
Figure 3
Distribution of compatibility component sizes (defined as the number of isoforms) for 10 million single reads of length 75 (a) and number of read classes for 1 to 30 million single reads or pairs of reads of length 75 (b).
Figure 4
The E-Step of IsoEM algorithm based on read classes.
Figure 5
Distribution of isoform lengths (a) and gene cluster sizes (b) in the UCSC dataset.
Figure 6
Error fraction at different thresholds for isoform (a) and gene (b) expression levels inferred from 30 M reads of length 25 simulated assuming geometric isoform expression.
Figure 7
Comparison of Cufflinks (a) and IsoEM (b) estimates to qPCR expression levels reported in [14].
Figure 8
Comparison of Cufflinks (a) and IsoEM (b) estimates to qPCR expression levels reported in [31].
Figure 9
IsoEM MPE (a) and r² values (b) for 750 Mb of simulated data generated using single and paired-end reads of length varying between 10 and 100.
Figure 10
IsoEM r² (a) and CPU time (b) for 1-60 million single/paired reads of length 75, with or without strand information.

References

    1. Wang E, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore S, Schroth G, Burge C. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008;456(7221):470–476. doi: 10.1038/nature07509.
    2. Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, Morin RD, Corbett R, Tang MJ, Hou YC, Pugh TJ, Robertson G, Chittaranjan S, Ally A, Asano JK, Chan SY, Li HI, McDonald H, Teague K, Zhao Y, Zeng T, Delaney A, Hirst M, Morin GB, Jones SJM, Tai IT, Marra MA. Alternative expression analysis by RNA sequencing. Nature Methods. 2010;7(10):843–847. doi: 10.1038/nmeth.1503.
    3. Ponting CP, Belgard TG. Transcribed dark matter: meaning or myth? Human Molecular Genetics. 2010;19(R2):R162–R168. doi: 10.1093/hmg/ddq362.
    4. Anton M, Gorostiaga D, Guruceaga E, Segura V, Carmona-Saez P, Pascual-Montano A, Pio R, Montuenga L, Rubio A. SPACE: an algorithm to predict and quantify alternatively spliced isoforms using microarrays. Genome Biology. 2008;9(2):R46. doi: 10.1186/gb-2008-9-2-r46.
    5. She Y, Hubbell E, Wang H. Resolving deconvolution ambiguity in gene alternative splicing. BMC Bioinformatics. 2009;10:237. doi: 10.1186/1471-2105-10-237.
