Proc Natl Acad Sci U S A. 2009 Mar 3;106(9):3264-9.

doi: 10.1073/pnas.0812841106. Epub 2009 Feb 10.

Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing

Moran Yassour¹, Tommy Kaplan, Hunter B Fraser, Joshua Z Levin, Jenna Pfiffner, Xian Adiconis, Gary Schroth, Shujun Luo, Irina Khrebtukova, Andreas Gnirke, Chad Nusbaum, Dawn-Anne Thompson, Nir Friedman, Aviv Regev

PMID: 19208812
PMCID: PMC2638735
DOI: 10.1073/pnas.0812841106

Moran Yassour et al. Proc Natl Acad Sci U S A. 2009.

Affiliation

¹ School of Computer Science and Engineering, The Hebrew University, Jerusalem, 91904, Israel.

Abstract

Defining the transcriptome, the repertoire of transcribed regions encoded in the genome, is a challenging experimental task. Current approaches, relying on sequencing of ESTs or cDNA libraries, are expensive and labor-intensive. Here, we present a general approach for ab initio discovery of the complete transcriptome of the budding yeast, based only on the unannotated genome sequence and millions of short reads from a single massively parallel sequencing run. Using novel algorithms, we automatically construct a highly accurate transcript catalog. Our approach automatically and fully defines 86% of the genes expressed under the given conditions, and discovers 160 previously undescribed transcription units of 250 bp or longer. It correctly demarcates the 5' and 3' UTR boundaries of 86 and 77% of expressed genes, respectively. The method further identifies 83% of known splice junctions in expressed genes, and discovers 25 previously uncharacterized introns, including 2 cases of condition-dependent intron retention. Our framework is applicable to poorly understood organisms, and can lead to greater understanding of the transcribed elements in an explored genome.

Conflict of interest statement

G.S., S.L., and I.K. are from Illumina Corporation, which developed the sequencing technology we used, and thus have competing financial interests.

Figures

**Fig. 1.**
Unbiased sequencing of the yeast transcriptome. (A) Distribution of reads mapped to the PAP1 locus. Shown are SGD annotations (downloaded in November 2007) (8) and mapped reads (red, W strand; blue, C strand). Additional tracks plot the cumulative number of reads covering each base position (yellow, YPD; light blue, HS). The full data can be accessed at http://compbio.cs.huji.ac.il/RNASeq and are visualized using the University of California, Santa Cruz, genome browser (22). (B) Distribution of reads matched to the genome. Of the 26,050,414 reads sequenced in YPD (*Left*), 13,424,957 (52%, blue) were uniquely mapped to a single genomic locus, 6,144,595 (23%, green) were mapped to several locations, and 6,480,862 (25%, yellow) could not be aligned and were later used to detect splice junctions. Similar numbers were found after HS (*Right*).
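The read counts in panel B can be checked with a few lines of arithmetic. The sketch below uses only the YPD numbers quoted in the caption; the variable and function names are illustrative and are not taken from the paper. It confirms that the three categories add up to the total number of sequenced reads and prints each category's share to one decimal place (the caption reports whole-number percentages).

```python
# Read counts quoted in the Fig. 1B caption (YPD condition).
total_reads = 26_050_414
categories = {
    "uniquely mapped": 13_424_957,
    "mapped to several locations": 6_144_595,
    "not aligned": 6_480_862,
}


def shares_of_total(counts, total):
    """Return each category's share of `total` as a percentage."""
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = shares_of_total(categories, total_reads)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")

# The three categories partition the sequenced reads exactly.
assert sum(categories.values()) == total_reads
```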

**Fig. 2.**
Ab initio assembly of a transcript catalog. (A) Outline of steps in the catalog construction pipeline. (B) Segmentation of a contiguously transcribed region into 2 regions of distinct expression levels corresponding to the genes YBR287W and APM3. When using YPD reads alone, both genes exhibit similar coverage and thus cannot be segmented. In HS, however, they are differentially expressed, so by combining observations from both conditions the automatic segmentation procedure (see *Materials and Methods*) correctly separates them into 2 units. Tracks from top to bottom: SGD annotations (blue), our catalog (green), read coverage at YPD (yellow), and read coverage at HS (blue). (C) Detection of splice junctions. Full and gapped reads mapped to the RIM1 genomic locus. Tracks are as in B, with the addition of gapped reads (connected segments) and our putative splice junctions (red and blue), including the junction orientations as estimated by donor and acceptor sequence motifs (arrows). As shown, our procedure identifies the exact coordinates and orientation of the known splice site.
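The reasoning in panel B, that a boundary invisible in one condition can be recovered by pooling coverage from two, can be illustrated with a toy change-point search. This is a minimal sketch and not the authors' segmentation procedure, which is described in their Materials and Methods and is not reproduced here. The coverage values, the function names `sse` and `best_split`, and the `min_gain` threshold are all assumptions made for illustration.

```python
def sse(values):
    """Sum of squared deviations of `values` from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(tracks, min_gain=1.0):
    """Find the single split that most reduces the squared error of a
    piecewise-constant fit, summed over all coverage tracks.

    Returns the split index k (left = positions [0, k), right = [k, n)),
    or None if no split improves the fit by more than `min_gain`.
    """
    n = len(tracks[0])
    unsplit_cost = sum(sse(t) for t in tracks)
    best_k, best_gain = None, min_gain
    for k in range(1, n):
        split_cost = sum(sse(t[:k]) + sse(t[k:]) for t in tracks)
        gain = unsplit_cost - split_cost
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k


# Toy per-base coverage for two adjacent genes (hypothetical numbers).
ypd = [10] * 12            # similar coverage over both genes
hs = [20] * 6 + [4] * 6    # differential expression after the boundary

print(best_split([ypd]))       # flat track: no supported boundary
print(best_split([ypd, hs]))   # pooled tracks: boundary at position 6
```

Summing the error over tracks is one simple way to pool conditions: a flat track contributes nothing to the gain, so it neither helps nor hides a boundary that another condition supports.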

**Fig. 3.**
Validation of the transcript catalog. (A) Coverage of the top 86% expressed genes by our predicted transcribed units, based on different patterns of coverage. (B) Relationship between found transcribed units and annotated transcribed features from SGD. In both A and B, white boxes denote genes, and purple boxes denote transcribed units. (C) Comparison of our putative splice junctions (blue) to known ones (green). (D) The 51 known introns missed by our predictions are partitioned into 8 categories. (E) Validation of splicing read-through in the gene FES1. Tracks are as in Fig. 2C, including the evolutionary conservation of each position across 7 yeast species (15).

References

1. Margulies M, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380.
2. Bentley DR. Whole-genome re-sequencing. Curr Opin Genet Dev. 2006;16:545–552.
3. Nagalakshmi U, et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008;320:1344–1349.
4. Wilhelm BT, et al. Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature. 2008;453:1239–1243.
5. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–628.

Grants and funding

DP1 OD003958/OD/NIH HHS/United States
