Genome Res. 2011 Jul;21(7):1193-200.
doi: 10.1101/gr.113779.110. Epub 2011 May 2.

De novo assembly and validation of planaria transcriptome by massive parallel sequencing and shotgun proteomics

Catherine Adamidi et al. Genome Res. 2011 Jul.

Abstract

Freshwater planaria are a very attractive model system for stem cell biology, tissue homeostasis, and regeneration. The genome of the planarian Schmidtea mediterranea has recently been sequenced and is estimated to contain >20,000 protein-encoding genes. However, the characterization of its transcriptome is far from complete. Furthermore, not a single proteome of the entire phylum has been assayed on a genome-wide level. We devised an efficient sequencing strategy that allowed us to de novo assemble a major fraction of the S. mediterranea transcriptome. We then used independent assays and massive shotgun proteomics to validate the authenticity of transcripts. In total, our de novo assembly yielded 18,619 candidate transcripts with a mean length of 1118 nt after filtering. A total of 17,564 candidate transcripts could be mapped to 15,284 distinct loci on the current genome reference sequence. RACE confirmed complete or almost complete 5' and 3' ends for 22/24 transcripts. The frequencies of frame shifts, fusion, and fission events in the assembled transcripts were computationally estimated to be 4.2%-13%, 0%-3.7%, and 2.6%, respectively. Our shotgun proteomics produced 16,135 distinct peptides that validated 4200 transcripts (FDR ≤1%). The catalog of transcripts assembled in this study, together with the identified peptides, dramatically expands and refines planarian gene annotation, demonstrated by validation of several previously unknown transcripts with stem cell-dependent expression patterns. In addition, our robust transcriptome characterization pipeline could be applied to other organisms without genome assembly. All of our data, including homology annotation, are freely available at SmedGD, the S. mediterranea genome database.

Figures

Figure 1. Experimental design.
Figure 2. Distribution of transcripts with different numbers of peptide matches. Distribution of distinct peptide matches to BIMSB ORFs and MAKER-predicted proteins. For each ORF generated from the BIMSB transcript sequences (see main text), the number of distinct peptide mappings is counted (x-axis) and the frequency of each count is plotted (y-axis). The same distribution is shown for the MAKER-predicted proteins.
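The counting described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the input layout (a dict from ORF identifier to the list of peptides mapped to it), and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def peptide_match_distribution(orf_to_peptides):
    """Return a Counter mapping (number of distinct peptides) -> (number of ORFs).

    orf_to_peptides is a hypothetical input layout: ORF id -> list of peptide
    sequences mapped to that ORF. Repeated peptides are counted once per ORF.
    """
    distinct_counts = [len(set(peptides)) for peptides in orf_to_peptides.values()]
    return Counter(distinct_counts)


# Toy data, invented for illustration only.
matches = {
    "orf_a": ["PEPTIDEK", "PEPTIDEK", "SAMPLER"],  # 2 distinct peptides
    "orf_b": ["ANOTHERK"],                         # 1 distinct peptide
    "orf_c": [],                                   # no peptide support
    "orf_d": ["LONELYR"],                          # 1 distinct peptide
}

distribution = peptide_match_distribution(matches)
for n_peptides in sorted(distribution):
    print(n_peptides, distribution[n_peptides])
```

Each key of the result corresponds to an x-axis value (distinct peptide count) and each value to the y-axis frequency; running the same function on a second annotation would give the comparison distribution.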
Figure 3. Enriched expression of novel BIMSB transcripts in planarian stem cells. Whole-mount in situ hybridization was performed on normal and irradiated asexual planarians using either smedwi-1, smedmlgA, or novel BIMSB transcripts (human homologs) labeled: A (Misu or NSUN2), B (MOV10L1), C (LRRK2), D (HES5), E (RNMTL1), and F (CWF19L2). See Supplemental Table 7 for details.
Figure 4. Estimated expression level (represented by RPKM) for transcripts predicted by both the MAKER and BIMSB annotations and for transcripts predicted only by the MAKER or the BIMSB annotation. For the transcripts covered by both annotations, only the expression level estimated from the BIMSB annotation is depicted (the expression level estimated from the MAKER annotation is highly correlated with it, with a correlation coefficient of 0.938).
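For readers unfamiliar with the unit used in Figure 4, RPKM (reads per kilobase of transcript per million mapped reads) follows the standard definition sketched below. The function name, argument names, and example numbers are assumptions for illustration; the abstract does not describe how the authors computed the values.

```python
def rpkm(read_count, transcript_length_nt, total_mapped_reads):
    """Standard RPKM: reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (transcript_length_nt * total_mapped_reads)
    """
    if transcript_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_nt * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 1000-nt transcript,
# in a library of 10 million mapped reads.
example_value = rpkm(500, 1000, 10_000_000)
print(example_value)
```

Normalizing by both transcript length and library size is what makes values comparable across transcripts of different lengths and across annotations.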
