Genome Res. 2010 Jun;20(6):837-46.
doi: 10.1101/gr.103119.109. Epub 2010 Mar 17.

Proteogenomics of Pristionchus pacificus reveals distinct proteome structure of nematode models

Nadine Borchert et al. Genome Res. 2010 Jun.

Abstract

Pristionchus pacificus is a nematode model organism whose genome has recently been sequenced. To refine the genome annotation we performed transcriptome and proteome analysis and gathered comprehensive experimental information on gene expression. Transcriptome analysis on a 454 Life Sciences (Roche) FLX platform generated >700,000 expressed sequence tags (ESTs) from two normalized EST libraries, whereas proteome analysis on an LTQ-Orbitrap mass spectrometer detected >27,000 nonredundant peptide sequences from more than 4000 proteins at sub-parts-per-million (ppm) mass accuracy and a false discovery rate of <1%. Retraining of the SNAP gene prediction algorithm using the gene expression data led to a decrease in the number of previously predicted protein-coding genes from 29,000 to 24,000 and refinement of numerous gene models. The P. pacificus proteome contains a high proportion of small proteins with no known homologs in other species ("pioneer" proteins). Some of these proteins appear to be products of highly homologous genes, pointing to their common origin. We show that >50% of all pioneer genes are transcribed under standard culture conditions and that pioneer proteins significantly contribute to a unimodal distribution of predicted protein sizes in P. pacificus, which has an unusually low median size of 240 amino acids (26.8 kDa). In contrast, the predicted proteome of Caenorhabditis elegans follows a distinct bimodal protein size distribution, with significant functional differences between small and large protein populations. Combined, these results provide the first catalog of the expressed genome of P. pacificus, refinement of its genome annotation, and the first comparison of related nematode models at the proteome level.
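The false discovery rate quoted in the abstract can be illustrated with a target-decoy estimate. The following is a minimal sketch, not the authors' pipeline. It assumes each peptide hit carries a posterior error probability (PEP) and a flag marking whether it came from the reversed (decoy) database, in line with the reversed peptide sequences mentioned in the Figure 2 caption. The function name `estimate_fdr` and the decoys/targets ratio are illustrative assumptions.

```python
def estimate_fdr(peps, is_decoy, pep_cutoff):
    """Estimate the FDR among hits accepted at a PEP cutoff.

    Assumption: FDR is approximated as (#decoy hits) / (#target hits)
    among all hits with PEP <= pep_cutoff. Other estimators exist.
    """
    targets = 0
    decoys = 0
    for pep, decoy in zip(peps, is_decoy):
        if pep <= pep_cutoff:
            if decoy:
                decoys += 1
            else:
                targets += 1
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return decoys / targets


# Hypothetical toy data: five hits, one of them from the reversed database.
peps = [0.001, 0.002, 0.01, 0.2, 0.5]
is_decoy = [False, False, False, True, False]
strict = estimate_fdr(peps, is_decoy, 0.05)  # only the three best hits pass
loose = estimate_fdr(peps, is_decoy, 1.0)    # every hit passes
```

In practice the cutoff would be tightened until the estimate falls below the desired level, such as the 1% reported here.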

Figures

Figure 1.
Phylogeny of Pristionchus pacificus and proteogenomics workflow employed in this study. (A) Phylogenetic relationship of nematodes with sequenced genomes. The genome sizes (megabases) are written in brackets. The clades are depicted in the tree. (B) P. pacificus gene expression was assessed at the levels of transcription and translation and in three different developmental stages: dauer, J2, and “mixed stage” (containing all developmental stages, including eggs). In the proteomics approach, several workflows for protein extraction and separation were used. ESTs detected with 454 pyrosequencing and peptide sequences detected with LTQ-Orbitrap mass spectrometry were used for genome reannotation.
Figure 2.
Overview of the proteomics results. (A) Application of complementary biochemical workflows for protein extraction and peptide separation led to enhanced proteome coverage. (sol) Soluble fraction; (pel) pellet. (B) Peptides were detected with a mean absolute mass deviation of 0.345 ppm. (C) Peptide sequences that mapped to the genome translation (“genomic peptides”) had a median size of 12 amino acids. (D) Distribution of posterior error probabilities (PEP) was markedly different in the genomic and reversed peptide sequences.
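The mass deviation in panel B is expressed in parts per million. As a hedged illustration of how such a figure is typically computed, the sketch below takes pairs of measured and theoretical m/z values and averages the absolute relative error. The function names and the example values are hypothetical and are not taken from the study.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def mean_absolute_ppm(pairs):
    """Mean of |ppm error| over (measured, theoretical) m/z pairs."""
    errors = [abs(ppm_error(measured, theoretical)) for measured, theoretical in pairs]
    return sum(errors) / len(errors)


# Hypothetical measurements: each is off by 0.001 at m/z 1000, i.e. about 1 ppm.
pairs = [(1000.001, 1000.0), (999.999, 1000.0)]
mean_dev = mean_absolute_ppm(pairs)
```

A sub-ppm value such as the reported 0.345 ppm would correspond to an average deviation of roughly 0.0003 at m/z 1000.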
Figure 3.
Genome reannotation resulted in new gene predictions and new gene models. (A) Example of a new gene model. New gene model “Contig126-snap.64” contains the old model “Contig126-snap.71”. (B) Example of a new gene prediction. The gene model “Contig125-snap.27” appeared only after retraining of the SNAP prediction algorithm with gene expression data.
Figure 4.
Features of the P. pacificus predicted proteome. (A) Protein size distribution shows that the pioneer proteins are mainly responsible for the unusually low median protein size in P. pacificus. (B) BLAST results of the pioneer proteins against themselves show the presence of highly homologous proteins that may have a common origin.
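The median protein size in panel A can be related to an approximate molecular mass. The sketch below assumes a rough average residue mass of 110 Da, a common rule of thumb rather than a value from the paper. With that assumption, 240 amino acids gives about 26.4 kDa, close to the 26.8 kDa reported in the abstract; the exact figure depends on amino acid composition. The example lengths are hypothetical.

```python
from statistics import median

# Assumption: rule-of-thumb average mass per amino acid residue, in daltons.
AVG_RESIDUE_MASS_DA = 110.0


def median_length(lengths_aa):
    """Median protein length in amino acids."""
    return median(lengths_aa)


def approx_mass_kda(length_aa, avg_residue_da=AVG_RESIDUE_MASS_DA):
    """Rough molecular mass in kDa from sequence length alone."""
    return length_aa * avg_residue_da / 1000.0


# Hypothetical lengths for a toy proteome.
lengths = [90, 150, 240, 410, 1200]
mid = median_length(lengths)
mass = approx_mass_kda(mid)
```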
Figure 5.
Comparison and functional analysis of protein size distributions in nematode models. (A) Predicted protein sizes in P. pacificus and B. malayi have a unimodal distribution, whereas C. elegans and C. briggsae have distinct bimodal distributions. (B) Gene Ontology enrichment analysis for short and long proteins in C. elegans shows distinct functional differences between the two classes of proteins.
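The unimodal versus bimodal distinction in panel A can be explored with a crude histogram check. This is only a heuristic sketch and not the analysis used in the study: it bins protein lengths and counts local maxima that hold at least a chosen fraction of all proteins. The bin width, the `min_fraction` threshold, and the toy data are illustrative assumptions.

```python
def histogram(lengths, bin_width):
    """Count proteins per fixed-width length bin, including empty bins."""
    counts = {}
    for length in lengths:
        bin_index = int(length // bin_width)
        counts[bin_index] = counts.get(bin_index, 0) + 1
    lo, hi = min(counts), max(counts)
    return [counts.get(b, 0) for b in range(lo, hi + 1)]


def count_peaks(counts, min_fraction=0.1):
    """Count local maxima holding at least min_fraction of all items."""
    total = sum(counts)
    padded = [0] + list(counts) + [0]
    peaks = 0
    for i in range(1, len(padded) - 1):
        rises = padded[i] > padded[i - 1]
        holds = padded[i] >= padded[i + 1]
        if rises and holds and padded[i] >= min_fraction * total:
            peaks += 1
    return peaks


# Hypothetical toy proteomes (lengths in amino acids).
unimodal = [100] * 2 + [200] * 5 + [300] * 9 + [400] * 5 + [500] * 2
bimodal = [100] * 8 + [200] * 3 + [300] * 1 + [400] * 3 + [500] * 8
n_uni = count_peaks(histogram(unimodal, 100))
n_bi = count_peaks(histogram(bimodal, 100))
```

Real length distributions are noisy, so a smoothed density or a formal test of multimodality would be more robust than raw bin counts.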
