Genome Res. 2008 Oct;18(10):1660-9.
doi: 10.1101/gr.077644.108. Epub 2008 Jul 24.

Use of shotgun proteomics for the identification, confirmation, and correction of C. elegans gene annotations

Gennifer E Merrihew et al. Genome Res. 2008 Oct.

Abstract

We describe a general mass spectrometry-based approach for gene annotation of any organism and demonstrate its effectiveness using the nematode Caenorhabditis elegans. We detected 6779 C. elegans proteins (67,047 peptides), including 384 that, although annotated in WormBase WS150, lacked cDNA or other prior experimental support. We also identified 429 new coding sequences that were unannotated in WS150. Nearly half (192/429) of the new coding sequences were confirmed with RT-PCR data. Thirty-three (approximately 8%) of the new coding sequences had been predicted to be pseudogenes, 151 (approximately 35%) reveal apparent errors in gene models, and 245 (57%) appear to be novel genes. In addition, we verified 6010 exon-exon splice junctions within existing WormBase gene models. Our work confirms that mass spectrometry is a powerful experimental tool for annotating sequenced genomes. In addition, the collection of identified peptides should facilitate future proteomics experiments targeted at specific proteins of interest.

Figures

Figure 1.
Chromosomal distribution of peptides identified by mass spectrometry in C. elegans. Shown here are the distributions of our mass spectral identifications by chromosome location. The chromosomes are binned into sections of ∼100 kb, and the length of the blue line represents the number of spectra mapping to genes in that bin. Our peptides are sampled more frequently from genes in the center of the autosomes and more sparsely from genes on the arms of the autosomes and on the sex chromosome. Assuming that peptides are sampled more frequently from abundant proteins, these data suggest that proteins encoded near the center of an autosome are, on average, expressed at greater abundance than proteins encoded on the arms.
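The binning described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `bin_spectra`, the input layout (chromosome, gene start position, spectral count), and the spectral counts themselves are hypothetical. Only the ∼100 kb bin size comes from the caption.

```python
from collections import defaultdict

BIN_SIZE = 100_000  # ~100 kb bins, as stated in the caption


def bin_spectra(hits, bin_size=BIN_SIZE):
    """Sum spectral counts per (chromosome, bin index).

    `hits` is an iterable of (chromosome, position, n_spectra) tuples,
    where `position` is a genomic coordinate in base pairs.
    """
    counts = defaultdict(int)
    for chrom, position, n_spectra in hits:
        counts[(chrom, position // bin_size)] += n_spectra
    return dict(counts)


# Hypothetical input: the spectral counts are invented for illustration.
hits = [
    ("IV", 11_022_575, 2),
    ("IV", 11_080_000, 3),
    ("IV", 13_386_747, 2),
    ("X", 7_327_554, 2),
]

binned = bin_spectra(hits)
for (chrom, bin_index), n in sorted(binned.items()):
    start = bin_index * BIN_SIZE
    print(f"chr{chrom} {start}-{start + BIN_SIZE - 1}: {n} spectra")
```

Plotting one bar per bin, with bar length proportional to the summed count, would give a display of the kind the caption describes.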
Figure 2.
Splice junction confirmation via mass spectrometry. The confirmation of the splice junction between exon 10 (in blue) and exon 11 (in green) for the C. elegans gene C27C12.7 encoding a dipeptidyl peptidase (DPF-2) is illustrated. (A) The unspliced DNA sequence of C27C12.7 between the end of exon 10 and the beginning of exon 11. (B) Exon 10 and exon 11 spliced together. (C) The spliced exons separated into codons. (D) The peptide sequence spanning the splice junction and the representative mass spectrum. The numbers in blue above the peptide sequence represent the C-terminal y-ions and the red numbers below the peptide sequence represent the N-terminal b-ions. (Blue) y-ions; (red) b-ions; (green) all other ions (a-ions, doubly charged ions, ions from the loss of water or ammonia, etc.).
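Panels A through D follow a simple sequence of steps: remove the intron, join the exons, split the result into codons, and translate. The sketch below walks through those steps with a toy sequence. The exon and intron sequences here are hypothetical and are not taken from C27C12.7. The sketch also assumes the standard genetic code and that the first base shown is in frame.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Hypothetical toy sequences (not the real C27C12.7 sequence).
EXON10_TAIL = "ATGGCTGA"      # ends partway through a codon
INTRON = "gtaagtttttcag"      # begins with gt and ends with ag
EXON11_HEAD = "AGTTCGT"       # completes the split codon


def translate(dna):
    """Split in-frame DNA into codons and translate them."""
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return codons, "".join(CODON_TABLE[c] for c in codons)


unspliced = EXON10_TAIL + INTRON + EXON11_HEAD   # panel A
spliced = EXON10_TAIL + EXON11_HEAD              # panel B
codons, peptide = translate(spliced)             # panels C and D

print(unspliced)
print(" ".join(codons))
print(peptide)
```

In this toy case the third codon is built from bases on both sides of the junction, so a peptide containing its residue could only arise from the spliced transcript. That is the logic by which a junction-spanning peptide confirms a splice site.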
Figure 3.
Classification of proteins identified from existing or new coding sequences. Of the 6779 proteins identified, 6350 were identified based on the protein-coding genes in WormBase WS150, and 429 were identified using either the new GeneFinder predictions, the conserved intergenic data set, or both. Of the 429 new proteins, 33 mapped to predicted pseudogenes in WS150, and 18.2% of these have been confirmed by RT-PCR. A further 151 correspond to misannotated protein sequences, 56.9% of which have RT-PCR confirmation. The last category comprises 245 novel or unknown coding sequences, of which 40.8% have RT-PCR confirmation.
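The counts in this caption can be cross-checked against the abstract with a few lines of arithmetic. The sketch below uses only numbers stated in the text. Converting each RT-PCR percentage back to a whole-number count by rounding is an assumption made here, and the variable names are illustrative.

```python
total_proteins = 6779
from_ws150 = 6350
new_cds = 429

# category: (number of new coding sequences, % with RT-PCR confirmation)
categories = {
    "predicted pseudogene": (33, 18.2),
    "misannotated gene model": (151, 56.9),
    "novel coding sequence": (245, 40.8),
}

# The WS150-based and new identifications should account for every protein.
assert from_ws150 + new_cds == total_proteins

# The three categories should partition the new coding sequences.
assert sum(n for n, _ in categories.values()) == new_cds

# Share of the new coding sequences in each category, in whole percent.
shares = {name: round(100 * n / new_cds) for name, (n, _) in categories.items()}

# Assumption: each percentage corresponds to a whole number of sequences.
confirmed = {name: round(n * pct / 100) for name, (n, pct) in categories.items()}
total_confirmed = sum(confirmed.values())

print(shares)
print(confirmed, total_confirmed)
```

Under that rounding assumption the per-category confirmations sum to the 192 RT-PCR-confirmed sequences reported in the abstract, and the category shares match its approximate 8%, 35%, and 57%.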
Figure 4.
Identification of a novel coding sequence by shotgun proteomics. Three unique peptides were identified in the genomic region 16,652,022–16,654,397 on the X chromosome. This genomic region represents a new ORF from the new GeneFinder prediction set. There are no gene models predicted in this region in WormBase WS150; however, several SAGE tags confirming this gene model have been added since WS150. A mass spectrum from the peptide SPASGSALLDLLSR is shown.
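As a rough guide to reading such a spectrum, the theoretical singly charged b- and y-ion m/z values for the peptide named in this caption can be computed from its sequence. This is a generic sketch, not the search software used in the study. It assumes unmodified residues, monoisotopic masses rounded to five decimal places, and a charge of +1. The function name `fragment_ions` is illustrative.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, added to C-terminal (y) fragments
PROTON = 1.007276   # charge carrier for a 1+ ion


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i] is b(i+1), counted from the N-terminus;
    y_ions[i] is y(i+1), counted from the C-terminus.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:                 # b1 .. b(n-1)
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):        # y1 .. y(n-1)
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


peptide = "SPASGSALLDLLSR"
b_ions, y_ions = fragment_ions(peptide)
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i}: {b:9.4f}    y{i}: {y:9.4f}")
```

A useful sanity check is that each complementary pair, b(i) and y(n-i), sums to the neutral peptide mass plus two protons.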
Figure 5.
Correction of a misannotated coding sequence. The gene alh-3 (F36H1.6) contains six exons (pink) and encodes a dehydrogenase in C. elegans according to WormBase WS150. We have identified two unique peptides (blue) between exons 2 and 3 that span the genomic region 11,022,575–11,025,702 on chromosome IV. Both peptides lie at least partially within an intron. This gene model has since been corrected in WS180.
Figure 6.
Identification of a misannotated coding sequence located in an untranslated region (UTR). WormBase gene model T08A9.11 lies within the genomic region 7,327,554–7,330,510 on chromosome X. The two unique peptides (blue) lie within the 3′ UTR (gray) of the gene in WormBase WS150. A mass spectrum from the peptide SSLTIPDNFVTEGEVPQK, one of the two peptides identified within the 3′ UTR, is shown.
Figure 7.
Identification of a translated pseudogene. Two unique peptides (blue) span the conserved intergenic ORF prediction located at 13,386,747–13,387,043 on chromosome IV. In WormBase WS150 these peptides were present within a predicted pseudogene. In a later version of WormBase, this pseudogene has been corrected to a protein-coding gene. A mass spectrum of the peptide DMFAFENVGFTR, one of the two peptides confirming the translation of this pseudogene, is illustrated.
Figure 8.
Peptides identified in the insulin/insulin-like growth factor 1 signaling pathway can be used as proteotypic peptides in future targeted analyses. Shown here are the major proteins involved in the insulin/insulin-like growth factor 1 signaling pathway along with peptides identified from the respective proteins.
