Comparative Study
Genome Res. 2008 Jul;18(7):1133-42.
doi: 10.1101/gr.074344.107. Epub 2008 Apr 21.

Comparative proteogenomics: combining mass spectrometry and comparative genomics to analyze multiple genomes

Nitin Gupta et al. Genome Res. 2008 Jul.

Abstract

Recent proliferation of low-cost DNA sequencing techniques will soon lead to an explosive growth in the number of sequenced genomes and will turn manual annotations into a luxury. Mass spectrometry recently emerged as a valuable technique for proteogenomic annotations that improves on the state-of-the-art in predicting genes and other features. However, previous proteogenomic approaches were limited to a single genome and did not take advantage of analyzing mass spectrometry data from multiple genomes at once. We show that such a comparative proteogenomics approach (like comparative genomics) allows one to address the problems that remained beyond the reach of the traditional "single proteome" approach in mass spectrometry. In particular, we show how comparative proteogenomics addresses the notoriously difficult problem of "one-hit-wonders" in proteomics, improves on the existing gene prediction tools in genomics, and allows identification of rare post-translational modifications. We therefore argue that complementing DNA sequencing projects by comparative proteogenomics projects can be a viable approach to improve both genomic and proteomic annotations.

Figures

Figure 1.
Expression of orthologous genes across the three species. (A) The number of orthologs shared between different species. There are 2590 orthologous genes present in all three species (referred to as “shared genes”). (B) The number of expressed shared genes (confirmed by two or more peptides) among the three species; 1052 shared genes are expressed in all three species, 708 shared genes are expressed in none.
Figure 2.
Example of correlated one-hit-wonders in shared genes. Aligned amino acid sequences of the shared gene (annotated as hypothetical lipoprotein) are shown for each organism (SO_0515 in So, CN32_3345 in Sp, and Sfri_3590 in Sf). The identified peptides are shown in blue.
Figure 3.
Commonly observed configurations of peptides in alternative frame. (A) Case A: Multiple peptides are observed in two different frames (one of them being the frame of the gene) in nonoverlapping regions. (B) Case B: Only one peptide is observed out of frame at one of the ends. (C) Case C: One peptide is seen out of frame with in-frame peptides on both sides.
Figure 4.
Frameshift generated by sequencing error. In the top panel, the nucleotide sequence for gene SO0590 is shown in red, the amino acid sequence of the protein is shown in green, and the amino acid sequences of the three translated frames are shown in black. Peptides identified by mass spectrometry are marked in blue (surrounded by boxes). The middle panel shows the ClustalW alignment with other Shewanella species in the region where the frameshift occurs. The erroneous insertion of an extra “t” stands out in the alignment. The bottom panel indicates that both peptides fall in the original frame if the extra nucleotide is removed.
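The logic of the Figure 4 caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the DNA string, the peptides, the insertion position, and the helper names (`translate`, `frames_containing`) are all hypothetical and chosen only for illustration. The sketch translates the three forward frames with the standard genetic code and reports which frame each peptide falls in, before and after the extra nucleotide is removed.

```python
# Toy illustration of a frameshift caused by a single inserted nucleotide.
# All sequences here are invented; they are NOT taken from gene SO0590.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate(dna: str, frame: int) -> str:
    """Translate one forward frame (0, 1, or 2); '*' marks a stop codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def frames_containing(dna: str, peptide: str) -> list[int]:
    """Return the forward frames whose translation contains the peptide."""
    return [f for f in range(3) if peptide in translate(dna, f)]


# Hypothetical "true" sequence encoding M A K G L E D R in frame 0.
true_dna = "ATGGCTAAAGGTCTGGAAGATCGT"
# Simulated sequencing error: an extra "T" inserted after position 9.
erroneous_dna = true_dna[:9] + "T" + true_dna[9:]

upstream_peptide = "MAK"      # observed before the insertion
downstream_peptide = "GLEDR"  # observed after the insertion

frames_before = (
    frames_containing(erroneous_dna, upstream_peptide),
    frames_containing(erroneous_dna, downstream_peptide),
)

# Removing the extra nucleotide puts both peptides back in one frame.
corrected_dna = erroneous_dna[:9] + erroneous_dna[10:]
frames_after = (
    frames_containing(corrected_dna, upstream_peptide),
    frames_containing(corrected_dna, downstream_peptide),
)
```

In this toy case the two peptides sit in different frames of the erroneous sequence and in the same frame once the inserted base is deleted, which is the pattern the bottom panel of the figure describes.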
Figure 5.
An example of a programmed frameshift. The nucleotide sequence for gene SO_0991 is shown in red, the amino acid sequence of the corresponding protein is shown in green, and the amino acid sequences of the three translated frames are shown in black. This gene has been correctly annotated in TIGR, and our predicted peptides in both the original frame and the alternative frame match the protein sequence.
Figure 6.
A cleavage site located within a peptide ladder. The first line shows a section of the protein SO_0162 (residues 399–432) with the cleavage site between Y and L marked by a downward arrow. The subsequent lines show the identified peptides along with their spectral counts in the parentheses.
