Review
Curr Opin Biotechnol. 2013 Feb;24(1):31-8.
doi: 10.1016/j.copbio.2012.10.013. Epub 2012 Nov 8.

Current algorithmic solutions for peptide-based proteomics data generation and identification

Michael R Hoopmann et al. Curr Opin Biotechnol. 2013 Feb.

Abstract

Peptide-based proteomic data sets are ever increasing in size and complexity. These data sets provide computational challenges when attempting to quickly analyze spectra and obtain correct protein identifications. Database search and de novo algorithms must consider high-resolution MS/MS spectra and alternative fragmentation methods. Protein inference is a tricky problem when analyzing large data sets of degenerate peptide identifications. Combining multiple algorithms for improved peptide identification puts significant strain on computational systems when investigating large data sets. This review highlights some of the recent developments in peptide and protein identification algorithms for analyzing shotgun mass spectrometry data when encountering the aforementioned hurdles. Also explored are the roles that analytical pipelines, public spectral libraries, and cloud computing play in the evolution of peptide-based proteomics.

Figures

Figure 1
A robust computational workflow for shotgun proteomics. Spectra acquired from several complementary fragmentation techniques are simultaneously analyzed with multiple sequencing algorithms, utilizing cloud-based computing resources. Multi-level peptide validation algorithms assign probabilities to peptide sequences from the combined sequencing results. The most likely proteins in the sample are inferred from the confident peptide sequences.
Figure 2
Estimating false discovery rate (FDR) using target-decoy searches. (A) The score distribution of target peptide-spectrum matches (PSMs; blue line) shows a right tail when compared to the decoy distribution (red line); this tail represents correct PSMs (green line). (B) The score distributions of the target and decoy PSMs are used to estimate the FDR at any score threshold.
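The estimate described in panel (B) can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular tool: it assumes separate target and decoy searches against databases of equal size, that higher scores are better, and that the FDR at a threshold is approximated by the ratio of decoy PSMs to target PSMs scoring at or above it. The function names `estimate_fdr` and `threshold_for_fdr` and the example scores are hypothetical.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Approximate the FDR among target PSMs scoring >= threshold.

    Assumption: decoy hits above the threshold occur about as often as
    incorrect target hits, so FDR ~ (#decoys) / (#targets).
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0  # nothing accepted, so no false discoveries
    return min(1.0, n_decoy / n_target)


def threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Return the most permissive score threshold with estimated FDR <= max_fdr.

    Candidate thresholds are the observed target scores, tried from lowest
    to highest. Returns None if no candidate meets the requested FDR.
    """
    for t in sorted(set(target_scores)):
        if estimate_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None


# Hypothetical scores, for illustration only.
targets = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
decoys = [1, 2, 3, 4]

fdr_at_3 = estimate_fdr(targets, decoys, 3)             # 2 decoys / 8 targets
cutoff = threshold_for_fdr(targets, decoys, max_fdr=0.2)
```

With these scores, a threshold of 3 accepts 8 target PSMs and 2 decoy PSMs, giving an estimated FDR of 0.25; raising the threshold to 4 is the first point at which the estimate drops below 0.2.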
Figure 3
Protein inference from PSMs. Peptide sequences from PSMs are used to infer proteins in the sample. Peptide degeneracy, in which a peptide sequence can be matched to two or more proteins, makes correct protein identification difficult. Some algorithms use probabilistic models, represented by large red circles versus small orange circles, to discriminate correct from incorrect protein identifications.
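To make the degeneracy problem concrete, the sketch below resolves shared peptides with a simple greedy parsimony rule: repeatedly pick the protein that explains the most still-unexplained peptides. This is a swapped-in, simpler technique than the probabilistic models mentioned in the caption, and is shown only to illustrate why shared peptides make the answer ambiguous. The function name `infer_proteins`, the peptide strings, and the protein identifiers are all hypothetical.

```python
def infer_proteins(peptide_to_proteins):
    """Greedy parsimony: a small set of proteins that explains every peptide.

    peptide_to_proteins maps each confidently identified peptide sequence
    to the list of proteins that contain it. Ties are broken alphabetically
    by protein name so the result is deterministic.
    """
    # Invert the mapping: protein -> set of peptides it could explain.
    protein_to_peptides = {}
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides.setdefault(protein, set()).add(peptide)

    # Only peptides that map to at least one protein can be explained.
    uncovered = set()
    for peptides in protein_to_peptides.values():
        uncovered |= peptides

    inferred = []
    while uncovered:
        # max() keeps the first maximum, so sorting gives alphabetical ties.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & uncovered),
        )
        inferred.append(best)
        uncovered -= protein_to_peptides[best]
    return inferred


# Hypothetical example: PEPTIDEB and PEPTIDEC are degenerate (shared).
example = {
    "PEPTIDEA": ["P1"],
    "PEPTIDEB": ["P1", "P2"],
    "PEPTIDEC": ["P2", "P3"],
    "PEPTIDED": ["P3"],
}
result = infer_proteins(example)
```

Here P2 is supported only by shared peptides, so the parsimony rule reports P1 and P3 and leaves P2 out. A greedy rule like this cannot express how likely P2 is to be present, which is the gap that probabilistic inference models are meant to fill.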
