Current algorithmic solutions for peptide-based proteomics data generation and identification

Michael R Hoopmann¹, Robert L Moritz

PMID: 23142544
PMCID: PMC3857305
DOI: 10.1016/j.copbio.2012.10.013

Review

Michael R Hoopmann et al. Curr Opin Biotechnol. 2013 Feb.

Curr Opin Biotechnol. 2013 Feb;24(1):31-8.

doi: 10.1016/j.copbio.2012.10.013. Epub 2012 Nov 8.

Affiliation

¹ Institute for Systems Biology, Seattle, WA 98109, USA.

Abstract

Peptide-based proteomic data sets are ever increasing in size and complexity. These data sets provide computational challenges when attempting to quickly analyze spectra and obtain correct protein identifications. Database search and de novo algorithms must consider high-resolution MS/MS spectra and alternative fragmentation methods. Protein inference is a tricky problem when analyzing large data sets of degenerate peptide identifications. Combining multiple algorithms for improved peptide identification puts significant strain on computational systems when investigating large data sets. This review highlights some of the recent developments in peptide and protein identification algorithms for analyzing shotgun mass spectrometry data when encountering the aforementioned hurdles. Also explored are the roles that analytical pipelines, public spectral libraries, and cloud computing play in the evolution of peptide-based proteomics.

Figures

**Figure 1**
A robust computational workflow for shotgun proteomics. Spectra acquired from several complementary fragmentation techniques are simultaneously analyzed with multiple sequencing algorithms, utilizing cloud-based computing resources. Multi-level peptide validation algorithms assign probabilities to peptide sequences from the combined sequencing results. The most likely proteins in the sample are inferred from the confident peptide sequences.

**Figure 2**
Estimating false discovery rate (FDR) using target-decoy searches. (A) The score distribution of target PSMs (blue line) shows a right tail relative to the decoy distribution (red line); this tail represents the correct PSMs (green line). (B) The score distributions of the target and decoy PSMs are used to estimate the FDR at any score threshold.
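
The estimate described in panel (B) can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular tool: it assumes separate target and decoy searches against databases of equal size, scores where higher is better, and the simple estimator FDR ≈ (decoy PSMs at or above the threshold) / (target PSMs at or above the threshold). The function names and the example scores are hypothetical.

```python
from typing import List, Optional


def fdr_at_threshold(target_scores: List[float],
                     decoy_scores: List[float],
                     threshold: float) -> float:
    """Estimate FDR among target PSMs scoring >= threshold.

    Assumption: decoy hits above the threshold approximate the number
    of incorrect target hits above the same threshold.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0  # nothing accepted, so nothing can be falsely accepted
    return min(1.0, n_decoy / n_target)


def lowest_threshold_for_fdr(target_scores: List[float],
                             decoy_scores: List[float],
                             max_fdr: float) -> Optional[float]:
    """Return the lowest observed target score whose estimated FDR
    does not exceed max_fdr, or None if no threshold qualifies."""
    for t in sorted(set(target_scores)):
        if fdr_at_threshold(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None


# Hypothetical scores, for illustration only.
targets = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0]
decoys = [4.5, 3.5, 2.5, 1.5]
cutoff = lowest_threshold_for_fdr(targets, decoys, max_fdr=0.2)
```

Because the estimated FDR is not guaranteed to fall monotonically as the threshold rises, the sketch scans candidate thresholds from lowest to highest and returns the first that passes; production tools typically convert such estimates to q-values instead.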

**Figure 3**
Protein inference from PSMs. Peptide sequences from PSMs are used to infer proteins in the sample. Peptide degeneracy, in which the peptide from a PSM can be matched to two or more proteins, makes correct protein identification difficult. Some algorithms use probabilistic models, represented by large red circles versus small orange circles, to discriminate correct from incorrect protein identifications.
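
To make the degeneracy problem concrete, the sketch below applies a greedy parsimony (set cover) heuristic: it repeatedly picks the protein that explains the most still-unexplained peptides. This is a deliberately simpler stand-in, not the probabilistic models mentioned in the caption, and the function name, protein labels, and peptide labels are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_parsimony(protein_to_peptides: Dict[str, Set[str]]) -> List[str]:
    """Pick a small set of proteins that together explain every
    identified peptide, using a greedy set-cover heuristic."""
    # All confidently identified peptides that need an explanation.
    uncovered: Set[str] = set().union(*protein_to_peptides.values())
    chosen: List[str] = []
    while uncovered:
        # Iterate proteins in sorted order so ties resolve deterministically
        # (max() returns the first maximal item it encounters).
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & uncovered))
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Hypothetical example: peptides "b" and "c" are degenerate (shared).
candidates = {
    "P1": {"a", "b", "c"},
    "P2": {"b", "c"},
    "P3": {"c", "d"},
}
inferred = greedy_parsimony(candidates)
```

In this example P2 is never selected because every peptide it could explain is already accounted for by P1. A parsimony answer of this kind gives no confidence measure, which is one reason probabilistic approaches are used instead.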
