Nat Methods. 2017 Mar;14(3):259-262.
doi: 10.1038/nmeth.4153. Epub 2017 Jan 30.

Building ProteomeTools based on a complete synthetic human proteome

Daniel P Zolg et al. Nat Methods. 2017 Mar.

Abstract

We describe ProteomeTools, a project building molecular and digital tools from the human proteome to facilitate biomedical research. Here we report the generation and multimodal liquid chromatography-tandem mass spectrometry analysis of >330,000 synthetic tryptic peptides representing essentially all canonical human gene products, and we exemplify the utility of these data in several applications. The resource (available at http://www.proteometools.org) will be extended to >1 million peptides, and all data will be shared with the community via ProteomicsDB and ProteomeXchange.

Conflict of interest statement

Competing Financial Interests Statement

The authors declare no competing financial interests.

Figures

Figure 1. Overview of the ProteomeTools project
(a) Planned segmentation of the 1.4 million peptides that will be selected from the human proteome and synthesized over the course of the project. Here, we report on the analysis of 330,000 individually synthesized tryptic peptides. (b) Estimation of synthesis success using peptide precursor intensity information for the peptide SVSLLEER and its by-products. Here, 82% of the total MS signal can be attributed to the full-length product. (c) Boxplots for the number of tandem mass spectra identifying a given peptide with very high confidence (Andromeda score >100; total of 11.3 million PSMs in 11 types of tandem MS); the number of such peptides (total of 211,895) covering a given protein/gene (total of 19,735) and the average precursor intensity fraction (PIF; see main text) of these peptides. (d) Distribution of peptide and protein identifications as a function of the Andromeda score. All data are available in ProteomicsDB and ProteomeXchange.
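As an illustration of the estimate in panel (b), the share of MS signal attributable to the full-length product can be computed as a simple intensity ratio. This is a minimal sketch, not the authors' procedure: the function name and the intensity values are hypothetical, chosen only so that the ratio matches the 82% quoted in the caption.

```python
def full_length_fraction(full_length_intensity, byproduct_intensities):
    """Fraction of the summed precursor intensity that belongs to the
    full-length peptide (hypothetical helper, not from the paper)."""
    total = full_length_intensity + sum(byproduct_intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return full_length_intensity / total


# Made-up intensities (arbitrary units) for a full-length peptide
# and two synthesis by-products; these are NOT measured values.
fraction = full_length_fraction(82.0, [10.0, 8.0])
print(f"full-length share of MS signal: {fraction:.0%}")
```

The ratio treats every by-product signal as equally informative; how the project actually weights or filters by-product signals is described in the main text of the paper, not here.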
Figure 2. Data analysis and application
(a) Protein identification: target/decoy search results for the peptide LAAQGLGMQAACTLTR of Aquaporin 12B (AQP12B). There are only two spectra in ProteomicsDB with identification Q-Scores distinct from the decoy distribution (left panel). The inset shows the Q-Score distribution of all genes/proteins in ProteomicsDB, placing AQP12B well away from the decoy proteins. The right panel shows the best CID mass spectrum for AQP12B in ProteomicsDB (top) compared to the corresponding CID spectrum of the synthesized reference peptide, confirming this identification. (b) Transferability between MS instruments: comparison of a spectrum acquired from a complex digest by beam-type CID on a QTOF instrument for the peptide VVSEDFLQDVSASTK with the corresponding spectrum of the synthesized reference peptide acquired by beam-type CID on an Orbitrap instrument (left panel). Fragment ion intensities show very high correlation (Pearson correlation of 0.9). Extending this analysis to ~9,000 peptides confirmed the high correlation of these two types of tandem mass spectra (right panel). (c) Development of a predictor for tandem mass spectra. HCD data were recorded at six collision energies. The left panel shows the median relative fragment ion intensities of 12 y-fragment ions for the peptide YYLIQLLEDDAQR. Using these characteristics for all spectra of all peptides, a predictive model was trained for each normalized collision energy. The comparison of measured and predicted spectra for YYLIQLLEDDAQR (middle panel) shows very good agreement. The histogram on the right shows that the predictor, tested on pool 66 of the proteotypic peptide set (529 peptide sequences and 3,248 spectra), is generally able to predict the relative y-ion intensity for a given peptide with good quality (see Supplementary Notes for details).
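The spectrum comparison in panel (b) rests on the Pearson correlation between matched fragment ion intensities. The sketch below shows that calculation in plain Python, assuming the two spectra have already been reduced to intensity lists aligned by fragment ion. The function name and the example intensities are hypothetical and do not come from the paper.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation of two equally long intensity lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up relative intensities for the same fragment ions in two spectra
# (e.g. one per instrument); these are NOT values from the study.
spectrum_a = [0.10, 0.45, 1.00, 0.30, 0.05]
spectrum_b = [0.12, 0.40, 0.95, 0.35, 0.08]
print(round(pearson_correlation(spectrum_a, spectrum_b), 3))
```

Matching fragment ions between spectra (by ion type, charge and m/z tolerance) is the harder step in practice and is deliberately left out of this sketch.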
