J Proteome Res. 2015 Sep 4;14(9):3461-73.
doi: 10.1021/acs.jproteome.5b00500. Epub 2015 Jul 24.

State of the Human Proteome in 2014/2015 As Viewed through PeptideAtlas: Enhancing Accuracy and Coverage through the AtlasProphet

Eric W Deutsch et al. J Proteome Res.

Abstract

The Human PeptideAtlas is a compendium of the highest quality peptide identifications from over 1000 shotgun mass spectrometry proteomics experiments collected from many different laboratories, all reanalyzed through a uniform processing pipeline. The latest 2015-03 build contains substantially more input data than past releases, is mapped to a recent version of our merged reference proteome, and uses improved informatics processing, including the newly developed AtlasProphet, to provide the highest quality results. Within the set of ∼20,000 neXtProt primary entries, 14,070 (70%) are confidently detected in the latest build, 5% are ambiguous, and 9% are redundant, leaving just 16% (3166) of proteins with no mapping detections. These results are derived from over 133 million peptide-spectrum matches identifying more than 1 million distinct peptides, with AtlasProphet used to characterize and classify the protein matches. Improved handling for detection and presentation of single amino-acid variants (SAAVs) reveals 5326 uniquely mapping SAAVs across 2794 proteins. With such a large amount of data, the control of false positives is a challenge. We present the methodology and results for maintaining rigorous quality, along with a discussion of the implications of the remaining sources of error in the build.

Keywords: Human Proteome Project; PeptideAtlas; observed proteome; repositories; shotgun proteomics; tandem mass spectrometry.

Figures

Figure 1
Summary of the increase in content of the Human PeptideAtlas with the addition of five very large tranches of data (plus various smaller datasets) from 2013 to 2015. The left panel shows that the number of distinct peptide sequences doubled from less than 0.5 million to ~1 million. The right panel shows that the number of distinct canonical proteins (neXtProt 20k baseline) increased by only 8%. The value of 13,026 has been corrected from the previously published value of 13,377 to compensate for the use of neXtProt as the reference for this chart instead of the more comprehensive reference used in 2013. The 2014 numbers differ from those reported at the HUPO 2014 Congress because the AtlasProphet algorithm was applied to obtain more accurate counts for the 2014-08 build.
Figure 2
Summary of the mapping of the Human 2015-03 PeptideAtlas peptides to the 20,061 neXtProt core proteins. The proteins are apportioned into 10 categories within 4 groups. The relative sizes of the groups are depicted in the pie chart.
Figure 3
Example of SAAV information in the PeptideAtlas protein display for MMS19 (Q96T76). At the top is a partial listing from Swiss-Prot of PeptideAtlas-detected SAAVs and their detected frequencies. At the bottom is a sequence alignment view showing the reference sequence as well as three of the aligned, detected SAAVs. On the left is the region 50-100, showing two SAAVs: one (at position 68) observed exclusively (i.e., no peptides match the unmodified version), and one (at position 98) that has never been observed. On the right is the region 500-570, showing the alignment of two SAAV peptides where both the reference form and the changed forms have been observed, although the changed forms only infrequently. By looking in Kaviar at the exclusively seen SAAV corresponding to dbSNP entry rs2275586, one finds that, at position chr10:97481001, the overall frequency of the reference sequence is only ~4%, while ~96% of genotypes analyzed for inclusion in the Kaviar database version 2.0 have the A → G forms (Supplementary Figure 2).
Figure 4
Comparison of fragmentation spectra for 2+ ions of peptide YFNPCYATAR in protein Q9UJA2. Top: high resolution HCD spectrum using an Orbitrap Velos from a fetal testis sample in PeptideAtlas. Bottom: high resolution CID spectrum from a synthetic peptide using a 6530 QTOF. Despite being from different instruments and sources, the spectra are very similar, lending great confidence that the peptide has been correctly identified. Most of the unlabeled low-mass ions are immonium ions and other annotatable fragments.
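
The percentages quoted in the abstract and in the Figure 1 and Figure 2 captions can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the PeptideAtlas pipeline. It assumes that the 20,061 neXtProt core proteins named in the Figure 2 caption are the denominator behind the abstract's percentages, and that the corrected value of 13,026 in the Figure 1 caption is the baseline for the reported 8% increase to 14,070. All variable and function names are hypothetical.

```python
# Counts copied from the abstract and figure captions.
NEXTPROT_CORE = 20_061       # Figure 2 caption: neXtProt core proteins
DETECTED = 14_070            # abstract: confidently detected entries
UNDETECTED = 3_166           # abstract: entries with no mapping detections
CANONICAL_BASELINE = 13_026  # Figure 1 caption: corrected earlier count (assumed baseline)


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Shares of the neXtProt core set (assumed denominator: 20,061).
detected_pct = percent(DETECTED, NEXTPROT_CORE)
undetected_pct = percent(UNDETECTED, NEXTPROT_CORE)

# Whatever is neither detected nor undetected should correspond to the
# ambiguous (5%) plus redundant (9%) categories in the abstract.
remainder = NEXTPROT_CORE - DETECTED - UNDETECTED
remainder_pct = percent(remainder, NEXTPROT_CORE)

# Relative growth in canonical proteins (Figure 1, right panel).
growth_pct = percent(DETECTED - CANONICAL_BASELINE, CANONICAL_BASELINE)

if __name__ == "__main__":
    print(f"detected:   {detected_pct:.1f}%")
    print(f"undetected: {undetected_pct:.1f}%")
    print(f"remainder:  {remainder} entries, {remainder_pct:.1f}%")
    print(f"growth:     {growth_pct:.1f}%")
```

Under these assumptions the computed shares round to the 70%, 16%, and combined 14% (5% ambiguous plus 9% redundant) stated in the abstract, and the growth rounds to the 8% stated in the Figure 1 caption.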
