Mol Cell Proteomics. 2011 Sep;10(9):M110.006353.
doi: 10.1074/mcp.M110.006353. Epub 2011 Jun 1.

A high-confidence human plasma proteome reference set with estimated concentrations in PeptideAtlas

Terry Farrah et al. Mol Cell Proteomics. 2011 Sep.

Abstract

Human blood plasma can be obtained relatively noninvasively and contains proteins from most, if not all, tissues of the body. Therefore, an extensive, quantitative catalog of plasma proteins is an important starting point for the discovery of disease biomarkers. In 2005, we showed that different proteomics measurements using different sample preparation and analysis techniques identify significantly different sets of proteins, and that a comprehensive plasma proteome can be compiled only by combining data from many different experiments. Applying advanced computational methods developed for the analysis and integration of very large and diverse data sets generated by tandem MS measurements of tryptic peptides, we have now compiled a high-confidence human plasma proteome reference set with well over twice the identified proteins of previous high-confidence sets. It includes a hierarchy of protein identifications at different levels of redundancy following a clearly defined scheme, which we propose as a standard that can be applied to any proteomics data set to facilitate cross-proteome analyses. Further, to aid in development of blood-based diagnostics using techniques such as selected reaction monitoring, we provide a rough estimate of protein concentrations using spectral counting. We identified 20,433 distinct peptides, from which we inferred a highly nonredundant set of 1929 protein sequences at a false discovery rate of 1%. We have made this resource available via PeptideAtlas, a large, multiorganism, publicly accessible compendium of peptides identified in tandem MS experiments conducted by laboratories around the world.
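A rough concentration estimate from spectral counts can be sketched as a straight-line fit on log-log axes against proteins whose concentrations are already known. This is a minimal illustration only: the function names, the least-squares fit, and the numbers are assumptions for the example, and the abstract does not specify how counts were normalized or fitted.

```python
import math

def fit_loglog(counts, concentrations):
    """Least-squares line through (log10 count, log10 concentration).

    Returns (slope, intercept). Hypothetical helper, not the authors' code.
    """
    xs = [math.log10(c) for c in counts]
    ys = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_concentration(count, slope, intercept):
    """Map a spectral count to a concentration using the fitted line."""
    return 10 ** (slope * math.log10(count) + intercept)

# Invented calibration points: spectral counts and known concentrations.
slope, intercept = fit_loglog([10, 100, 1000], [1e2, 1e3, 1e4])
estimate = estimate_concentration(10000, slope, intercept)
```

Because both axes are logarithmic, such an estimate is best read as an order-of-magnitude guide, which matches the "rough estimate" wording of the abstract.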


Figures

Fig. 1.
Left: Search, analysis, and validation steps for each LC-MS/MS experiment. Spectra were searched against a spectral library or sequence database. The resulting PSMs were then processed using the TPP, including a new component, iProphet, to improve discrimination (see text for details). Right: The PeptideAtlas build process. ProteinProphet combines PSMs passing the FDR threshold for all experiments to create lists of distinct peptides, protein identifications, and protein groups. These data, along with supporting information such as consensus spectra, genome mappings, and proteotypic peptides, comprise a PeptideAtlas build.
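One common way to turn per-PSM probabilities into an FDR threshold is to average the error probability (1 - p) over the accepted PSMs. The sketch below assumes that approach; the caption does not say how the threshold was computed in this work, and the function names and probabilities are invented for illustration.

```python
def estimated_fdr(probs, threshold):
    """Mean error probability (1 - p) over PSMs with p >= threshold."""
    accepted = [p for p in probs if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)

def min_prob_for_fdr(probs, max_fdr=0.01):
    """Lowest probability cutoff whose accepted set stays within max_fdr.

    Returns None if even the strictest cutoff exceeds max_fdr.
    """
    best = None
    # Walk the distinct probabilities from strictest to most lenient.
    for t in sorted(set(probs), reverse=True):
        if estimated_fdr(probs, t) <= max_fdr:
            best = t
        else:
            break  # the estimate only grows as lower-probability PSMs are added
    return best

# Invented PSM probabilities.
psm_probs = [1.0, 0.99, 0.99, 0.9, 0.5]
cutoff = min_prob_for_fdr(psm_probs, max_fdr=0.01)
```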
Fig. 2.
A, Six shaded bars (two of which overlap) represent sets of protein identifications at various levels of redundancy under the Cedar scheme. Tallies are for the Human Plasma PeptideAtlas. Beginning at bottom:
● Exhaustive set: contains any protein sequence in the atlas' combined protein sequence database (Swiss-Prot 2010–04 + IPI v3.71 + Ensembl v57.37) that includes at least one identified peptide.
● Sequence-unique set: exhaustive set with exact duplicates removed.
● Peptide-set-unique set: a subset of the sequence-unique set within which no two protein sequences include the exact same set of identified peptides.
● Not subsumed set: peptide-set-unique set with subsumed protein sequences removed (those for which the identified peptides form a proper subset of the identified peptides for another protein sequence).
● Canonical set: a subset of the not subsumed set within which no protein sequence includes more than 80% of the peptides of any other member of the set. Protein sequences that are not subsumed, but not canonical are called possibly distinguished, because each has a peptide set that is close, but not identical, to that of a canonical protein sequence.
● Covering set: a minimal set of protein sequences that can explain all of the identified peptides.
B, Peptide-centric illustration of six protein sequences in a hypothetical ProteinProphet protein group, in order of descending ProteinProphet probability. Heavy lines represent protein chains (with invented identifiers); lighter lines represent observed peptides. Vertically aligned peptides are identical in sequence, and one instance of each is labeled with the letter of the highest probability protein to which it maps. A' is indistinguishable from A because it contains exactly the same set of observed peptides; both are equally likely to exist in the sample(s), but A is labeled canonical because its Swiss-Prot protein identifier is preferred. E is subsumed by A because its observed peptides form a subset of A's peptides; it is also subsumed by A', C, and D. Protein sequences B, C, and D are labeled possibly distinguished because the peptide set for each is slightly different from that of A. The three protein sequences with superscript C comprise the smallest subset of sequences sufficient to explain all the observed peptides in the group, and thus belong to the covering set.
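The redundancy levels described in this caption can be sketched with plain set operations. This is an illustrative reading of the definitions, not the PeptideAtlas implementation: it starts from peptide sets (so the sequence-unique step is skipped), treats input order as preference order, applies the 80% rule in both directions, and uses a greedy approximation for the covering set rather than a guaranteed minimum. All function names and the toy peptide sets are invented.

```python
from typing import Dict, FrozenSet, List

PeptideSets = Dict[str, FrozenSet[str]]

def peptide_set_unique(peps: PeptideSets) -> List[str]:
    """Keep the first protein seen for each distinct peptide set."""
    seen, kept = set(), []
    for name, pset in peps.items():
        if pset not in seen:
            seen.add(pset)
            kept.append(name)
    return kept

def not_subsumed(peps: PeptideSets, names: List[str]) -> List[str]:
    """Drop proteins whose peptides are a proper subset of another's."""
    return [n for n in names
            if not any(peps[n] < peps[m] for m in names if m != n)]

def canonical(peps: PeptideSets, names: List[str],
              max_shared: float = 0.8) -> List[str]:
    """Greedily keep proteins sharing at most 80% of peptides pairwise."""
    chosen: List[str] = []
    for n in names:
        too_close = any(
            len(peps[n] & peps[c]) > max_shared * min(len(peps[n]), len(peps[c]))
            for c in chosen
        )
        if not too_close:
            chosen.append(n)
    return chosen

def covering(peps: PeptideSets, names: List[str]) -> List[str]:
    """Greedy set cover: repeatedly take the protein explaining the most
    still-unexplained peptides (an approximation of the minimal set)."""
    uncovered = set().union(*(peps[n] for n in names))
    chosen: List[str] = []
    while uncovered:
        best = max(names, key=lambda n: len(peps[n] & uncovered))
        chosen.append(best)
        uncovered -= peps[best]
    return chosen

# Invented toy group, loosely modeled on panel B.
toy: PeptideSets = {
    "A": frozenset({"p1", "p2", "p3", "p4", "p5", "p6"}),
    "A_prime": frozenset({"p1", "p2", "p3", "p4", "p5", "p6"}),
    "B": frozenset({"p1", "p2", "p3", "p4", "p5", "p7"}),
    "E": frozenset({"p1", "p2"}),
    "F": frozenset({"q1", "q2"}),
}
unique = peptide_set_unique(toy)
kept = not_subsumed(toy, unique)
canon = canonical(toy, kept)
possibly_distinguished = [n for n in kept if n not in canon]
cover = covering(toy, kept)
```

In this toy group, A_prime is dropped as indistinguishable from A, E is subsumed, and B ends up possibly distinguished because it shares five of its six peptides with A.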
Fig. 3.
Plasma protein concentrations determined using immunoassay and antibody microarray analysis (40) versus normalized spectral counts from the Human Plasma Non-glyco PeptideAtlas, plotted on a log scale. Each small square represents a protein found in both sources. Hollow squares represent proteins that were excluded when drawing the trend line, either because they were depleted (albumin) or because they had fewer than four spectrum counts. The line segments above and below the trend line are fit to the standard deviation of the y axis values computed at intervals of 0.1 (log scale). The arrows on the left represent proteins with reported concentrations in (40) but no spectrum counts. The histogram at the right depicts an estimate of the completeness of the Human Plasma Non-glyco PeptideAtlas as a function of concentration, calculated as the number of points divided by the total number of points and arrows within each decade. See supplemental Fig. S2 for the N-Glyco atlas.
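The completeness histogram described in this caption reduces to a per-decade ratio: points divided by points plus arrows. A minimal sketch follows, with invented concentrations in arbitrary units and a hypothetical function name.

```python
import math
from collections import Counter

def completeness_by_decade(observed, unobserved):
    """Fraction of proteins with spectrum counts in each concentration decade.

    observed:   concentrations of proteins that have counts (points)
    unobserved: concentrations of proteins with no counts (arrows)
    Decade d covers concentrations in [10**d, 10**(d + 1)).
    """
    points = Counter(math.floor(math.log10(c)) for c in observed)
    arrows = Counter(math.floor(math.log10(c)) for c in unobserved)
    return {
        d: points[d] / (points[d] + arrows[d])
        for d in sorted(set(points) | set(arrows))
    }

# Invented example values.
completeness = completeness_by_decade(
    observed=[1.5, 20, 30, 500],
    unobserved=[2.0, 3.0, 40],
)
```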
Fig. 4.
Proteins identified by each experiment. Each bar represents one of the 91 experiments, ordered as in supplemental Table S4. Height of dark bar = canonical protein sequences identified per experiment; total height (dark + light) = cumulative tally; width of bar = PSM count. See supplemental Fig. S5 for a similar graph of distinct peptides.
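The two bar heights in this figure can be computed in one pass over the experiments: a per-experiment count and a running count of distinct identifications. The sketch below uses invented identifiers and a hypothetical function name.

```python
def cumulative_tally(experiments):
    """For each experiment, in order, return a pair:
    (proteins in this experiment, distinct proteins seen so far)."""
    seen = set()
    rows = []
    for proteins in experiments:
        current = set(proteins)
        seen |= current
        rows.append((len(current), len(seen)))
    return rows

# Invented example: three experiments with overlapping identifications.
tally = cumulative_tally([{"P1", "P2"}, {"P2", "P3"}, {"P1"}])
```

An experiment that only re-identifies known proteins, like the third one here, adds to its own bar but leaves the cumulative tally flat.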

References

    1. Putnam F. W. ed. (1975–1989) The Plasma Proteins, 2nd Ed., Academic Press, New York
    2. Anderson N. L., Anderson N. G. (2002) The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell Proteomics 1, 845–867
    3. Kersey P. J., Duarte J., Williams A., Karavidopoulou Y., Birney E., Apweiler R. (2004) The International Protein Index: an integrated database for proteomics experiments. Proteomics 4, 1985–1988
    4. Omenn G. S., States D. J., Adamski M., Blackwell T. W., Menon R., Hermjakob H., Apweiler R., Haab B. B., Simpson R. J., Eddes J. S., Kapp E. A., Moritz R. L., Chan D. W., Rai A. J., Admon A., Aebersold R., Eng J., Hancock W. S., Hefta S. A., Meyer H., Paik Y. K., Yoo J. S., Ping P., Pounds J., Adkins J., Qian X., Wang R., Wasinger V., Wu C. Y., Zhao X., Zeng R., Archakov A., Tsugita A., Beer I., Pandey A., Pisano M., Andrews P., Tammen H., Speicher D. W., Hanash S. M. (2005) Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core data set of 3020 proteins and a publicly-available database. Proteomics 5, 3226–3245
    5. Omenn G. Editor (2006) Exploring the Human Plasma Proteome, Wiley-VCH, New York, NY
