Sci Data. 2014 Sep 16:1:140031.
doi: 10.1038/sdata.2014.31. eCollection 2014.

A repository of assays to quantify 10,000 human proteins by SWATH-MS

George Rosenberger et al. Sci Data. 2014.

Abstract

Mass spectrometry is the method of choice for deep and reliable exploration of the (human) proteome. Targeted mass spectrometry reliably detects and quantifies pre-determined sets of proteins in a complex biological matrix and is used in studies that rely on the quantitatively accurate and reproducible measurement of proteins across multiple samples. It requires the one-time, a priori generation of a specific measurement assay for each targeted protein. SWATH-MS is a mass spectrometric method that combines data-independent acquisition (DIA) and targeted data analysis and vastly extends the throughput of proteins that can be targeted in a sample compared to selected reaction monitoring (SRM). Here we present a compendium of highly specific assays covering more than 10,000 human proteins and enabling their targeted analysis in SWATH-MS datasets acquired from research or clinical specimens. This resource supports the confident detection and quantification of 50.9% of all human proteins annotated by UniProtKB/Swiss-Prot and is therefore expected to find wide application in basic and clinical research. Data are available via ProteomeXchange (PXD000953-954) and SWATHAtlas (SAL00016-35).

Conflict of interest statement

S.T. is an employee of AB SCIEX, which operates in the field covered by the article. The research group of R.A. is supported in part by AB SCIEX, which provides access to prototype instrumentation. R.A. holds shares of Biognosys AG, which operates in the field covered by the article. The remaining authors declare no competing financial interest.

Figures

Figure 1
Data acquisition and data analysis workflows employed for the generation of assay libraries. (a) Data acquisition: Sampling of different cell lines and tissue types was followed by (optional) protein fractionation, proteolytic digestion (using trypsin or Lys-C/trypsin using PCT), (optional) peptide fractionation and LC-MS/MS analysis in discovery proteomics mode. (b) Data analysis: Sequence database search was conducted using four different search engines and the results were statistically evaluated and combined using the Trans-Proteomic Pipeline. False discovery rate (FDR) control was conducted using MAYU. The identified peptides were used to generate a consensus, RT-normalized spectral library using SpectraST. Assays were selected using spectrast2tsv.py and the OpenSWATH tool ConvertTSVToTraML.
Figure 2
Statistics of the combined assay library and comparison to other human proteome mapping efforts. (a) True positive (red) and all protein identifications (blue) as a function of protein FDR. The graph indicates that the number of true positive protein identifications saturates at a protein FDR cutoff of 0.05. Additional identifications at less strict FDR cutoffs are mainly false positive protein identifications. (b) True positive (red) and all peptide identifications (blue) as a function of protein FDR. The graph indicates that the number of true positive peptide identifications correlates strongly with the total number of peptide identifications and does not reach saturation within typical levels of protein FDR cutoffs. (c) The number of PSMs per sample type contributed to the assay library. Multiple PSMs can constitute a consensus spectrum and are individually counted per MS injection. The NCI60 cell line panel contributed most, and HEK293 cells, gut tissue and THP1 cells each contributed more than 10% of all spectra. (d) Overlap of human proteins curated by UniProtKB/Swiss-Prot, a subset annotated with protein-level evidence and the presented combined assay library (CAL). On the protein level, the assay library provides 68.2% coverage of the proteins with evidence while providing assays for an additional 802 proteins. Compared to UniProtKB/Swiss-Prot, the assay library contains 50.9% of all 20,264 proteins.
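The relationship between an FDR cutoff and the number of true positive identifications in panels (a) and (b), and the coverage figure in panel (d), can be illustrated with a short sketch. It relies only on the textbook definition FDR = FP / (TP + FP); it is not the MAYU estimation procedure, and the function names are hypothetical.

```python
def estimated_true_positives(n_identifications: int, fdr: float) -> float:
    """Estimate true positives from a total count and an FDR.

    Assumes the textbook definition FDR = FP / (TP + FP), so
    TP = N * (1 - FDR). This is a simplification, not MAYU's model.
    """
    if not 0.0 <= fdr <= 1.0:
        raise ValueError("fdr must lie in [0, 1]")
    return n_identifications * (1.0 - fdr)


def covered_count(n_reference: int, coverage_fraction: float) -> float:
    """Number of reference entries implied by a coverage fraction."""
    if not 0.0 <= coverage_fraction <= 1.0:
        raise ValueError("coverage_fraction must lie in [0, 1]")
    return n_reference * coverage_fraction


if __name__ == "__main__":
    # Hypothetical count, chosen only to show the arithmetic.
    print(estimated_true_positives(1000, 0.05))
    # Values quoted in the caption: 50.9% of 20,264 Swiss-Prot proteins.
    print(covered_count(20264, 0.509))
```

With the caption's values, the coverage calculation gives roughly 10,300 proteins, consistent with the "more than 10,000" stated in the abstract.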
Figure 3
Number of peptide and protein identifications by SWATH-MS using different proteotypic assay libraries. (a) The proteotypic peptides contained in the combined assay library (CAL) and the sample-specific (ss) assay libraries and their overlap are depicted. The overlap between the sample-specific libraries is more than 70% on the peptide level and around 80% on the protein level. 239 peptides contained in the sample-specific libraries were not included in the CAL, since they did not meet the stricter quality cutoff of the CAL. (b) The number of true positive peptide identifications as a function of the peptide FDR is depicted. Using the combined library, the number of true positive peptide identifications matches the sample-specific libraries at peptide FDR below 1% (dashed grey line). (c,d) The number of true positive protein identifications of a HeLa (c) or U2OS (d) whole cell lysate in a single, unfractionated injection as a function of the protein FDR is depicted. Protein FDR cutoffs are either reported for all identifications or non-single hits (NS). The CAL provides similar sensitivity compared to the sample-specific libraries for HeLa and U2OS at typical levels of error-rate control. The non-single hit identifications of the CAL generally provide a higher sensitivity at lower protein FDR cutoffs. The dashed grey line indicates the protein FDR cutoff at 1%. (e) Reproducibility of the peptide identifications as a function of the peptide FDR. The colors indicate reproducibility in 1 (green), 2 (blue) or 3 (red) of 3 technical replicates. Both ss HeLa (top) and CAL (bottom) enable detection of a similar number of assays among all replicates at the same peptide FDR. The CAL enables detection of more low intensity peptides in only one or two replicates. (f) Distribution of the coefficient of variation (CV) of summed transition intensities of precursors identified in all three replicates at 1% peptide FDR. The median CV of 5% (U2OS) to 10% (HeLa) corresponds well with the expected technical variation and is very similar between sample-specific and the combined assay library.
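The CV statistic in panel (f) can be sketched as follows. This is a minimal illustration, assuming that each precursor has one summed transition intensity per technical replicate and that a missing identification is recorded as None; the precursor names and intensity values are hypothetical, and the use of the sample standard deviation is an assumption rather than something the caption specifies.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(intensities):
    """CV = sample standard deviation / mean of the replicate intensities."""
    m = mean(intensities)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return stdev(intensities) / m


def median_cv(summed_intensities, n_replicates=3):
    """Median CV over precursors identified in all replicates.

    summed_intensities maps a precursor id to a list with one summed
    transition intensity per replicate (None where not identified).
    """
    cvs = [
        coefficient_of_variation(values)
        for values in summed_intensities.values()
        if len(values) == n_replicates and all(v is not None for v in values)
    ]
    if not cvs:
        raise ValueError("no precursor was identified in all replicates")
    return median(cvs)


if __name__ == "__main__":
    # Hypothetical data: three technical replicates per precursor.
    example = {
        "precursor_A": [100.0, 110.0, 90.0],
        "precursor_B": [200.0, 210.0, 190.0],
        "precursor_C": [400.0, 400.0, 400.0],
        "precursor_D": [50.0, None, 55.0],  # excluded: missing in one replicate
    }
    print(f"median CV: {median_cv(example):.2%}")
```

Restricting the calculation to precursors seen in all three replicates mirrors the caption's wording; including partially detected precursors would require a choice about missing values that the caption does not describe.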
Figure 4
Application of the combined assay library (CAL) to an independently acquired dataset (CDK4 AP-SWATH, Lambert et al.) and comparison to the sample-specific assay library (ss). The fold changes of the comparison wild type (WT) and mutants (R24C or R24H) with whiskers for standard deviation are indicated. The assays contained in the combined library for CD2A1 and CDN2C covered fewer and different peptides than the sample-specific assay library and thus the fold change is smaller. The results indicate that comparable qualitative and quantitative results using the combined assay library can be retrieved from SWATH-MS experiments conducted using different experimental setups, data acquisition and data analysis strategies.
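One way to see why a library covering fewer or different peptides can yield a different protein fold change is to compute the fold change per peptide and then summarize. The sketch below is illustrative only: the caption does not state how the fold changes or their standard deviations were derived, so the log2 mutant/WT ratio, the averaging across peptides, and all names and values here are assumptions.

```python
from math import log2
from statistics import mean, stdev


def protein_fold_change(wt, mutant, peptides):
    """Mean and standard deviation of per-peptide log2(mutant / WT).

    wt and mutant map peptide sequences to intensities; peptides is the
    set of peptides the assay library provides for the protein.
    Only peptides quantified in both conditions are used.
    """
    ratios = [
        log2(mutant[p] / wt[p])
        for p in peptides
        if p in wt and p in mutant and wt[p] > 0 and mutant[p] > 0
    ]
    if not ratios:
        raise ValueError("no peptide quantified in both conditions")
    spread = stdev(ratios) if len(ratios) > 1 else 0.0
    return mean(ratios), spread


if __name__ == "__main__":
    # Hypothetical intensities for one protein.
    wt = {"PEPA": 100.0, "PEPB": 200.0, "PEPC": 400.0}
    mutant = {"PEPA": 400.0, "PEPB": 800.0, "PEPC": 400.0}
    # A library with all three peptides versus one with a subset.
    print(protein_fold_change(wt, mutant, {"PEPA", "PEPB", "PEPC"}))
    print(protein_fold_change(wt, mutant, {"PEPB", "PEPC"}))
```

In this toy example the two peptide sets give different mean fold changes for the same underlying measurements, which is the kind of effect the caption attributes to the differing peptide coverage.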

References

Data Citations

    1. Rosenberger G. 2014. ProteomeXchange. PXD000953
    2. Rosenberger G. 2014. SWATHAtlas. SAL00016-35
    3. Rosenberger G. 2014. ProteomeXchange. PXD000954

