Comparative Study
Nat Biotechnol. 2016 Nov;34(11):1130-1136.
doi: 10.1038/nbt.3685. Epub 2016 Oct 3.

A multicenter study benchmarks software tools for label-free proteome quantification

Pedro Navarro et al. Nat Biotechnol. 2016 Nov.

Abstract

Consistent and accurate quantification of proteins by mass spectrometry (MS)-based proteomics depends on the performance of instruments, acquisition methods and data analysis software. In collaboration with the software developers, we evaluated OpenSWATH, SWATH 2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from sequential window acquisition of all theoretical fragment-ion spectra (SWATH)-MS, which uses data-independent acquisition (DIA) for label-free protein quantification. We analyzed high-complexity test data sets from hybrid proteome samples of defined quantitative composition acquired on two different MS instruments using different SWATH isolation-window setups. For consistent evaluation, we developed LFQbench, an R package, to calculate metrics of precision and accuracy in label-free quantitative MS and report the identification performance, robustness and specificity of each software tool. Our reference data sets enabled developers to improve their software tools. After optimization, all tools provided highly convergent identification and reliable quantification performance, underscoring their robustness for label-free quantitative proteomics.
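LFQbench itself is an R package, and its exact metric definitions are not given here. As a rough illustration only, the Python sketch below assumes that accuracy is the median deviation of the observed log2(A/B) ratios of one species from that species' expected ratio, and that precision is the standard deviation of those ratios. The function names and example intensities are hypothetical and are not part of LFQbench.

```python
import math
import statistics

def log2_ratios(pairs):
    """Return log2(A/B) for each (intensity_A, intensity_B) pair with both values > 0."""
    return [math.log2(a / b) for a, b in pairs if a > 0 and b > 0]

def accuracy_and_precision(ratios, expected):
    """Assumed definitions for illustration:
    accuracy  = median observed log2 ratio minus the expected log2 ratio,
    precision = sample standard deviation of the observed log2 ratios."""
    accuracy = statistics.median(ratios) - expected
    precision = statistics.stdev(ratios)
    return accuracy, precision

# Hypothetical yeast proteins, expected log2(A/B) = 1 (a twofold change).
yeast = [(200.0, 100.0), (410.0, 200.0), (90.0, 50.0), (300.0, 160.0)]
acc, prec = accuracy_and_precision(log2_ratios(yeast), expected=1.0)
print(f"accuracy={acc:.3f}, precision={prec:.3f}")
```

An accuracy near zero means the species' ratios center on the expected value; a smaller precision value means tighter scatter around that center.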

Conflict of interest statement

The authors declare the following competing financial interests: S.A.T. is employed by SCIEX; O.M.B. and L.R. are employed by Biognosys AG.

Figures

Figure 1. Study workflow.
Two hybrid proteome samples, A and B, were prepared containing known quantities of peptide digests of human, yeast, and E. coli proteomes. The samples were analyzed in three technical replicates in SWATH-MS acquisition mode on two MS instrument platforms (TripleTOF 5600 and TripleTOF 6600) using two SWATH window setups (32 fixed-size windows and 64 variable-size windows), resulting in four benchmark datasets. The datasets were analyzed with five software tools: OpenSWATH, SWATH 2.0, Skyline, Spectronaut, and DIA-Umpire. Benchmark analyses of each dataset and software tool were based on the output reports generated by the newly developed benchmarking software LFQbench.
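Because the composition of samples A and B is defined, each species has an expected log2(A/B) ratio against which reported ratios can be judged. The Python sketch below assumes that every protein's abundance scales with the amount of its proteome digest loaded; the per-species amounts are illustrative placeholders (not stated in this caption). It also enumerates the four instrument and window-setup combinations named above.

```python
import math
from itertools import product

# Four benchmark datasets: two instruments x two SWATH window setups (from the caption).
INSTRUMENTS = ["TripleTOF 5600", "TripleTOF 6600"]
WINDOW_SETUPS = ["32 fixed-size windows", "64 variable-size windows"]
DATASETS = list(product(INSTRUMENTS, WINDOW_SETUPS))

# Hypothetical per-species amounts loaded in samples A and B, chosen so both
# samples have the same total load; placeholders for illustration only.
AMOUNTS = {
    "human": (65.0, 65.0),
    "yeast": (30.0, 15.0),
    "E. coli": (5.0, 20.0),
}

def expected_log2_ratios(amounts):
    """Expected log2(A/B) per species, assuming protein abundance scales with load."""
    return {species: math.log2(a / b) for species, (a, b) in amounts.items()}

for instrument, windows in DATASETS:
    print(f"dataset: {instrument}, {windows}")
for species, ratio in expected_log2_ratios(AMOUNTS).items():
    print(f"{species}: expected log2(A/B) = {ratio:+.1f}")
```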
Figure 2. Protein-level LFQbench benchmark results.
After parameter optimization in a first iteration of analyses, the intensities reported by each software tool were fitted to the PeakView intensity scale using a linear model through the origin (Supplementary Figure 25). Intensities of multiply charged precursors were summed and averaged across all technical replicates of each sample. Protein quantities were estimated in each technical replicate as the average of the three most intense peptides reported for each protein. Single-hit proteins (proteins with a single detected peptide) were discarded. This figure shows only data from the TripleTOF 6600 with the 64-window setup; corresponding data for the other instrument and acquisition setups are shown in Supplementary Figure 8. (a) Log-transformed ratios (log2(A/B)) of proteins (human proteins in green, yeast proteins in orange, and E. coli proteins in purple), plotted for each benchmarked software tool against the log-transformed intensity in sample B for the first and second iterations (sample size n between 3,795 and 4,692 proteins). Dashed colored lines represent the expected log2(A/B) values for human, yeast, and E. coli proteins. Black dashed lines represent the local trend of the experimental log-transformed ratios of each population (human, yeast, and E. coli) along the x-axis. For help interpreting these plots, see the plots generated from simulated data (Supplementary Figure 2). (b) log2(A/B) of the technical-replicate averages of A and B for E. coli proteins in the lowest intensity tertile. Boxes span the 25th to 75th percentiles; whiskers cover data points between the 1st and 99th percentiles. Accuracy improved significantly in the second iteration for OpenSWATH, SWATH 2.0, Skyline, and Spectronaut (p < 0.05; one-sided Wilcoxon rank-sum tests).
Precision improved significantly in the second iteration for OpenSWATH, Skyline, and Spectronaut in all HYE124 datasets (p < 0.05; two-sided F-tests performed for each species individually).
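Three steps in this caption are simple enough to sketch: rescaling a tool's intensities onto the PeakView scale with a least-squares line forced through the origin, whose slope is b = sum(x*y) / sum(x*x); the "three most intense peptides" protein estimate with single-hit proteins removed; and selecting the lowest intensity tertile. The Python below is a minimal sketch under stated assumptions: peptide intensities are already summed over charge states and averaged over replicates, proteins with exactly two peptides are averaged over both, and all function names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def slope_through_origin(x, y):
    """Least-squares slope b of y = b * x with no intercept: b = sum(x*y) / sum(x*x).
    Here x holds one tool's intensities and y the matching PeakView intensities."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def top3_protein_quantities(peptides):
    """peptides: iterable of (protein, peptide, intensity) tuples.
    Returns protein -> mean of its (up to) three most intense peptides,
    discarding single-hit proteins (only one peptide detected)."""
    by_protein = defaultdict(dict)
    for protein, peptide, intensity in peptides:
        by_protein[protein][peptide] = intensity  # one intensity per peptide
    quantities = {}
    for protein, peptide_intensities in by_protein.items():
        if len(peptide_intensities) < 2:
            continue  # single-hit protein: discarded
        top = sorted(peptide_intensities.values(), reverse=True)[:3]
        quantities[protein] = mean(top)
    return quantities

def lowest_intensity_tertile(intensity_b):
    """Proteins in the lowest third of sample-B intensities (count rounded down)."""
    ranked = sorted(intensity_b, key=intensity_b.get)
    return ranked[: len(ranked) // 3]

# Hypothetical tool vs PeakView intensities for shared precursors.
b = slope_through_origin([1.0, 2.0, 4.0], [2.0, 4.0, 8.0])
peptides = [
    ("P1", "a", 10.0), ("P1", "b", 40.0), ("P1", "c", 30.0), ("P1", "d", 20.0),
    ("P2", "e", 100.0),
    ("P3", "x", 6.0), ("P3", "y", 2.0),
]
scaled = [(prot, pep, b * value) for prot, pep, value in peptides]
print(top3_protein_quantities(scaled))
```

Because the fit has no intercept and a positive slope, rescaling is a single multiplication that preserves peptide ranking, so applying it before or after the Top3 averaging gives the same protein quantities.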
Figure 3. Integrated analysis of the five software tools.
(a) Overlap of quantified peptides and proteins for the library-based tools. (b) Overlap of quantified peptides and proteins for all software tools; an asterisk indicates protein or peptide counts below ten. In (a) and (b), the font size of each element is proportional to the number of peptides or proteins it represents. (c) Abundance distribution of peptides and proteins detected by DIA-Umpire. Red: peptides or proteins shared with other software tools. Turquoise: peptides or proteins detected exclusively by DIA-Umpire.
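The overlap counts in panels (a) and (b), and the shared versus exclusive split in panel (c), reduce to set operations on the identifiers each tool reports. A minimal Python sketch, with hypothetical peptide identifiers and function names:

```python
def common_to_all(ids_by_tool):
    """Identifiers reported by every tool (the central overlap region)."""
    return set.intersection(*(set(ids) for ids in ids_by_tool.values()))

def split_shared_exclusive(ids_by_tool, tool):
    """Split one tool's identifiers into those also reported by at least one other
    tool (shared) and those reported by that tool alone (exclusive)."""
    own = set(ids_by_tool[tool])
    others = set().union(*(ids for name, ids in ids_by_tool.items() if name != tool))
    return own & others, own - others

# Hypothetical peptide identifiers per tool.
ids_by_tool = {
    "OpenSWATH": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"},
    "Skyline": {"PEPTIDEA", "PEPTIDEB", "PEPTIDED"},
    "DIA-Umpire": {"PEPTIDEA", "PEPTIDEE"},
}
shared, exclusive = split_shared_exclusive(ids_by_tool, "DIA-Umpire")
print("common to all:", sorted(common_to_all(ids_by_tool)))
print("DIA-Umpire shared:", sorted(shared), "exclusive:", sorted(exclusive))
```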
Figure 4. Retention-time differences and correlation of reported peak intensities between all software tools for matching precursors.
Retention-time outliers (upper-right panels) are plotted in the color of the outlier software tool (see the color legend in the diagonal panels). Diagonal panels show, for each software tool, the total number of outliers and their percentage of all commonly detected peptides. A software tool was counted as an outlier for a precursor when its peak retention time produced a standard deviation greater than 0.2 min relative to all other software tools detecting that precursor; ambiguous cases, in which more than one software tool produced such a standard deviation, were removed beforehand. The lower-left panels display the correlation of reported peak intensities, with the retention-time outliers also marked.
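The caption's outlier rule (a 0.2 min threshold, with ambiguous precursors dropped) can be sketched per precursor. The exact "standard deviation relative to all other tools" computation is not spelled out here, so the Python below swaps in a simpler stand-in: the absolute difference between a tool's apex retention time and the median apex retention time of the other tools. Using the median keeps one far-off tool from dragging the reference toward itself. The function names and retention times are hypothetical.

```python
from statistics import median

RT_THRESHOLD_MIN = 0.2  # retention-time threshold from the caption, in minutes

def rt_deviation(apex_rts, tool):
    """Absolute difference between one tool's apex RT and the median apex RT of
    all other tools for the same precursor (stand-in for the caption's SD)."""
    others = [rt for name, rt in apex_rts.items() if name != tool]
    return abs(apex_rts[tool] - median(others))

def rt_outlier(apex_rts, threshold=RT_THRESHOLD_MIN):
    """Return the single outlier tool for one precursor, or None when no tool
    exceeds the threshold or the case is ambiguous (more than one tool does)."""
    flagged = [tool for tool in apex_rts if rt_deviation(apex_rts, tool) > threshold]
    return flagged[0] if len(flagged) == 1 else None

# Hypothetical apex retention times (min) of one precursor in each tool.
precursor = {
    "OpenSWATH": 10.00, "SWATH 2.0": 10.05, "Skyline": 9.98,
    "Spectronaut": 10.02, "DIA-Umpire": 11.00,
}
print(rt_outlier(precursor))
```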
