Mol Cell Proteomics. 2020 Jul;19(7):1209-1219.
doi: 10.1074/mcp.RA119.001624. Epub 2020 Apr 22.

Robust Summarization and Inference in Proteome-wide Label-free Quantification

Adriaan Sticker et al. Mol Cell Proteomics. 2020 Jul.

Abstract

Label-free quantitative mass spectrometry-based workflows for differential expression (DE) analysis of proteins pose important challenges for data analysis because of peptide-specific effects and context-dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outperform summarization methods, which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, peptide-based methods are computationally expensive, often hard to understand for the non-specialized end user, and do not provide protein summaries, which are important for visualization or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared with the state-of-the-art peptide-based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRobSum, which estimates MSqRob's model parameters in a two-stage procedure, circumventing the drawbacks of peptide-based workflows. MSqRobSum maintains MSqRob's superior performance while providing useful protein expression summaries for plotting and downstream analysis. Summarizing peptide intensities to protein intensities considerably reduces the computational complexity, the memory footprint, and the model complexity, and makes it easier to disseminate DE inferred on protein summaries. Moreover, MSqRobSum provides a highly modular analysis framework, which gives researchers full flexibility to develop data analysis workflows tailored toward their specific applications.
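
To make concrete what "summarizing peptide intensities to protein intensities" involves, the sketch below implements Tukey's median polish, a generic robust summarization technique. It is an illustration only and not MSqRobSum's own estimator. It assumes a complete matrix of log2 peptide intensities for one protein (rows are peptides, columns are samples, no missing values); the function names `median_polish` and `summarize_protein` are hypothetical.

```python
from statistics import median


def median_polish(table, n_iter=10, tol=1e-8):
    """Fit intensity ~ overall + peptide effect + sample effect by
    alternately sweeping out row and column medians.

    table: list of rows (peptides), each a list of log2 intensities
           (one per sample). Missing values are NOT handled here.
    """
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(row) for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows  # peptide-specific effects
    col_eff = [0.0] * n_cols  # sample effects
    for _ in range(n_iter):
        # Sweep row (peptide) medians out of the residuals.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            for j in range(n_cols):
                resid[i][j] -= row_med[i]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column (sample) medians out of the residuals.
        col_med = [median(resid[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
        # Stop once a full sweep no longer changes anything.
        if max(abs(x) for x in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid


def summarize_protein(table):
    """Return one summarized log2 intensity per sample."""
    overall, _, col_eff, _ = median_polish(table)
    return [overall + c for c in col_eff]
```

Because medians are used instead of means, a single outlying peptide intensity has little influence on the per-sample summaries. For example, `summarize_protein([[9, 10, 12], [10, 11, 13], [20, 13, 15]])` gives the same summaries as the outlier-free table in which the value 20 is replaced by 12.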

Keywords: Biostatistics; bioinformatics; bioinformatics software; differential expression; label-free quantification; mass spectrometry; ridge regression; shotgun proteomics; summarization.

Conflict of interest statement

Conflict of interest—Authors declare no competing interests.

Figures

Graphical abstract
Fig. 1.
Comparison of current state-of-the-art tools for DE analysis of proteins. We compare one peptide-based tool, MSqRob, and four summarization-based tools. Of these, Perseus and Differential Enrichment analysis of Proteomics data (DEP) with mixed imputation are both based on maxLFQ protein intensities. MSstats uses median polish summarized protein intensities, whereas Proteus uses high-flyers summarization. The data consist of E. coli proteins spiked at four different concentrations (a, b, c, and d) in a human proteome. The plot in panel A shows the performance of each method for the pairwise comparisons b-a, c-b, and d-c (True Positive Rate = E. coli/(Total E. coli); False Discovery Proportion = Human/(Human + E. coli)). MSstats outperforms Proteus, DEP, and Perseus at higher fold changes, but drops in performance down to Perseus levels at the lowest fold change. Proteus outperforms Perseus at higher fold changes but performs worse at the lowest fold change. MSqRob always outperforms the other methods. The boxplots in panel B show estimated log2 fold changes of differentially (E. coli) and non-differentially (human) expressed proteins in the a versus b comparison. The thick gray line indicates the real log2 fold change for the E. coli proteins. Perseus has biased fold changes for the E. coli proteins, but has more precise fold changes for human proteins than DEP and MSstats. MSqRob has more precise and more accurate fold changes than any other method.
Fig. 2.
Comparison of performance of MSqRob and MSqRobSum. The data consist of E. coli proteins spiked at four different concentrations (a, b, c, and d) in a human proteome. The plot shows the performance of each method for the pairwise comparisons b-a, c-b, and d-c (True Positive Rate = E. coli/(Total E. coli); False Discovery Proportion = Human/(Human + E. coli)). The estimated 1% (circle) and 5% (triangle) FDR is controlled if it remains below 1 and 5% FDP, respectively (indicated by vertical gray lines). Performance of MSqRobSum is close to MSqRob in all comparisons, and MSqRobSum even outperforms MSqRob in the b-a comparison. The performance of MSqRobSum does decline compared with MSqRob at decreasing fold changes between treatments (e.g. c-b and d-c), but the FDR is controlled in all comparisons.
Fig. 3.
Improvements of DE analysis using a modular data analysis workflow. We show incremental improvements in DE analysis by changing components in the workflow one at a time. The data consist of E. coli proteins spiked at four different concentrations (a, b, c, and d) in a human proteome. The plot shows the performance of each method for the pairwise comparisons b-a, c-b, and d-c (True Positive Rate = E. coli/(Total E. coli); False Discovery Proportion = Human/(Human + E. coli)). The circle and triangle are at 1 and 5% FDR, respectively, as estimated by the method. Perseus default performs t-tests on maxLFQ protein summaries for DE analysis. However, its performance is low and FDR is not controlled. Adding VSN normalization to the protein summaries boosts the performance of the DE analysis (perseus vsn). This workflow is further improved by replacing conventional t-tests with MSqRobSum's inference step (MSqRobSum maxLFQ). Adopting DEP's mixed imputation scheme results in an additional gain in performance (MSqRobSum DEP), whereas the best results are obtained by replacing maxLFQ and mixed imputation with our robust summarization (MSqRobSum default).
Fig. 4.
Comparison of different tools for DE analysis of proteins on the Latosinska dataset. We compare MSqRob, MSqRobSum, Differential Enrichment analysis of Proteomics data (DEP), and MSstats. The plot shows the number of proteins that are returned by each method at a certain FDR level. The two vertical gray lines indicate the 1 and 5% FDR level. MSqRob is the method that returns the largest number of proteins as DE. The DEP analysis returns more proteins than MSqRobSum at 1% FDR, and an equal number of proteins at 5% FDR; but MSqRobSum always returns more proteins at higher FDR levels. MSstats has the lowest sensitivity.
Fig. 5.
Comparison of MSqRob and MSqRobSum for DE analysis of proteins on the Francisella dataset. We compare the results of the original analysis by Ramond et al. (2015), MSqRob and MSqRobSum. Ramond et al. return the largest number of proteins as DE and MSqRobSum returns the lowest number of DE proteins.
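
The performance measures used in the spike-in figure captions (True Positive Rate = E. coli/(Total E. coli); False Discovery Proportion = Human/(Human + E. coli)) can be sketched as follows. This is a minimal illustration of those two formulas, assuming each protein is identified by a string and that the set of spiked (E. coli) and background (human) proteins is known; the function names and the example identifiers are hypothetical.

```python
def tpr_fdp(called, spiked, background):
    """Compute (TPR, FDP) for a list of proteins called DE.

    called:     proteins a method reports as differentially expressed
    spiked:     set of truly DE proteins (E. coli in the benchmark)
    background: set of non-DE proteins (human in the benchmark)
    """
    called = set(called)
    true_pos = len(called & set(spiked))       # E. coli proteins called DE
    false_pos = len(called & set(background))  # human proteins called DE
    tpr = true_pos / len(spiked) if spiked else float("nan")
    n_called = true_pos + false_pos
    # With no discoveries there are no false discoveries.
    fdp = false_pos / n_called if n_called else 0.0
    return tpr, fdp


def fdr_controlled(fdp, nominal_level):
    """True if the observed FDP stays at or below the nominal FDR level."""
    return fdp <= nominal_level


# Toy example: 2 of 4 spiked proteins found, plus 1 human false positive.
tpr, fdp = tpr_fdp(
    called=["ecoli_1", "ecoli_2", "human_1"],
    spiked={"ecoli_1", "ecoli_2", "ecoli_3", "ecoli_4"},
    background={"human_1", "human_2"},
)
```

In this toy example the TPR is 2/4 and the FDP is 1/3, so a nominal 5% FDR would not be controlled.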
