Mol Cell Proteomics. 2020 Jul;19(7):1209-1219.

doi: 10.1074/mcp.RA119.001624. Epub 2020 Apr 22.

Robust Summarization and Inference in Proteome-wide Label-free Quantification

Adriaan Sticker¹, Ludger Goeminne¹, Lennart Martens², Lieven Clement³

Affiliations

¹ Department of Applied Mathematics, Computer Science & Statistics, Ghent University, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium.
² VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium. Electronic address: lennart.martens@vib-ugent.be.
³ Department of Applied Mathematics, Computer Science & Statistics, Ghent University, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium. Electronic address: lieven.clement@ugent.be.

PMID: 32321741
PMCID: PMC7338080
DOI: 10.1074/mcp.RA119.001624

Adriaan Sticker et al. Mol Cell Proteomics. 2020 Jul.

Abstract

Label-free quantitative mass spectrometry-based workflows for differential expression (DE) analysis of proteins pose important challenges for data analysis because of peptide-specific effects and context-dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outperform summarization methods, which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, peptide-based methods are computationally expensive, often hard to understand for the non-specialized end-user, and do not provide protein summaries, which are important for visualization or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared with the state-of-the-art peptide-based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRobSum, which estimates MSqRob's model parameters in a two-stage procedure, circumventing the drawbacks of peptide-based workflows. MSqRobSum maintains MSqRob's superior performance, while providing useful protein expression summaries for plotting and downstream analysis. Summarizing peptide to protein intensities considerably reduces the computational complexity, the memory footprint and the model complexity, and makes it easier to disseminate DE inferred on protein summaries. Moreover, MSqRobSum provides a highly modular analysis framework, which gives researchers full flexibility to develop data analysis workflows tailored toward their specific applications.

Keywords: Biostatistics; bioinformatics; bioinformatics software; differential expression; label-free quantification; mass spectrometry; ridge regression; shotgun proteomics; summarization.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1.**
**Comparison of current state-of-the-art tools for DE analysis of proteins.** We compare one peptide-based tool, MSqRob, and four summarization-based tools. Of these, Perseus and Differential Enrichment analysis of Proteomics data (DEP) with mixed imputation are both based on maxLFQ protein intensities. MSstats uses median polish summarized protein intensities, whereas Proteus uses high-flyers summarization. The data consists of *E. coli* proteins spiked at four different concentrations (a, b, c, and d) in a human proteome. The plot in panel A shows the performance of each method for the pairwise comparisons b-a, c-b, and d-c (True Positive Rate = *E. coli*/(Total *E. coli*); False Discovery Proportion = Human/(Human + *E. coli*)). MSstats outperforms Proteus, DEP, and Perseus at higher fold changes, but drops in performance down to Perseus levels at the lowest fold change. Proteus outperforms Perseus at higher fold changes but performs worse at the lowest fold change. MSqRob always outperforms the other methods. The boxplots in panel B show estimated log₂ fold changes of differentially (*E. coli*) and non-differentially (human) expressed proteins in the a *versus* b comparison. The thick gray line indicates the real log₂ fold change for the *E. coli* proteins. Perseus has biased fold changes for the *E. coli* proteins, but has more precise fold changes for human proteins than DEP and MSstats. MSqRob has more precise and more accurate fold changes than any other method.
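To make the idea of summarizing peptide intensities to a protein intensity concrete, the following is a minimal sketch of Tukey's median polish, the general technique the caption attributes to MSstats. It is not the MSstats implementation and not MSqRobSum's robust summarization; the function name `median_polish`, its parameters, and the peptides-by-samples matrix layout are assumptions made for illustration.

```python
import numpy as np

def median_polish(log_intensities, max_iter=10, tol=1e-4):
    """Tukey median polish on a peptides x samples matrix of log2 intensities.

    Hypothetical helper for illustration. Missing values are expected as NaN
    and are ignored by the medians. Returns one summary value per sample
    (overall level + sample effect).
    """
    resid = np.array(log_intensities, dtype=float)  # copy, input is untouched
    n_peptides, n_samples = resid.shape
    overall = 0.0
    peptide_eff = np.zeros(n_peptides)
    sample_eff = np.zeros(n_samples)

    for _ in range(max_iter):
        # Sweep peptide (row) medians out of the residuals.
        row_med = np.nanmedian(resid, axis=1)
        resid -= row_med[:, None]
        peptide_eff += row_med
        shift = np.nanmedian(sample_eff)
        sample_eff -= shift
        overall += shift

        # Sweep sample (column) medians out of the residuals.
        col_med = np.nanmedian(resid, axis=0)
        resid -= col_med[None, :]
        sample_eff += col_med
        shift = np.nanmedian(peptide_eff)
        peptide_eff -= shift
        overall += shift

        # Stop once both sweeps no longer change anything noticeable.
        if max(np.nanmax(np.abs(row_med)), np.nanmax(np.abs(col_med))) < tol:
            break

    return overall + sample_eff

# Toy protein with three peptides in four samples; one outlying intensity.
y = np.array([[10.0, 11.0, 12.0, 13.0],
              [11.0, 12.0, 13.0, 14.0],
              [ 9.0, 10.0, 11.0, 12.0]])
y[0, 0] += 5.0  # a single aberrant peptide intensity
protein_summary = median_polish(y)
```

Because medians are used in both sweeps, the single outlying intensity in the toy matrix does not distort the per-sample summaries, which is the kind of robustness to peptide-specific effects that summarization methods aim for.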

**Fig. 2.**
**Comparison of performance of MSqRob and MSqRobSum.** We compare the performance of MSqRob and MSqRobSum. The data consists of *E. coli* proteins spiked at four different concentrations (a, b, c, and d) in a human proteome. The plot shows the performance of each method for the pairwise comparisons b-a, c-b, and d-c (True Positive Rate = *E. coli*/(Total *E. coli*); False Discovery Proportion = Human/(Human + *E. coli*)). The estimated 1% (circle) and 5% (triangle) FDR is controlled if it remains below 1 and 5% FDP, respectively (indicated by vertical gray lines). Performance of MSqRobSum is close to MSqRob in all comparisons, and MSqRobSum even outperforms MSqRob in the b-a comparison. The performance of MSqRobSum does decline compared with MSqRob at decreasing fold changes between treatments (*e.g.* c-b and d-c), but the FDR is controlled in all comparisons.
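The two performance measures defined in the caption can be computed as in the sketch below. It assumes each protein comes with an adjusted p-value (here called `q_values`) and a flag telling whether it is a spiked-in *E. coli* protein; these names, the function `tpr_and_fdp`, and the toy numbers are hypothetical and not taken from the paper.

```python
def tpr_and_fdp(q_values, is_ecoli, fdr_level):
    """Return (TPR, FDP) for proteins called DE at the given estimated FDR level.

    TPR = called E. coli / total E. coli
    FDP = called human / (called human + called E. coli)
    """
    called = [spiked for q, spiked in zip(q_values, is_ecoli) if q <= fdr_level]
    ecoli_called = sum(called)
    human_called = len(called) - ecoli_called
    total_ecoli = sum(is_ecoli)
    tpr = ecoli_called / total_ecoli if total_ecoli else float("nan")
    # With no proteins called there are no false discoveries.
    fdp = human_called / len(called) if called else 0.0
    return tpr, fdp

# Toy data (made up): six proteins, four of them spiked-in E. coli.
q_values = [0.001, 0.004, 0.02, 0.03, 0.2, 0.6]
is_ecoli = [True, True, False, True, True, False]

for level in (0.01, 0.05):
    tpr, fdp = tpr_and_fdp(q_values, is_ecoli, level)
    controlled = fdp <= level  # the caption's criterion for FDR control
    print(f"FDR {level:.0%}: TPR={tpr:.2f}, FDP={fdp:.2f}, controlled={controlled}")
```

In a spike-in benchmark the ground truth is known, so the observed FDP at a method's estimated FDR cutoff can be compared directly with that cutoff, which is how the circle and triangle markers in the plots are read.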

**Fig. 3.**
**Improvements of DE analysis using a modular data analysis workflow.** We show incremental improvements in DE analysis by changing components in the workflow one at a time. The data consists of *E. coli* proteins spiked at four different concentrations (a, b, c, and d) in a human proteome. The plot shows the performance of each method for the pairwise comparisons b-a, c-b, and d-c (True Positive Rate = *E. coli*/(Total *E. coli*); False Discovery Proportion = Human/(Human + *E. coli*)). The circle and triangle are at 1 and 5% FDR, respectively, as estimated by the method. Perseus default performs t-tests on maxLFQ protein summaries for DE analysis. However, its performance is low and FDR is not controlled. Adding VSN normalization to the protein summaries boosts the performance of the DE analysis (Perseus vsn). This workflow is further improved by replacing conventional t-tests by MSqRobSum's inference step (MSqRobSum maxLFQ). Adopting DEP's mixed imputation scheme results in an additional gain in performance (MSqRobSum DEP), whereas the best results are obtained by replacing maxLFQ and mixed imputation with our robust summarization (MSqRobSum default).

**Fig. 4.**
**Comparison of different tools for DE analysis of proteins on the Latosinska dataset.** We compare MSqRob, MSqRobSum, Differential Enrichment analysis of Proteomics data (DEP), and MSstats. The plot shows the number of proteins that are returned by each method at a certain FDR level. The two vertical gray lines indicate the 1 and 5% FDR level. MSqRob is the method that returns the largest number of proteins as DE. The DEP analysis returns more proteins than MSqRobSum at 1% FDR, and an equal number of proteins at 5% FDR; but MSqRobSum always returns more proteins at higher FDR levels. MSstats has the lowest sensitivity.

**Fig. 5.**
**Comparison of MSqRob and MSqRobSum for DE analysis of proteins on the Francisella dataset.** We compare the results of the original analysis by Ramond *et al.* (2015), MSqRob and MSqRobSum. Ramond *et al.* return the largest number of proteins as DE and MSqRobSum returns the lowest number of DE proteins.
