MS-EmpiRe Utilizes Peptide-level Noise Distributions for Ultra-sensitive Detection of Differentially Expressed Proteins

Constantin Ammar¹, Markus Gruber², Gergely Csaba², Ralf Zimmer³

Affiliations

¹ ‡Ludwig-Maximilians-Universität München, Department of Informatics, Amalienstrasse 17, 80333 München, Germany; §Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81377 Munich, Germany.
² ‡Ludwig-Maximilians-Universität München, Department of Informatics, Amalienstrasse 17, 80333 München, Germany.
³ ‡Ludwig-Maximilians-Universität München, Department of Informatics, Amalienstrasse 17, 80333 München, Germany; §Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81377 Munich, Germany. Electronic address: zimmer@ifi.lmu.de.

PMID: 31235637
PMCID: PMC6731086
DOI: 10.1074/mcp.RA119.001509

Mol Cell Proteomics. 2019 Sep;18(9):1880-1892. doi: 10.1074/mcp.RA119.001509. Epub 2019 Jun 24.

Abstract

Mass spectrometry-based proteomics is the method of choice for quantifying genome-wide differential changes of protein expression in a wide range of biological and biomedical applications. Protein expression changes need to be reliably derived from many measured peptide intensities and their corresponding peptide fold changes. These peptide fold changes vary considerably for a given protein. Numerous instrumental setups aim to reduce this variability, whereas current computational methods only implicitly account for this problem. We introduce a new method, MS-EmpiRe, which explicitly accounts for the noise underlying peptide fold changes. We derive data set-specific, intensity-dependent empirical error fold change distributions, which are used for individual weighting of peptide fold changes to detect differentially expressed proteins (DEPs). In a recently published proteome-wide benchmarking data set, MS-EmpiRe doubles the number of correctly identified DEPs at an estimated FDR cutoff compared with state-of-the-art tools. We additionally confirm the superior performance of MS-EmpiRe on simulated data. MS-EmpiRe requires only peptide intensities mapped to proteins and, thus, can be applied to any common quantitative proteomics setup. We apply our method to diverse MS data sets and observe consistent increases in sensitivity with more than 1000 additional significant proteins in deep data sets, including a clinical study across multiple patients. At the same time, we observe that even the proteins classified as least significant by other methods but significant by MS-EmpiRe show very clear regulation on the peptide intensity level. MS-EmpiRe provides rapid processing (< 2 min for 6 LC-MS/MS runs with 3 h gradients) and is publicly available at github.com/zimmerlab/MS-EmpiRe with a manual including examples.

Keywords: Quantification; TMT; bioinformatics; bioinformatics software; differential expression; differential quantification; label-free quantification; mass spectrometry; statistics.

Figures

**Fig. 1.**
**Single linkage clustering for signal normalization.** A) We start with one cluster containing two samples (green) and three clusters of size one. We identify the two nearest clusters (gray and blue) and merge them into one new cluster by shifting the signals of the gray sample according to the median log₂ fold change to the blue sample. B) We merge the green and blue clusters. Because they both consist of multiple samples, we determine the shift parameter by computing signal fold changes between every possible inter-cluster sample pair. C) The last cluster merge step. D) The algorithm results in one cluster containing all samples. E) The two final clusters of different conditions are merged (blue and red). The resulting distribution of all inter-cluster fold changes is shown below.
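The shift applied in a single merge step of panel A can be sketched as follows. This is a minimal illustration, not the MS-EmpiRe implementation: it covers only the shift between two individual samples (not the full clustering or the multi-sample case of panel B), and the function names, peptide names and intensities are hypothetical.

```python
import math
from statistics import median


def median_log2_shift(sample, reference):
    """Median log2 fold change of `sample` versus `reference` over shared peptides."""
    shared = sample.keys() & reference.keys()
    return median(math.log2(sample[p] / reference[p]) for p in shared)


def shift_sample(sample, shift):
    """Subtract `shift` in log2 space, i.e. divide raw intensities by 2**shift."""
    return {p: v / 2 ** shift for p, v in sample.items()}


# Hypothetical intensities: the gray sample is systematically twice the blue one.
gray = {"pepA": 200.0, "pepB": 400.0, "pepC": 800.0}
blue = {"pepA": 100.0, "pepB": 200.0, "pepC": 400.0}

shift = median_log2_shift(gray, blue)
gray_normalized = shift_sample(gray, shift)
```

Using the median makes the shift robust against a minority of peptides that are genuinely regulated or poorly quantified.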

**Fig. 2.**
**Schematic of the MS-EmpiRe workflow.** A) All identified peptides from a proteomics run are sorted by their mean intensities. B) The peptides are then split into subgroups based on their intensity. For each subgroup, the error fold changes of the individual peptides are calculated. An error fold change simply denotes the log₂ fold change of a peptide between two replicates of the same condition. All error fold changes within a subgroup form an empirical error distribution. Distributions corresponding to lower intensity peptides show a larger variance than those for high intensity peptides. C) When a protein is tested for differential expression, each peptide is assigned an empirical error distribution. Peptides of similar intensities can be assigned the same distribution. D) For each peptide fold change, the probability that this fold change happened by chance (*i.e.* the p value) is assessed from the empirical distribution. This means that the same fold change will get a much higher p value when the distribution is wide than when it is narrow. To make this value manageable, the p value is then transformed to a Z-value by transferring the mass of the empirical probability distribution to a standard normal distribution. E) The Z-values for each peptide are corrected for outliers. For this, the probability is estimated that a high Z-value on the peptide level has happened by chance because of individual outliers. F) The corrected Z-values can directly be summed to the protein level, and the corresponding protein-level p value can be obtained as well as the FDR after multiple testing correction.
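Panels D and F can be illustrated with a simplified sketch. It is not the published implementation: the outlier correction of panel E is omitted, peptide Z-values are assumed to be independent when they are combined, and the pseudo-count, function names and example distributions are assumptions made for illustration.

```python
import math
from bisect import bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist()


def empirical_z(fold_change, sorted_error_fcs):
    """Map a fold change to a Z-value via its rank in a sorted empirical error distribution."""
    n = len(sorted_error_fcs)
    rank = bisect_right(sorted_error_fcs, fold_change)
    # Assumed pseudo-count: keeps the cumulative probability strictly inside (0, 1).
    cum_prob = (rank + 0.5) / (n + 1)
    return STD_NORMAL.inv_cdf(cum_prob)


def protein_p_value(z_values):
    """Two-sided p value for the summed Z-values, assuming independent peptides."""
    z_protein = sum(z_values) / math.sqrt(len(z_values))
    return 2 * (1 - STD_NORMAL.cdf(abs(z_protein)))


# Hypothetical error distributions for a high- and a low-intensity subgroup.
narrow = [-0.2, -0.1, 0.0, 0.1, 0.2]
wide = [-2.0, -1.0, 0.0, 1.0, 2.0]

z_narrow = empirical_z(0.5, narrow)  # fold change lies outside the narrow noise
z_wide = empirical_z(0.5, wide)      # same fold change is unremarkable in wide noise
```

The same log₂ fold change of 0.5 yields a larger Z-value against the narrow distribution than against the wide one, which is how noisy low-intensity peptides are down-weighted when Z-values are summed per protein.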

**Fig. 3.**
**Experimental setup and fold change based metrics.** A, Benchmarking setup for quantitative assessment of fold changes taken from O'Connell *et al.* (17). Different amounts of yeast lysate are spiked into human cell lysate. The three groups contain 10%, 5% and 3.3% yeast lysate, respectively. B, Peptide level fold change distribution between all conditions, before normalization. C, The distribution after fold change based normalization. D, Intensity dependent peptide fold changes between two replicates of the LFQ data (error fold changes), displayed as smoothed density scatter plot. E, Error fold changes for 10 intensity regions displayed as box plots. Each box contains the same number of data points. The quantiles correspond to the fractions 0.05, 0.15, 0.50, 0.85, 0.95.
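The equal-count binning behind panel E can be sketched as below, assuming peptides are given as `(mean intensity, error fold change)` pairs. The function names and the synthetic data (noise that shrinks with intensity) are illustrative only and are not taken from the study.

```python
from statistics import quantiles


def equal_count_bins(peptides, n_bins):
    """Sort (intensity, error_fc) pairs by intensity and split them into n_bins near-equal groups."""
    ordered = sorted(peptides, key=lambda pair: pair[0])
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


def box_quantiles(error_fcs):
    """Return the 0.05, 0.15, 0.50, 0.85 and 0.95 quantiles of the error fold changes."""
    cuts = quantiles(error_fcs, n=20, method="inclusive")  # cut points at 5%, 10%, ..., 95%
    return [cuts[i] for i in (0, 2, 9, 16, 18)]


# Synthetic peptides: alternating sign, error magnitude decreasing with intensity.
peptides = [(float(i), (-1) ** i * 10.0 / (i + 1)) for i in range(1, 101)]
bins = equal_count_bins(peptides, 10)
summaries = [box_quantiles([fc for _, fc in b]) for b in bins]
spread_low = summaries[0][-1] - summaries[0][0]     # lowest-intensity bin
spread_high = summaries[-1][-1] - summaries[-1][0]  # highest-intensity bin
```

Equal-count bins (rather than equal-width intensity ranges) give every box the same number of data points, so the quantile estimates are comparably stable across the intensity range.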

**Fig. 4.**
**Assessment of the differential detection performance on the benchmarking setup of O'Connell *et al*.** A, Number of proteins detected in the LFQ data by the MaxLFQ+t-test setup, MSqRob and MS-EmpiRe. The light shades show the numbers of yeast proteins accessible for testing in the setups, which differ for every method. As MSqRob shows high FDRs (bottom plot), an FDR-corrected bar is introduced for MSqRob. MaxLFQ+t-test shows low sensitivity at good error rate control like MSqRob. MaxLFQ+t-test is very conservative for the fold change 1.5 setup, with no false positives (no bar visible). MS-EmpiRe increases the detection substantially in all cases with good error rate control. This is most pronounced in the most challenging fold change 1.5 setup. B, Number of proteins detected when using an intersected input set. Because of this conservative approach, the numbers and differences are lower in general; nevertheless, MS-EmpiRe is the best performing method. C, Comparison of MaxLFQ+t-test and MS-EmpiRe on a TMT data set. MSqRob was excluded as it currently does not support TMT data. The overall performance is better because of higher depth from sample fractionation, lower noise and fewer missing values. Quantification on the protein level hence already works well; still, MS-EmpiRe shows a significant sensitivity increase of around 19% for fold change 1.5.

**Fig. 5.**
**Application of MS-EmpiRe, MaxLFQ+t-test and MSqRob to three different quantitative LC-MS/MS data sets.** A, Number of DEPs detected in the three different data sets by the three different methods. Each bar represents the number of proteins found in the set determined by the dots below. MS-EmpiRe is the most sensitive approach. B, Overlaps of protein hits on a subset of very clear hits for DIV5 *versus* DIV15. MS-EmpiRe shows large overlaps with MaxLFQ+t-test and MSqRob, whereas the overlap between MaxLFQ+t-test and MSqRob is small. C, Investigation of the proteins called by only one method. Peptide fold change plots for the proteins with the largest FDR difference to the other two methods. A consistent shift of the boxes above or below log₂ fold change 0 (black dashed line) indicates regulation. Left (MS-EmpiRe): protein D37ZPP3–2 (FDR_emp < 0.01, FDR_msqr = 0.45, FDR_mlfq = 0.83). Almost all peptides imply clear up-regulation. Middle (MSqRob): protein Q8C878 (FDR_emp = 0.95, FDR_msqr < 0.01, FDR_mlfq = 0.83). We see varying up- and down-regulation. Right (MaxLFQ+t-test): protein Q9JJL8 (FDR_emp = 0.74, FDR_msqr = 0.61, FDR_mlfq < 0.01). We see varying up- and down-regulation. D, MaxQuant protein intensities *versus* fold change plot for protein P04424 in the clinical data set (FDR_emp < 0.01, FDR_ttest = 0.99). MS-EmpiRe can clearly resolve the small but systematic fold changes. Many more validation plots for all methods tested can be found at https://www.bio.ifi.lmu.de/files/gruber/empire/.

**Fig. 6.**
**In silico benchmarking of MS-EmpiRe, MaxLFQ+t-test and MSqRob.** The x-axis ticks represent the median fold change by which the data are shifted. The boxes contain the sensitivity/specificity results when different fractions of the proteome are changed. Each box contains eight values corresponding to 5% up to 40% of the proteome changing in 5% steps. Sensitivity and specificity for different fold changes upon constant (left) as well as dynamic (right) proteome changes are shown. A clear dependence on the fraction of the proteome changing is visible. As in the benchmarking set, MaxLFQ+t-test shows low sensitivity with good error rate control, MSqRob shows high sensitivity but often violates the error estimation, and MS-EmpiRe shows high sensitivity with good error rate control.

References

1. Bantscheff M., Lemeer S., Savitski M. M., and Kuster B. (2012) Quantitative mass spectrometry in proteomics: Critical review update from 2007 to the present. Anal. Bioanal. Chem. 404, 939–965
2. Olsen J. V. (2005) Parts per Million Mass Accuracy on an Orbitrap Mass Spectrometer via Lock Mass Injection into a C-trap. Mol. Cell. Proteomics 4, 2010–2021
3. Ting L., Rad R., Gygi S. P., and Haas W. (2011) MS3 eliminates ratio distortion in isobaric multiplexed quantitative proteomics. Nat. Methods 8, 937–940
4. Gillet L. C., Navarro P., Tate S., Rost H., Selevsek N., Reiter L., Bonner R., and Aebersold R. (2012) Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, doi: 10.1074/mcp.O111.016717
5. Bruderer R., Bernhardt O. M., Gandhi T., Xuan Y., Sondermann J., Schmidt M., Gomez-Varela D., and Reiter L. (2017) Optimization of experimental parameters in data-independent mass spectrometry significantly increases depth and reproducibility of results. Mol. Cell. Proteomics 16, 2296–2309
