Mol Cell Proteomics. 2019 Sep;18(9):1880-1892.
doi: 10.1074/mcp.RA119.001509. Epub 2019 Jun 24.

MS-EmpiRe Utilizes Peptide-level Noise Distributions for Ultra-sensitive Detection of Differentially Expressed Proteins

Constantin Ammar et al. Mol Cell Proteomics. 2019 Sep.

Abstract

Mass spectrometry-based proteomics is the method of choice for quantifying genome-wide differential changes of protein expression in a wide range of biological and biomedical applications. Protein expression changes need to be reliably derived from many measured peptide intensities and their corresponding peptide fold changes. These peptide fold changes vary considerably for a given protein. Numerous instrumental setups aim to reduce this variability, whereas current computational methods only implicitly account for this problem. We introduce a new method, MS-EmpiRe, which explicitly accounts for the noise underlying peptide fold changes. We derive data set-specific, intensity-dependent empirical error fold change distributions, which are used for individual weighting of peptide fold changes to detect differentially expressed proteins (DEPs). In a recently published proteome-wide benchmarking data set, MS-EmpiRe doubles the number of correctly identified DEPs at an estimated FDR cutoff compared with state-of-the-art tools. We additionally confirm the superior performance of MS-EmpiRe on simulated data. MS-EmpiRe requires only peptide intensities mapped to proteins and, thus, can be applied to any common quantitative proteomics setup. We apply our method to diverse MS data sets and observe consistent increases in sensitivity with more than 1000 additional significant proteins in deep data sets, including a clinical study over multiple patients. At the same time, we observe that even the proteins classified as most insignificant by other methods but significant by MS-EmpiRe show very clear regulation on the peptide intensity level. MS-EmpiRe provides rapid processing (< 2 min for 6 LC-MS/MS runs (3 h gradients)) and is publicly available at github.com/zimmerlab/MS-EmpiRe with a manual including examples.

Keywords: Quantification; TMT; bioinformatics; bioinformatics software; differential expression; differential quantification; label-free quantification; mass spectrometry; statistics.

Figures

Fig. 1.
Single linkage clustering for signal normalization. A) We start with one cluster containing two samples (green) and three clusters of size one. We identify the two nearest clusters (gray and blue) and merge them into one new cluster by shifting the signals of the gray sample according to the median log2 fold change to the blue sample. B) We merge the green and blue clusters. Because they both consist of multiple samples, we determine the shift parameter by computing signal fold changes between any possible inter-cluster sample pair. C) The last cluster merge step. D) The algorithm results in one cluster containing all samples. E) The two final clusters of different conditions are merged (blue and red). The resulting distribution of all inter-cluster fold changes is shown below.
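The simplest merge step in panel A, shifting one sample onto another by the median log2 fold change, can be sketched as follows. This is a minimal illustration and not the MS-EmpiRe implementation: the function names, the dictionary layout (peptide to intensity) and the toy intensities are assumptions made for the example, and the cluster search and multi-sample merges of panels B to E are left out.

```python
import math
from statistics import median

def median_log2_shift(sample, reference):
    """Median log2 fold change (reference / sample) over shared peptides."""
    shared = [pep for pep in sample if pep in reference]
    return median(math.log2(reference[pep]) - math.log2(sample[pep]) for pep in shared)

def shift_sample(sample, shift):
    """Apply a log2 shift to every intensity of a sample."""
    return {pep: intensity * 2 ** shift for pep, intensity in sample.items()}

# Hypothetical toy data: the gray sample is measured at roughly half the signal.
blue = {"pepA": 1000.0, "pepB": 2000.0, "pepC": 4000.0}
gray = {"pepA": 500.0, "pepB": 1000.0, "pepC": 2100.0}

shift = median_log2_shift(gray, blue)      # log2 shift that aligns gray to blue
gray_normalized = shift_sample(gray, shift)
```

Using the median rather than the mean keeps the shift robust against the minority of peptides that are truly regulated or poorly measured.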
Fig. 2.
Schematic of the MS-EmpiRe workflow. A) All identified peptides from a proteomics run are sorted by their mean intensities. B) The peptides are then split into subgroups based on their intensity. For each subgroup, the error fold changes of the individual peptides are calculated. An error fold change simply denotes the log2 fold change of a peptide between two replicate conditions. All error fold changes within a subgroup form an empirical error distribution. Distributions corresponding to lower intensity peptides show a larger variance than those for high intensity peptides. C) When a protein is tested for differential expression, each peptide gets assigned an empirical error distribution. Peptides of similar intensities can get the same distribution assigned. D) For each peptide fold change, the probability that this fold change happened by chance (i.e. the p value) is assessed from the empirical distribution. This means that the same fold change will get a much higher p value when the distribution is wide as compared with when it is narrow. To make this value manageable, the p value is then transformed to a Z-value, by transferring the mass of the empirical probability distribution to a standard normal distribution. E) The Z-values for each peptide are corrected for outliers. For this, the probability is estimated that a high Z-value on the peptide level has happened by chance because of individual outliers. F) The corrected Z-values can directly be summed to the protein level, and the corresponding protein-level p value can be obtained as well as the FDR after multiple testing correction.
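Steps D and F can be illustrated with a short sketch under simplifying assumptions. The function names and toy distributions are hypothetical, the outlier correction of step E is omitted, and the protein-level score divides the summed Z-values by the square root of their count, which treats the peptide Z-values as independent. That scaling is an assumption of this example and is not taken from the MS-EmpiRe source.

```python
import math
from bisect import bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def empirical_z(fold_change, error_fold_changes):
    """Map a peptide fold change to a Z-value via the empirical error distribution."""
    ordered = sorted(error_fold_changes)
    rank = bisect_right(ordered, fold_change)
    # The pseudo-count keeps the empirical CDF strictly inside (0, 1).
    cdf = (rank + 0.5) / (len(ordered) + 1)
    return STD_NORMAL.inv_cdf(cdf)

def protein_p_value(z_values):
    """Two-sided p value from summed Z-values (assumes independent peptides)."""
    z_protein = sum(z_values) / math.sqrt(len(z_values))
    return 2 * (1 - STD_NORMAL.cdf(abs(z_protein)))

# Hypothetical error distributions for a high-intensity (narrow) and a
# low-intensity (wide) peptide subgroup.
narrow = [i / 100 for i in range(-50, 51)]
wide = [i / 25 for i in range(-50, 51)]

z_narrow = empirical_z(0.455, narrow)  # same fold change, narrow noise
z_wide = empirical_z(0.455, wide)      # same fold change, wide noise
p_protein = protein_p_value([z_narrow, z_wide])
```

In this toy case the same fold change yields the larger Z-value against the narrow distribution, so a noisy low-intensity peptide contributes less evidence to the protein-level score than a precise high-intensity one.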
Fig. 3.
Experimental setup and fold change based metrics. A, Benchmarking setup for quantitative assessment of fold changes taken from O'Connell et al. (17). Different amounts of yeast lysate are spiked into human cell lysate. The three groups contain 10%, 5% and 3.3% yeast lysate, respectively. B, Peptide level fold change distribution between all conditions, before normalization. C, The distribution after fold change based normalization. D, Intensity dependent peptide fold changes between two replicates of the LFQ data (error fold changes), displayed as smoothed density scatter plot. E, Error fold changes for 10 intensity regions displayed as box plots. Each box contains the same number of data points. The quantiles correspond to the fractions 0.05, 0.15, 0.50, 0.85, 0.95.
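The summary in panel E, error fold changes grouped into intensity regions of equal size and described by the quantiles 0.05, 0.15, 0.50, 0.85 and 0.95, can be reproduced in outline as below. The data here are simulated with a fixed seed and a noise level that is assumed to shrink with intensity; the function names and the simulation parameters are illustrative only.

```python
import math
import random
from statistics import quantiles

def error_fold_change(rep1, rep2):
    """log2 fold change of one peptide between two replicates."""
    return math.log2(rep1) - math.log2(rep2)

def equal_count_bins(peptides, n_bins):
    """Sort (rep1, rep2) pairs by mean intensity and split into n_bins chunks."""
    ordered = sorted(peptides, key=lambda pair: (pair[0] + pair[1]) / 2)
    n = len(ordered)
    return [ordered[k * n // n_bins:(k + 1) * n // n_bins] for k in range(n_bins)]

def box_quantiles(values):
    """Quantiles at 0.05, 0.15, 0.50, 0.85 and 0.95 (cut points of 20 groups)."""
    cuts = quantiles(values, n=20)  # 19 cut points at 5% steps
    return {0.05: cuts[0], 0.15: cuts[2], 0.50: cuts[9], 0.85: cuts[16], 0.95: cuts[18]}

# Simulated replicate pairs; noise is assumed to decrease with intensity.
random.seed(0)
peptides = []
for _ in range(1000):
    log_base = random.uniform(15, 30)
    sd = 1.5 - 0.04 * log_base
    rep1 = 2 ** (log_base + random.gauss(0, sd))
    rep2 = 2 ** (log_base + random.gauss(0, sd))
    peptides.append((rep1, rep2))

bins = equal_count_bins(peptides, 10)
summary = [box_quantiles([error_fold_change(a, b) for a, b in chunk]) for chunk in bins]
```

Each entry of `summary` corresponds to one box in the plot, ordered from the lowest to the highest intensity region.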
Fig. 4.
Assessment of the differential detection performance on the benchmarking setup of O'Connell et al. A, Number of proteins detected in the LFQ data by the MaxLFQ+t-test setup, MSqRob and MS-EmpiRe. The light shades show the numbers of yeast proteins accessible for testing in the setups, which differ for every method. As MSqRob shows high FDRs (bottom plot), an FDR-corrected bar is introduced for MSqRob. MaxLFQ+t-test shows low sensitivity at good error rate control like MSqRob. MaxLFQ+t-test is very conservative for the fold change 1.5 setup, with no false positives (no bar visible). MS-EmpiRe increases the detection substantially in all cases with good error rate control. This is most pronounced in the most challenging fold change 1.5 setup. B, Number of proteins detected when using an intersected input set. Because of this conservative approach, the numbers and differences are lower in general; nevertheless, MS-EmpiRe is the best performing method. C, Comparison of MaxLFQ+t-test and MS-EmpiRe on a TMT data set. MSqRob was excluded as it currently does not support TMT data. The overall performance is better because of higher depth from sample fractionation, lower noise and fewer missing values. Quantification on the protein level hence already works well; still, MS-EmpiRe shows a significant sensitivity increase of around 19% for fold change 1.5.
Fig. 5.
Application of MS-EmpiRe, MaxLFQ+t-test and MSqRob to three different quantitative LC-MS/MS data sets. A, Number of DEPs detected in the three different data sets by the three different methods. Each bar represents the number of proteins found in the set determined by the dots below. MS-EmpiRe is the most sensitive approach. B, Overlaps of protein hits on a subset of very clear hits for DIV5 versus DIV15. MS-EmpiRe shows large overlaps with MaxLFQ+t-test and MSqRob, whereas the overlap between MaxLFQ+t-test and MSqRob is small. C, Investigation of the proteins called by only one method. Peptide fold change plots for the proteins with the largest FDR difference to the other two methods. A consistent shift of the boxes above or below log2 fold change 0 (black dashed line) indicates regulation. Left (MS-EmpiRe): protein D37ZPP3–2 (FDRemp < 0.01, FDRmsqr = 0.45, FDRmlfq = 0.83). Almost all peptides imply clear up-regulation. Middle (MSqRob): protein Q8C878 (FDRemp = 0.95, FDRmsqr < 0.01, FDRmlfq = 0.83). We see varying up- and down-regulation. Right (MaxLFQ+t-test): protein Q9JJL8 (FDRemp = 0.74, FDRmsqr = 0.61, FDRmlfq < 0.01). We see varying up- and down-regulation. D, MaxQuant protein intensities versus fold change plot for protein P04424 in the clinical data set (FDRemp < 0.01, FDRttest = 0.99). MS-EmpiRe can clearly resolve the small but systematic fold changes. Many more validation plots for all methods tested can be found at https://www.bio.ifi.lmu.de/files/gruber/empire/.
Fig. 6.
In silico benchmarking of MS-EmpiRe, MaxLFQ+t-test and MSqRob. The x-axis ticks represent the median fold change by which the data are shifted. The boxes contain the sensitivity/specificity results when different fractions of the proteome are changed. Each box contains eight values corresponding to 5% up to 40% of the proteome changing in 5% steps. Sensitivity and specificity for different fold changes upon constant (left) as well as dynamic (right) proteome changes are shown. A clear dependence on the fraction of the proteome changing is visible. As in the benchmarking set, MaxLFQ+t-test shows low sensitivity with good error rate control, MSqRob shows high sensitivity but often violates the error estimation and MS-EmpiRe shows high sensitivity with good error rate control.
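For a simulated data set in which the changed proteins are known, sensitivity and specificity follow from the standard confusion-matrix definitions. The sketch below uses those textbook definitions with hypothetical protein identifiers; it is not taken from the benchmarking code of the paper.

```python
def sensitivity_specificity(called, truly_changed, all_proteins):
    """Return (sensitivity, specificity) for a set of proteins called significant."""
    called = set(called)
    truly_changed = set(truly_changed)
    all_proteins = set(all_proteins)
    tp = len(called & truly_changed)                 # changed and called
    fn = len(truly_changed - called)                 # changed but missed
    fp = len(called - truly_changed)                 # unchanged but called
    tn = len(all_proteins - called - truly_changed)  # unchanged and not called
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical proteome of 20 proteins, 20% of which are shifted in silico.
proteome = [f"P{i}" for i in range(20)]
changed = ["P0", "P1", "P2", "P3"]
hits = ["P0", "P1", "P2", "P10"]

sens, spec = sensitivity_specificity(hits, changed, proteome)
```

Repeating this calculation for each simulated fraction of changed proteins (5% to 40%) would give the eight values summarized by one box.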

References

    1. Bantscheff M., Lemeer S., Savitski M. M., and Kuster B. (2012) Quantitative mass spectrometry in proteomics: Critical review update from 2007 to the present. Anal. Bioanal. Chem. 404, 939–965
    2. Olsen J. V. (2005) Parts per Million Mass Accuracy on an Orbitrap Mass Spectrometer via Lock Mass Injection into a C-trap. Mol. Cell. Proteomics 4, 2010–2021
    3. Ting L., Rad R., Gygi S. P., and Haas W. (2011) MS3 eliminates ratio distortion in isobaric multiplexed quantitative proteomics. Nat. Methods 8, 937–940
    4. Gillet L. C., Navarro P., Tate S., Rost H., Selevsek N., Reiter L., Bonner R., and Aebersold R. (2012) Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, doi: 10.1074/mcp.O111.016717
    5. Bruderer R., Bernhardt O. M., Gandhi T., Xuan Y., Sondermann J., Schmidt M., Gomez-Varela D., and Reiter L. (2017) Optimization of experimental parameters in data-independent mass spectrometry significantly increases depth and reproducibility of results. Mol. Cell. Proteomics 16, 2296–2309
