Mol Cell Proteomics. 2015 Sep;14(9):2331-40.
doi: 10.1074/mcp.M115.051300. Epub 2015 Jun 22.

Using Data Independent Acquisition (DIA) to Model High-responding Peptides for Targeted Proteomics Experiments


Brian C Searle et al. Mol Cell Proteomics. 2015 Sep.

Abstract

Targeted mass spectrometry is an essential tool for detecting quantitative changes in low-abundance proteins throughout the proteome. Although selected reaction monitoring (SRM) is the preferred method for quantifying peptides in complex samples, designing SRM assays is laborious. Peptides have widely varying signal responses dictated by sequence-specific physicochemical properties, so one major challenge is selecting representative peptides to target as a proxy for protein abundance. Here we present PREGO, a software tool that predicts high-responding peptides for SRM experiments. PREGO predicts peptide responses with an artificial neural network trained using 11 minimally redundant, maximally relevant properties. Crucial to its success, PREGO is trained using fragment ion intensities of equimolar synthetic peptides extracted from data independent acquisition (DIA) experiments. Because of similarities in instrumentation and the nature of data collection, relative peptide responses from DIA experiments are a suitable substitute for those from SRM experiments: both make quantitative measurements from integrated fragment ion chromatograms. Using an SRM experiment containing 12,973 peptides from 724 synthetic proteins, PREGO exhibits a 40–85% improvement over previously published approaches at selecting high-responding peptides. These results also represent a dramatic improvement over the rules-based peptide selection approaches commonly used in the literature.
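The abstract's point that DIA and SRM both quantify peptides from integrated fragment ion chromatograms can be illustrated with a minimal sketch. It assumes each fragment chromatogram is a list of (retention time, intensity) points and that a peptide's response is the sum of its fragments' trapezoidal peak areas. The function names and toy peptides are hypothetical; this is not PREGO's actual implementation.

```python
def trapezoid_area(points):
    """Integrate one fragment ion chromatogram with the trapezoidal rule."""
    area = 0.0
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        area += (t1 - t0) * (y0 + y1) / 2.0
    return area


def peptide_response(fragment_chromatograms):
    """Assumed response model: sum of the integrated areas of all fragments."""
    return sum(trapezoid_area(c) for c in fragment_chromatograms)


def rank_peptides(responses):
    """Peptide IDs ordered from highest to lowest response (rank 1 first)."""
    return sorted(responses, key=responses.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical equimolar peptides from one protein, two fragments each.
    protein = {
        "PEPTIDEA": [[(0.0, 0.0), (1.0, 100.0), (2.0, 0.0)],
                     [(0.0, 0.0), (1.0, 50.0), (2.0, 0.0)]],
        "PEPTIDEB": [[(0.0, 0.0), (1.0, 10.0), (2.0, 0.0)],
                     [(0.0, 0.0), (1.0, 5.0), (2.0, 0.0)]],
    }
    responses = {pep: peptide_response(ch) for pep, ch in protein.items()}
    print(responses)                 # {'PEPTIDEA': 150.0, 'PEPTIDEB': 15.0}
    print(rank_peptides(responses))  # ['PEPTIDEA', 'PEPTIDEB']
```

Because the peptides are equimolar, differences in these summed areas reflect differences in peptide response rather than in abundance, which is what makes them usable as training targets.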


Figures

Fig. 1.
A histogram of the dynamic ranges calculated for 724 proteins. The dynamic range of each protein is the number of orders of magnitude separating its highest-responding peptide from its lowest-responding peptide, calculated as the difference between the log10 intensities of those two peptides. The median dynamic range is 3.4 orders of magnitude, with an interquartile range of 1.2 orders of magnitude. All protein intensity data were drawn from the Stergachis et al. SRM testing data set.
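The dynamic-range statistic in Fig. 1 reduces to a short calculation. The sketch below assumes each protein is given as a list of positive peptide intensities; the quartile method (Python's default "exclusive" interpolation) is an assumption, since the caption does not say how Q1 and Q3 were computed.

```python
import math
import statistics


def dynamic_range(intensities):
    """Orders of magnitude between a protein's highest- and lowest-responding peptide."""
    return math.log10(max(intensities)) - math.log10(min(intensities))


def summarize(ranges):
    """Median and interquartile range (Q3 - Q1) of per-protein dynamic ranges."""
    q1, _, q3 = statistics.quantiles(ranges, n=4)  # default 'exclusive' method
    return statistics.median(ranges), q3 - q1


if __name__ == "__main__":
    # Hypothetical peptide intensities for three proteins.
    proteins = {
        "P1": [1e2, 5e3, 1e5],
        "P2": [3e3, 3e4],
        "P3": [10.0, 2e3, 4e4, 1e6],
    }
    ranges = {name: dynamic_range(vals) for name, vals in proteins.items()}
    print(ranges)
    print(summarize(list(ranges.values())))
```

Working in log10 space means a protein whose peptides span 100 to 100,000 intensity units has a dynamic range of 3, regardless of its absolute abundance.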
Fig. 2.
Algorithmic outline of the PREGO method. A, Feature selection using an mRMR-style algorithm to identify nonredundant features with maximum relevance. Feature sets with low redundancy often reduce the risk of overtraining in machine learning algorithms. B, Neural network construction using the mRMR-selected feature set. C, Testing of the algorithm using the Stergachis et al. SRM testing data set.
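The mRMR (minimum redundancy, maximum relevance) idea in panel A can be sketched as a greedy loop: each step picks the feature whose relevance to the target, minus its average redundancy with features already chosen, is largest. This is a simplified stand-in, not PREGO's code: it uses absolute Pearson correlation in place of mutual information, and the feature names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_score(name, features, relevance, selected):
    """Relevance minus mean redundancy with the already-selected features."""
    if not selected:
        return relevance[name]
    redundancy = sum(abs(pearson(features[name], features[s])) for s in selected)
    return relevance[name] - redundancy / len(selected)


def mrmr_select(features, target, k):
    """Greedily pick k features; |Pearson r| stands in for mutual information."""
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    selected = []
    remaining = sorted(features)  # sorted so that ties break deterministically
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda name: mrmr_score(name, features, relevance, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical peptide properties; "x1_near" nearly duplicates "x1".
    features = {
        "x1": [1, 1, -1, -1],
        "x1_near": [2, 2, -2, -1],
        "x2": [1, -1, 1, -1],
    }
    response = [3, 1, -1, -3]  # equals 2 * x1 + x2
    print(mrmr_select(features, response, k=2))  # ['x1', 'x2']
```

Ranking by relevance alone would choose `x1` and its near-duplicate `x1_near`; the redundancy penalty instead selects the less relevant but complementary `x2`, which is the behavior the caption credits with limiting overtraining.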
Fig. 3.
PREGO scores for peptides in CASZ1. Peptides in CASZ1 (also known as cDNA FLJ20321) are ranked by their experimentally acquired transition fragment intensities from the Stergachis et al. SRM testing data set, where the peptide with the strongest response is assigned rank one. The top 20% of peptides by intensity rank are considered “high-responding peptides” and are shaded in blue. The top five peptides chosen by PREGO are marked with red borders. Although the predicted response for any individual peptide varies widely (solid line), there is a clear trend (dashed line) for first-ranked peptides to score somewhat higher than lower-ranked peptides. Consequently, the highest-scoring peptides picked by PREGO are often also high-responding peptides. CASZ1 represents a “typical” protein, with a correlation score of 0.65.
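The labeling used in Fig. 3 can be sketched as follows: rank a protein's peptides by measured intensity, mark the top 20% as high-responding, and compare them with the top five peptides by predicted score. Rounding the 20% cutoff up is an assumption (the caption does not state a rounding rule), and all peptide names and values are hypothetical.

```python
def high_responders(intensities, top_percent=20):
    """Peptides whose measured intensity falls in the top `top_percent` of the protein.

    Rank 1 is the strongest response. The cutoff count is rounded up (an
    assumption) so that every protein has at least one high-responding peptide.
    """
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    cutoff = (len(ranked) * top_percent + 99) // 100  # integer ceiling division
    return set(ranked[:cutoff])


def top_picks(scores, n=5):
    """The n peptides with the highest predicted scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical measured intensities and predicted scores for one protein.
    intensities = {"pepA": 9.0e5, "pepB": 4.0e5, "pepC": 1.2e5, "pepD": 8.0e4,
                   "pepE": 3.0e4, "pepF": 9.0e3, "pepG": 2.0e3, "pepH": 7.0e2,
                   "pepI": 1.0e2, "pepJ": 40.0}
    scores = {"pepA": 0.7, "pepB": 0.2, "pepC": 0.9, "pepD": 0.5, "pepE": 0.3,
              "pepF": 0.8, "pepG": 0.1, "pepH": 0.4, "pepI": 0.05, "pepJ": 0.6}
    high = high_responders(intensities)
    picks = top_picks(scores)
    print(sorted(high))                # ['pepA', 'pepB']
    print(picks)                       # ['pepC', 'pepF', 'pepA', 'pepJ', 'pepD']
    print(sorted(high & set(picks)))   # ['pepA']
```

Integer ceiling division avoids floating-point surprises when computing 20% of a peptide count.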
Fig. 4.
Score distributions for four scoring methods by peptide rank. A, The PREGO score distribution for peptides of descending rank across the entire Stergachis et al. SRM testing data set. The medians at each rank are annotated as dots, and the nearest-neighbor-smoothed trend is plotted as a black line. The interquartile range (Q1 to Q3) is shaded blue. In general, first-ranked peptides with the highest responses tend to receive higher scores than peptides of lower rank, as indicated by the downward trend from left to right. The score distributions for B, PPA, and for the CONSeQuence C, artificial neural network (ANN) and D, support vector machine (SVM) scorers all show weaker downward trends.
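The per-rank summary plotted in Fig. 4 can be approximated by pooling predicted scores by intensity rank across proteins and taking the quartiles at each rank. This sketch assumes each protein is a list of hypothetical (measured intensity, predicted score) pairs; it omits the nearest-neighbor smoothing, whose parameters the caption does not give, and uses Python's default quartile method.

```python
import statistics
from collections import defaultdict


def score_distribution_by_rank(proteins):
    """Return {rank: (q1, median, q3)} of predicted scores pooled across proteins.

    Each protein is a list of (measured_intensity, predicted_score) pairs;
    rank 1 is the peptide with the highest measured intensity in its protein.
    """
    by_rank = defaultdict(list)
    for peptides in proteins:
        ranked = sorted(peptides, key=lambda p: p[0], reverse=True)
        for rank, (_, score) in enumerate(ranked, start=1):
            by_rank[rank].append(score)
    summary = {}
    for rank, scores in sorted(by_rank.items()):
        if len(scores) < 2:
            continue  # quartiles need at least two scores at this rank
        q1, _, q3 = statistics.quantiles(scores, n=4)
        summary[rank] = (q1, statistics.median(scores), q3)
    return summary


if __name__ == "__main__":
    # Hypothetical (intensity, score) pairs for three proteins.
    proteins = [
        [(100.0, 0.9), (10.0, 0.5), (1.0, 0.1)],
        [(50.0, 0.7), (5.0, 0.3), (0.5, 0.2)],
        [(80.0, 0.8), (8.0, 0.4), (0.8, 0.0)],
    ]
    for rank, (q1, med, q3) in score_distribution_by_rank(proteins).items():
        print(rank, q1, med, q3)
```

A scorer that tracks peptide response well shows medians that fall steadily as rank increases, which is the downward trend the caption describes.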
Fig. 5.
Percentage of proteins with at least one high-responding peptide, given N peptides picked. A, The PREGO (blue), PPA (red), CONSeQuence artificial neural network (ANN, orange), and support vector machine (SVM, purple) machine learning-based scorers are compared with randomly guessing peptides (green) and with the simple scoring function described in Equation 2 (cyan), which is based on common rules in the literature. Scorers are graded on the likelihood that, for any given protein, they predict at least one high-responding peptide within N guesses. This is analogous to the strategy of picking N peptides to obtain at least one useful peptide for each protein. For example, in Fig. 3 the top 1–5 peptides picked in CASZ1 have red borders and the high-responding peptides are shaded in blue. B, The same four learning-based scorers expressed as a percentage improvement over rules-based peptide selection. Given five or fewer chances, PREGO predicts high-responding peptides dramatically better than the other approaches tested here. All scoring data are based on the Stergachis et al. SRM testing data set.
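The grading metric in Fig. 5 can be sketched directly: for each protein, take a scorer's top N peptides and count the protein as a success if any of them is high-responding. Treating panel B's "percentage improvement" as relative improvement over the baseline rate is an assumption, and the proteins below are hypothetical.

```python
def success_rate(proteins, n):
    """Percent of proteins whose top-n scored peptides include a high responder.

    Each protein is a (scores, high) pair: scores maps peptide -> predicted
    score, and high is the set of high-responding peptides for that protein.
    """
    hits = 0
    for scores, high in proteins:
        picks = sorted(scores, key=scores.get, reverse=True)[:n]
        if high.intersection(picks):
            hits += 1
    return 100.0 * hits / len(proteins)


def percent_improvement(rate, baseline_rate):
    """Assumed definition: relative improvement of a scorer over a baseline, in percent."""
    return 100.0 * (rate - baseline_rate) / baseline_rate


if __name__ == "__main__":
    # Hypothetical scores and high-responder sets for two proteins.
    proteins = [
        ({"a1": 0.9, "a2": 0.1}, {"a1"}),
        ({"b1": 0.8, "b2": 0.2}, {"b2"}),
    ]
    for n in (1, 2):
        print(n, success_rate(proteins, n))  # 1 50.0, then 2 100.0
    print(percent_improvement(60.0, 40.0))   # 50.0
```

Sweeping n from 1 upward reproduces the shape of panel A: every scorer reaches 100% once n covers all peptides, so differences between scorers matter most at small n.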

References

    1. Marx V. (2013) Targeted proteomics. Nat. Methods 10, 19–22.
    2. Liebler D. C., Zimmerman L. J. (2013) Targeted quantitation of proteins by mass spectrometry. Biochemistry 52, 3797–3806.
    3. Picotti P., Aebersold R. (2012) Selected reaction monitoring-based proteomics: workflows, potential, pitfalls, and future directions. Nat. Methods 9, 555–566.
    4. Stergachis A. B., MacLean B., Lee K., Stamatoyannopoulos J. A., MacCoss M. J. (2011) Rapid empirical discovery of optimal peptides for targeted proteomics. Nat. Methods 8, 1041–1043.
    5. Bereman M. S., MacLean B., Tomazela D. M., Liebler D. C., MacCoss M. J. (2012) The development of selected reaction monitoring methods for targeted proteomics via empirical refinement. Proteomics 12, 1134–1141.
