Artificial Intelligence Understands Peptide Observability and Assists With Absolute Protein Quantification

David Zimmer¹, Kevin Schneider¹, Frederik Sommer², Michael Schroda², Timo Mühlhaus¹

Affiliations

¹ Computational Systems Biology TU Kaiserslautern, Kaiserslautern, Germany.
² Molekulare Biotechnologie & Systembiologie TU Kaiserslautern, Kaiserslautern, Germany.

PMID: 30483279
PMCID: PMC6242780
DOI: 10.3389/fpls.2018.01559

David Zimmer et al. Front Plant Sci. 2018 Nov 13:9:1559. doi: 10.3389/fpls.2018.01559. eCollection 2018.

Abstract

Targeted mass spectrometry has become the method of choice for obtaining high-quality absolute quantification data, which are essential for a quantitative understanding of biological systems. However, the design of absolute protein quantification assays remains challenging because peptide observability varies and the factors influencing peptide detectability are incompletely understood. Here, we present d::pPop, a deep learning algorithm for peptide detectability prediction that allows the informed selection of synthetic proteotypic peptides for the successful design of targeted proteomics quantification assays. The deep neural network learns a regression model that relates the physicochemical properties of a peptide to its ion intensity detected by mass spectrometry. The approach makes use of experimentally detected deviations from the assumed equimolar abundance of all peptides derived from a given protein. Trained on extensive proteomics datasets, d::pPop's plant and non-plant specific models can predict the quality of proteotypic peptides for proteins that have not yet been identified experimentally. Interrogating the deep neural network after learning from ~76,000 peptides per model organism makes it possible to investigate the impact of different physicochemical properties on the observability of a peptide, thus providing insights into peptide observability as a multifaceted process. Empirical evaluation with rank accuracy metrics showed that our prediction approach outperforms existing algorithms. The approach circumvents the delicate step of selecting positive and negative training sets and, at the same time, more closely reflects the need to select the most promising peptides for targeting a protein of interest. Further, we used an artificial QconCAT protein to experimentally validate the observability prediction. Our proteotypic peptide prediction approach not only facilitates the design of absolute protein quantification assays via a user-friendly web interface but also enables the selection of proteotypic peptides for not yet observed proteins, which makes the tool especially useful for plant research.

Keywords: absolute quantification; deep learning; machine learning; mass spectrometry; peptide observability; proteotypic peptide.

Figures

**Figure 1**
Schematic overview of the deep learning approach d::pPop to predict the rank of peptide observability within plant and non-plant specific query proteins. The algorithm is based on deep neural networks and is trained on experimentally observed proteins with all PTPs protein-wise normalized to the peptide with maximum intensity to match the assumption of equal molarity. The feature vectors are computed to represent the physicochemical properties of the peptide sequences. The deep neural network is able to learn a regression model that relates the physicochemical peptide properties to the difference in peptide intensities within a single protein in the proteomics workflow. The plant and non-plant specific models can predict the quality of PTPs for not yet experimentally identified proteins.
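The protein-wise normalization mentioned in the Figure 1 caption can be illustrated with a minimal sketch. It assumes the input is a list of `(protein_id, peptide, intensity)` tuples; the function name, the data layout, and the toy peptide sequences are hypothetical and are not taken from the d::pPop implementation.

```python
from collections import defaultdict


def normalize_per_protein(records):
    """Scale each peptide intensity by the maximum intensity within its protein.

    records: iterable of (protein_id, peptide, intensity) tuples (assumed layout).
    Returns a dict mapping (protein_id, peptide) to a value in (0, 1].
    """
    records = list(records)  # allow generators, since we iterate twice

    # First pass: find the most intense peptide of every protein.
    max_by_protein = defaultdict(float)
    for protein, _, intensity in records:
        max_by_protein[protein] = max(max_by_protein[protein], intensity)

    # Second pass: divide by the per-protein maximum, so the top peptide
    # of each protein gets 1.0 and the others a fraction of it.
    return {
        (protein, peptide): intensity / max_by_protein[protein]
        for protein, peptide, intensity in records
        if max_by_protein[protein] > 0
    }


# Toy data, invented for illustration only.
example = [("P1", "AAK", 200.0), ("P1", "CCR", 50.0), ("P2", "DDK", 10.0)]
normalized = normalize_per_protein(example)
```

Under the equimolarity assumption, every peptide of a protein would ideally score 1.0, so values below 1.0 represent the deviation that the regression model is trained to predict.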

**Figure 2**
Prediction results using d::pPop's non-plant model in comparison with common PTP predictors for the yeast proteome. The evaluation was performed using a yeast proteome data set consisting of 664 proteins. **(A)** According to the nDCG@4 as a measure of ranking accuracy, shown with box plots for the different prediction results, all algorithms consistently perform better than the randomized ranking of peptide queries. However, d::pPop's ranking accuracy is higher on average than that of the other PTP predictors. **(B)** The corresponding cumulative distribution representation reflects the more accurate prediction by the line being closer to a constant nDCG@4 value of 1.
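For readers unfamiliar with the nDCG@4 metric used in Figures 2 to 4, the following is a minimal sketch of normalized discounted cumulative gain at a cutoff `k`. It assumes the common linear-gain, log2-discount formulation; the exact variant used by the authors is not stated here, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted_scores, true_relevances, k=4):
    """nDCG@k for one protein: rank peptides by predicted score, then compare
    the resulting DCG with the DCG of the ideal (true-relevance) ordering."""
    # Indices of peptides sorted by predicted score, best first.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevances[i] for i in order]
    ideal = sorted(true_relevances, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / idcg if idcg > 0 else 0.0


# Toy values, invented for illustration: true relevances could be the
# protein-wise normalized intensities of three peptides.
perfect = ndcg_at_k([0.9, 0.5, 0.1], [1.0, 0.6, 0.2])
inverted = ndcg_at_k([0.1, 0.5, 0.9], [1.0, 0.6, 0.2])
```

A value of 1 means the predicted top-k ordering matches the ideal ordering, which is why curves closer to a constant nDCG@4 of 1 indicate better predictors.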

**Figure 3**
Effect of organism-specific models on PTP predictor accuracy. Comparison of d::pPop prediction results for a *Chlamydomonas reinhardtii* proteome data set using d::pPop's non-plant (orange lines) and plant (blue lines) model. The cumulative distributions of the nDCG@4 showed only a small difference between predictions on training (dashed lines) and test (solid lines) data sets, which indicates that the models do not suffer from extensive overfitting. However, the non-plant model generalizes imperfectly, which indicates that prediction is indeed organism-specific.

**Figure 4**
Prediction results for the *Chlamydomonas reinhardtii* proteome using d::pPop's plant model in comparison with common PTP predictors. The evaluation was performed using a *C. reinhardtii* proteome data set consisting of 685 proteins. **(A)** According to the nDCG@4, presented with box plots for the different prediction results, all algorithms showed an nDCG@4 that was consistently higher on average than that of the randomized ranking of peptide queries. However, d::pPop's ranking accuracy is superior to that of existing PTP predictors. **(B)** The corresponding cumulative distribution representation reflected the more accurate prediction by the line being closer to a constant nDCG@4 value of 1.

**Figure 5**
QconCAT observability prediction. The prediction of the normalized intensities of the Q-peptides is in solid agreement with the measured normalized intensities, showing a Pearson correlation coefficient of 0.62.
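The Pearson correlation coefficient reported for Figure 5 measures the linear agreement between predicted and measured normalized intensities. A minimal sketch of the computation follows; the function name and the toy values are illustrative and do not reproduce the QconCAT data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, all without the 1/n factor,
    # which cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Toy values, invented for illustration only.
predicted = [0.2, 0.5, 0.9, 0.4]
measured = [0.1, 0.6, 1.0, 0.3]
r = pearson_r(predicted, measured)
```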

**Figure 6**
Experimental validation of d::pPop predictions. Exemplary comparison between d::pPop prediction results on the rbcL (*C. reinhardtii*) protein query and experimentally validated surrogate peptides (orange dots). **(A)** The d::pPop prediction suggested quantifiable peptides as the top hits. **(B)** The Q-peptide with the lowest d::pPop score showed the biggest deviation from the three other surrogate peptides, pointing to less accurate quantification. The complete set of peptides present on the PS-Qprot is presented in Supplemental Figures 2 and 3.

**Figure 7**
Ranking the influence of physicochemical properties on peptide observability. Net positive contribution of each input dimension (feature) of d::pPop's plant (blue dots) and non-plant (orange dots) DNNs in sorted order. The plot shows the different features and their respective activation potential used for learning the models. The learned feature importance differs between models trained on yeast (orange) and *C. reinhardtii* (blue) data.

**Figure 8**
Screenshot of d::pPop's web interface. The screenshot shows the implementation of d::pPop as a user-friendly web interface (http://csbweb.bio.uni-kl.de/), which enables researchers to integrate the predictions into their workflow to design targeted protein quantification assays.
