RT-SVR+q: a strategy for post-Mascot analysis using retention time and q value metric to improve peptide and protein identifications

Weifeng Cao¹, Di Ma, Arvinder Kapur, Manish S Patankar, Yadi Ma, Lingjun Li

PMID: 21888997
PMCID: PMC3225640
DOI: 10.1016/j.jprot.2011.08.013

Weifeng Cao et al. J Proteomics. 2011 Dec 21;75(2):480-90. doi: 10.1016/j.jprot.2011.08.013. Epub 2011 Aug 24.

Affiliation

¹ Department of Chemistry, University of Wisconsin-Madison, 777 Highland Ave., Madison, WI 53705, USA. wcao2@wisc.edu

Abstract

Shotgun proteomics commonly relies on database search engines such as Mascot to identify proteins from tandem mass (MS/MS) spectra. The false discovery rate (FDR) is often used to assess the confidence of peptide identifications. However, the widely accepted FDR threshold of 1% improves accuracy at the cost of sensitivity. This article details a machine learning approach that combines a retention time based support vector regressor (RT-SVR) with q value based statistical analysis to improve peptide and protein identifications with high sensitivity and accuracy. Using confident peptide identifications as training examples, together with careful feature selection, ensures high R values (>0.900) for all models. Applying the RT-SVR model to Mascot results (p=0.10) increases the sensitivity of peptide identifications. The q value, as a function of the deviation between predicted and experimental RTs (ΔRT), is used to assess the significance of peptide identifications. We demonstrate that peptide and protein identifications increase by up to 89.4% and 83.5%, respectively, at a specified q value of 0.01 when the method is applied to proteomic analysis of the natural killer leukemia cell line (NKL). This study establishes an effective methodology and provides a platform for profiling confident proteomes in more relevant species, as well as for future investigation of accurate protein quantification.

Figures

**Figure 1**
The workflow of RT-SVR in processing proteomic data. The first step is to construct the RT-SVR model. Target PSMs (at 1% FDR) are screened (those with a Mascot score below the Mascot identity threshold are removed) to create training and testing datasets (split at a ratio of 3:1). The second step is to apply the trained RT-SVR to Mascot results (at p=0.10) to filter out false positive predictions. Both target and decoy PSMs are processed with RT-SVR, followed by q value assessment, by which confident peptide predictions are selected at a given q value.

**Figure 2**
The Bland-Altman plot shows the distribution of the RT deviation against the average of the predicted and experimental RTs. The Pearson's correlation R value is given at the top right. The plots were generated from dataset #1. a) Gaussian kernel; b) Linear kernel.

**Figure 3**
The performance and robustness comparison for different RT-SVR models and SSRC linear regressors.

**Figure 4**
The performance of RT-SVR model as a function of the number of training examples. Data are obtained from dataset #1. Each data point represents the average value of 3 replicates.

**Figure 5**
Retention contributions of some features. Shown are the weights from linear-kernel RT-SVR corresponding to 20 amino acid residues, C-terminal R or K, length and mass. a) Average retention contribution (standard deviation shown as error bars); b) Individual contributions for each dataset.

**Figure 6**
Plots of PSMs versus q value or FDR. The curves of PSMs over FDR for the Mascot identity threshold (MIT) and the Mascot homology threshold (MHT) are counterintuitive around 1% FDR. The plot of PSMs over q value (for RT-SVR) resolves this issue. The comparison indicates that RT-SVR outperforms MIT and MHT. The results were obtained from application dataset #2.

**Figure 7**
The comparison of PSMs identified with RT-SVR and MIT at a q value of 0.01.

**Figure 8**
The distribution of q value over RT error (ΔRT). The ΔRT threshold depends on the chosen q value. Data were obtained from dataset #2.

**Figure 9**
The dynamic ΔRT thresholds for all 9 datasets, determined at a specified q value of 0.01. The number on top of each bar is the upper bound of the ΔRT threshold; the lower bound (not shown) has the same magnitude with a negative sign.
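The captions of Figures 1 and 5 mention a 3:1 training/testing split and a feature set made of the 20 amino acid residues, C-terminal R or K, length and mass. The sketch below shows one way such inputs could be prepared before fitting a regressor. The exact encoding used in the paper is not given here, so the layout (20 residue counts, one binary flag for a C-terminal R or K, length, monoisotopic mass of the unmodified peptide) and the helper names `peptide_features` and `split_3_to_1` are assumptions. Fitting the support vector regressor itself is not shown.

```python
import random
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Monoisotopic residue masses in Da (unmodified residues only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056


def peptide_features(seq: str) -> List[float]:
    """Encode a peptide as 23 numbers (assumed layout):
    20 residue counts, C-terminal R/K flag, length, monoisotopic mass."""
    if not seq:
        raise ValueError("empty peptide sequence")
    counts = [float(seq.count(aa)) for aa in AMINO_ACIDS]
    c_term_basic = 1.0 if seq[-1] in "RK" else 0.0
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    return counts + [c_term_basic, float(len(seq)), mass]


def split_3_to_1(items: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle and split into training and testing parts at a 3:1 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]


# Toy peptides (made up for illustration).
peptides = ["PEPTIDEK", "LSSPATLNSR", "AGFAGDDAPR", "VLDELTLAR",
            "DLYANTVLSGGTTMYPGIADR", "GYSFTTTAER", "EITALAPSTMK", "SYELPDGQVITIGNER"]
train, test = split_3_to_1(peptides)
X_train = [peptide_features(p) for p in train]
print(len(train), len(test), len(X_train[0]))
```

With a linear kernel, each learned weight corresponds to one position in this vector, which is what allows per-residue retention contributions of the kind plotted in Figure 5 to be read off the model.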
