2011 Dec 21;75(2):480-90.
doi: 10.1016/j.jprot.2011.08.013. Epub 2011 Aug 24.

RT-SVR+q: a strategy for post-Mascot analysis using retention time and q value metric to improve peptide and protein identifications

Weifeng Cao et al. J Proteomics.

Abstract

Shotgun proteomics commonly relies on database search engines such as Mascot to identify proteins from tandem mass (MS/MS) spectra. The false discovery rate (FDR) is often used to assess the confidence of peptide identifications. However, the widely accepted FDR of 1% improves accuracy at the cost of sensitivity. This article details a machine learning approach that combines a retention time based support vector regressor (RT-SVR) with q value based statistical analysis to improve peptide and protein identifications with high sensitivity and accuracy. Using confident peptide identifications as training examples, together with careful feature selection, ensures high R values (>0.900) for all models. Applying the RT-SVR model to Mascot results (p=0.10) increases the sensitivity of peptide identifications. The q value, as a function of the deviation between predicted and experimental RTs (ΔRT), is used to assess the significance of peptide identifications. We demonstrate that peptide and protein identifications increase by up to 89.4% and 83.5%, respectively, at a specified q value of 0.01 when the method is applied to proteomic analysis of the natural killer leukemia cell line (NKL). This study establishes an effective methodology and provides a platform for profiling confident proteomes in more relevant species as well as for future investigation of accurate protein quantification.
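
To make the q value step concrete, the following is a minimal sketch of how q values could be assigned to PSMs ranked by |ΔRT| using target and decoy counts. The abstract does not give the exact estimator, so the FDR formula used here (decoys divided by targets at each threshold) and the function names `q_values_by_delta_rt` and `delta_rt_threshold` are assumptions for illustration only.

```python
def q_values_by_delta_rt(psms):
    """Assign a q value to each PSM ranked by |delta RT|.

    psms: list of (abs_delta_rt, is_decoy) tuples.
    Returns a list of (abs_delta_rt, is_decoy, q_value), sorted by abs_delta_rt.
    Assumption: FDR at a threshold is estimated as decoys / targets.
    """
    ranked = sorted(psms, key=lambda p: p[0])

    # Running FDR estimate as the |delta RT| threshold is loosened.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q value: the smallest FDR at this threshold or any looser one,
    # which makes the values monotone in |delta RT|.
    q_values = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()

    return [(d, dec, q) for (d, dec), q in zip(ranked, q_values)]


def delta_rt_threshold(scored, q_cut):
    """Largest |delta RT| whose q value is at most q_cut, or None if none pass."""
    passing = [d for d, _, q in scored if q <= q_cut]
    return max(passing) if passing else None
```

Because the q value is monotone in |ΔRT|, choosing a q value cutoff is equivalent to choosing a symmetric ΔRT window for that dataset.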

Figures

Figure 1
The workflow of RT-SVR in processing proteomic data. The first step is to construct the RT-SVR model. Target PSMs (at 1% FDR) are screened (those with a Mascot score below the Mascot identity threshold are removed) to create training and testing datasets (split at a ratio of 3:1). The second step is to apply the trained RT-SVR to Mascot results (at p=0.10) to filter out false positive predictions. Both target and decoy PSMs are processed with RT-SVR, followed by q value assessment, by which confident peptide predictions are selected at a given q value.
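
The screening and 3:1 split described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the PSM record layout (dictionaries with `score` and `identity_threshold` keys), the function name `screen_and_split`, and the use of a seeded random shuffle before splitting are all hypothetical.

```python
import random


def screen_and_split(psms, seed=0):
    """Keep PSMs scoring at or above their identity threshold, then split 3:1.

    psms: list of dicts with hypothetical keys "score" and "identity_threshold".
    Returns (training, testing) lists.
    """
    # Screening: drop PSMs whose Mascot score is below the identity threshold.
    confident = [p for p in psms if p["score"] >= p["identity_threshold"]]

    # Shuffle a copy so the input list is left untouched (assumed random split).
    shuffled = confident[:]
    random.Random(seed).shuffle(shuffled)

    # Three quarters for training, the remainder for testing.
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]
```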
Figure 2
The Bland-Altman plot shows the distribution of RT deviation over the average of the predicted and experimental RTs. The Pearson’s correlation R value is specified at the top right. The graphs were generated from dataset #1. a) Gaussian kernel; b) Linear kernel.
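
For readers unfamiliar with this plot type, the sketch below computes the two quantities the caption mentions: the Bland-Altman coordinates (mean of the two RTs on the x axis, their difference on the y axis) and the Pearson correlation R. The sign convention (predicted minus experimental) and the function names are assumptions; the caption does not specify them.

```python
from math import sqrt


def bland_altman_points(predicted, experimental):
    """Return (mean RT, RT deviation) pairs, one per peptide.

    Assumption: deviation is predicted minus experimental.
    """
    return [((p + e) / 2, p - e) for p, e in zip(predicted, experimental)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```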
Figure 3
The performance and robustness comparison for different RT-SVR models and SSRC linear regressors.
Figure 4
The performance of RT-SVR model as a function of the number of training examples. Data are obtained from dataset #1. Each data point represents the average value of 3 replicates.
Figure 5
Retention contributions of some features. Shown are the weights from linear-kernel RT-SVR corresponding to 20 amino acid residues, C-terminal R or K, length and mass. a) Average retention contribution (standard deviation shown as error bars); b) Individual contributions for each dataset.
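
As a rough illustration of the feature set named in this caption, the sketch below encodes a peptide as 20 amino acid counts, one C-terminal R/K indicator, the length, and the mass. The exact encoding used in the paper is not given here, so the use of residue counts, a single combined R/K indicator, the feature order, and the function name `encode_peptide` are assumptions. The mass is taken as an input rather than computed.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def encode_peptide(sequence, mass):
    """Build a 23-element feature vector for one peptide.

    Layout (assumed): 20 residue counts, C-terminal R/K flag, length, mass.
    """
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError("non-standard residues: %s" % "".join(sorted(unknown)))

    counts = [float(sequence.count(aa)) for aa in AMINO_ACIDS]
    c_term_rk = 1.0 if sequence and sequence[-1] in "RK" else 0.0
    return counts + [c_term_rk, float(len(sequence)), float(mass)]
```

With a linear kernel, each learned weight then lines up with one entry of this vector, which is what allows per-feature retention contributions to be read off.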
Figure 6
Plots of PSMs versus q value or FDR. The curves of PSMs over FDR for the Mascot identity threshold (MIT) and Mascot homology threshold (MHT) are counterintuitive around 1% FDR. The plot of PSMs over q value (for RT-SVR) resolves this issue. The comparison indicates that RT-SVR outperforms MIT and MHT. The results are obtained from application dataset #2.
Figure 7
The comparison of PSMs identified with RT-SVR and MIT at a q value of 0.01.
Figure 8
The distribution of q value over RT error (ΔRT). The ΔRT threshold depends on the q value. Data obtained from dataset #2.
Figure 9
The dynamic ΔRT thresholds for all 9 datasets, determined at a specified q value of 0.01. The number on top of each bar is the upper bound of the ΔRT threshold; the lower bound (not shown) has the same magnitude with a negative sign.
