J Proteomics. 2015 Nov 3;129:16-24.
doi: 10.1016/j.jprot.2015.07.001. Epub 2015 Jul 11.

ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity

T Xu et al. J Proteomics.

Abstract

ProLuCID, a new algorithm for peptide identification using tandem mass spectrometry and protein sequence databases, has been developed. This algorithm uses a three-tier scoring scheme. First, a binomial probability is used as a preliminary score to select candidate peptides. The binomial probability scores generated by ProLuCID minimize molecular weight bias and are independent of database size. Second, a modified cross-correlation score (XCorr) is calculated for each candidate peptide selected by the binomial probability. This cross-correlation scoring function models the isotopic distributions of the fragment ions of candidate peptides, which results in higher sensitivity and specificity than those obtained with the SEQUEST XCorr. Finally, ProLuCID uses the distribution of XCorr values for all of the selected candidate peptides to compute a Z score for the peptide hit with the highest XCorr. The ProLuCID Z score combines the discriminative power of XCorr and DeltaCN, the standard parameters for assessing the quality of peptide identifications made with SEQUEST, and shows a significant improvement in specificity over ProLuCID XCorr alone. ProLuCID can also take advantage of high-resolution MS/MS spectra, which further improves specificity compared with low-resolution tandem MS data. A comparison of filtered data searched with SEQUEST and ProLuCID at the same false discovery rate, as estimated by a target-decoy database strategy, shows that ProLuCID identified as many as 25% more proteins than SEQUEST. ProLuCID is implemented in Java and can be easily installed on a single computer or a computer cluster. This article is part of a Special Issue entitled: Computational Proteomics.
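The comparison above filters both search results to the same false discovery rate estimated with a target-decoy database. As a minimal Python sketch of one common way to do this, assuming the FDR above a score cutoff is estimated as (decoy hits) / (target hits), the code below finds the most permissive cutoff that meets a requested FDR at the peptide-spectrum-match level. The paper's exact filtering procedure and its protein-level step are not described here, and the names `decoy_fdr_cutoff` and `accepted_targets` are hypothetical.

```python
import math
from typing import List, Tuple


def decoy_fdr_cutoff(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> float:
    """Lowest score cutoff at which the estimated FDR (decoys / targets) <= max_fdr.

    psms: (score, is_decoy) pairs; a higher score means a better match.
    Returns math.inf if no cutoff satisfies max_fdr.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = math.inf
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after the last hit of a run of tied scores, because a
        # cutoff accepts every hit whose score is >= the cutoff.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def accepted_targets(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> List[float]:
    """Scores of target (non-decoy) hits that pass the FDR cutoff, in input order."""
    cutoff = decoy_fdr_cutoff(psms, max_fdr)
    return [s for s, is_decoy in psms if not is_decoy and s >= cutoff]
```

Because the same cutoff rule is applied to each search engine's scores, the number of accepted target hits can be compared directly at a fixed estimated FDR.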

Keywords: Bioinformatics; Identification; Mass spectrometry; ProLuCID; Proteomics; Sequest.

Figures

Figure 1
Distribution of the number of fragment ions matched to a tandem mass spectrum for all candidate peptides (blue line) in a protein database. The protein FASTA database contains the amino acid sequences of the 17 proteins, all S. pombe proteins, and the reversed copy of each protein (10006 entries in total). The fitted curve (pink line) is the binomial distribution B(22, 0.1391).
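Figure 1 fits a binomial distribution to the number of matched fragment ions. The sketch below, assuming each of `n` theoretical fragment ions matches a random peak independently with probability `p`, shows how such a model can turn a match count `k` into a preliminary score: the tail probability P(X >= k), reported as -log10. The values 22 and 0.1391 come from the fitted curve in Figure 1; the method-of-moments estimate of `p` and all function names are illustrative assumptions, not ProLuCID's published implementation.

```python
import math
from statistics import mean
from typing import Sequence


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more random fragment matches."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binomial_score(k: int, n: int, p: float) -> float:
    """Preliminary score: -log10 of the tail probability (higher = less likely by chance).

    Requires 0 <= k <= n; for k > n the tail probability is 0 and log10 fails.
    """
    return -math.log10(binomial_tail(k, n, p))


def fit_match_probability(match_counts: Sequence[int], n: int) -> float:
    """Method-of-moments estimate of p from observed match counts: mean(k) / n."""
    return mean(match_counts) / n
```

For example, `binomial_score(8, 22, 0.1391)` scores a candidate with 8 of 22 fragment ions matched. Because the score is conditioned on `n`, a longer (heavier) peptide with more theoretical fragments does not win simply by matching more peaks, and the score does not depend on how many database entries are searched, which is consistent with the properties the abstract attributes to the binomial score.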
Figure 2
Number of correct spectrum assignments by ProLuCID and SEQUEST XCorr and Sp scores. BC: both the XCorr rank and the Sp rank are correct; XC: the XCorr rank is correct and the Sp rank is incorrect; SPC: the Sp rank is correct and the XCorr rank is incorrect; FP: top hits on the reversed sequences of the 17 proteins. These results are based on a 6-step MudPIT analysis with 75866 spectra. ProLuCID XCorr outperforms SEQUEST XCorr in the number of correct spectrum assignments (7299 vs 6974); the ProLuCID Sp score (binomial probability score) performs better than the SEQUEST Sp score (6353 vs 5338); and ProLuCID XCorr places more true hits at the top rank than ProLuCID Sp (7299 vs 6353).
Figure 3
Histogram of SEQUEST and ProLuCID XCorr scores, separated into true hits and reverse hits, showing that the XCorr scores generated by ProLuCID are more discriminative than those generated by SEQUEST because ProLuCID closely models fragment ion isotopic distributions.
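For reference, a SEQUEST-style XCorr, which ProLuCID modifies, can be sketched as the dot product between a binned experimental spectrum and a binned theoretical spectrum, minus the mean dot product over a range of bin offsets (±75 bins is assumed here). This simplified Python sketch assumes both spectra are already binned into equal-length lists; it does not reproduce ProLuCID's isotope-distribution modeling or SEQUEST's spectrum preprocessing, and `xcorr` and `shifted_dot` are hypothetical names.

```python
from typing import Sequence


def shifted_dot(x: Sequence[float], t: Sequence[float], tau: int) -> float:
    """Sum over bins i of t[i] * x[i + tau]; bins shifted past either end count as zero."""
    return sum(t[i] * x[i + tau] for i in range(len(t)) if 0 <= i + tau < len(x))


def xcorr(x: Sequence[float], t: Sequence[float], max_offset: int = 75) -> float:
    """Background-subtracted cross-correlation of experimental x and theoretical t.

    The background is the mean correlation over offsets -max_offset..+max_offset
    (zero included), so a peptide scores well only if its fragment peaks line up
    at the correct offset rather than matching the spectrum's overall peak density.
    """
    offsets = range(-max_offset, max_offset + 1)
    background = sum(shifted_dot(x, t, tau) for tau in offsets) / len(offsets)
    return shifted_dot(x, t, 0) - background
```

A theoretical spectrum whose peaks coincide with the experimental peaks scores higher than one whose peaks are all displaced by a few bins, even though both correlate strongly at some nonzero offset.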
Figure 4
ROC curves of ProLuCID and SEQUEST scores. A. Typical ROC curves of SEQUEST XCorr, ProLuCID XCorr and ProLuCID Z score. B. Modified ROC curves showing the true positive fraction as a function of the false positive rate. C. Number of true hits plotted against the false positive fraction for SEQUEST XCorr, ProLuCID XCorr and ProLuCID Z score. D. Number of true hits plotted against the false positive fraction for the ProLuCID high mass accuracy probability score, low mass accuracy probability score and Z score.
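Curves like those in Figure 4 come from sweeping a score threshold from high to low and counting true and false hits above it. A minimal Python sketch of standard ROC points plus the trapezoidal area under the curve, assuming each hit is already labeled true or false (for example from the known 17-protein mixture or decoy sequences); the function names are hypothetical, and the figure's exact normalization of the false positive fraction is not specified here.

```python
from typing import List, Sequence, Tuple

# (false positive rate, true positive rate, number of true hits)
Point = Tuple[float, float, int]


def roc_points(hits: Sequence[Tuple[float, bool]]) -> List[Point]:
    """Sweep a score threshold from high to low over (score, is_true) hits."""
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    n_pos = sum(1 for _, is_true in ranked if is_true)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true and one false hit")
    tp = fp = 0
    points: List[Point] = [(0.0, 0.0, 0)]
    for i, (score, is_true) in enumerate(ranked):
        if is_true:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all hits tied at this score have been counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos, tp))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under the ROC curve (false positive rate vs true positive rate), trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0, _), (x1, y1, _) in zip(points, points[1:])
    )
```

Plotting the first element of each point against the second gives a panel A-style curve; plotting it against the third gives the number of true hits as a function of the false positive fraction, as in panels C and D.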
Figure 5
Histograms of ProLuCID Z scores of the true hits and decoy hits, showing good separation between true and decoy hits, and showing that the Z-score distributions of the decoy hits for charge +2 and charge +3 spectra are very similar.
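The abstract states that ProLuCID computes a Z score for the top hit from the distribution of XCorr values of all selected candidates, combining the discriminative power of XCorr and DeltaCN. A minimal Python sketch, assuming the Z score is (top XCorr - mean) / standard deviation over all candidates including the top hit, using the population standard deviation; the paper's exact definition may differ. DeltaCN is shown in its usual SEQUEST form, (XCorr1 - XCorr2) / XCorr1. Function names are hypothetical.

```python
import statistics
from typing import Sequence


def top_hit_z_score(xcorrs: Sequence[float]) -> float:
    """Z score of the highest XCorr relative to all candidate XCorrs (assumed definition)."""
    if len(xcorrs) < 2:
        raise ValueError("need at least two candidate XCorr values")
    sd = statistics.pstdev(xcorrs)  # population SD over all candidates, top hit included
    if sd == 0:
        return 0.0  # all candidates tie, so the top hit does not stand out
    return (max(xcorrs) - statistics.mean(xcorrs)) / sd


def delta_cn(xcorrs: Sequence[float]) -> float:
    """SEQUEST-style DeltaCN: (XCorr1 - XCorr2) / XCorr1 for the two best candidates."""
    if len(xcorrs) < 2:
        raise ValueError("need at least two candidate XCorr values")
    best, second = sorted(xcorrs, reverse=True)[:2]
    return (best - second) / best
```

A top hit that stands far above the bulk of candidate XCorrs receives a large Z score even when its raw XCorr is modest, which is one way to read the abstract's statement that the Z score folds DeltaCN-like separation into a single score.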
Figure 6
Plot of ProLuCID Z score as a function of false positive rate on the 17-protein mixture dataset.
Figure 7
Example of a spectrum from a peptide with high precursor charge (+4) identified by ProLuCID.
