J Proteomics. 2015 Nov 3;129:16-24.

doi: 10.1016/j.jprot.2015.07.001. Epub 2015 Jul 11.

ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity

T Xu¹, S K Park², J D Venable², J A Wohlschlegel², J K Diedrich², D Cociorva², B Lu², L Liao², J Hewel², X Han², C C L Wong², B Fonslow², C Delahunty², Y Gao², H Shah², J R Yates 3rd³

Affiliations

¹ Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA; Dow AgroSciences LLC, Indianapolis, IN 46268, USA.
² Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA.
³ Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA. Electronic address: jyates@scripps.edu.

PMID: 26171723
PMCID: PMC4630125
DOI: 10.1016/j.jprot.2015.07.001

T Xu et al. J Proteomics. 2015.

Abstract

ProLuCID, a new algorithm for peptide identification using tandem mass spectrometry and protein sequence databases, has been developed. The algorithm uses a three-tier scoring scheme. First, a binomial probability is used as a preliminary score to select candidate peptides. The binomial probability scores generated by ProLuCID minimize molecular weight bias and are independent of database size. Second, a modified cross-correlation score (XCorr) is calculated for each candidate peptide selected by the binomial probability. This cross-correlation scoring function models the isotopic distributions of the fragment ions of candidate peptides, which ultimately yields higher sensitivity and specificity than the SEQUEST XCorr. Finally, ProLuCID uses the distribution of XCorr values for all of the selected candidate peptides to compute a Z score for the peptide hit with the highest XCorr. The ProLuCID Z score combines the discriminative power of XCorr and DeltaCN, the standard parameters for assessing the quality of peptide identifications with SEQUEST, and shows a significant improvement in specificity over ProLuCID XCorr alone. ProLuCID can also take advantage of high-resolution MS/MS spectra, which further improves specificity compared with low-resolution tandem MS data. A comparison of filtered data searched with SEQUEST and ProLuCID at the same false discovery rate, estimated by a target-decoy database strategy, shows that ProLuCID identified as many as 25% more proteins than SEQUEST. ProLuCID is implemented in Java and can be easily installed on a single computer or a computer cluster. This article is part of a Special Issue entitled: Computational Proteomics.
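
The third scoring tier can be illustrated with a standard Z score: the highest XCorr minus the mean XCorr of all selected candidates, divided by their standard deviation. This is a minimal sketch under stated assumptions, not ProLuCID's implementation: it assumes population statistics taken over every candidate (including the top hit), and the function name `z_score_top_hit` is hypothetical.

```python
import statistics


def z_score_top_hit(xcorrs: list[float]) -> float:
    """Z score of the highest XCorr relative to all candidate XCorrs.

    Assumption (not from the paper): mean and population standard
    deviation are computed over every candidate, including the top hit.
    """
    if len(xcorrs) < 2:
        raise ValueError("need at least two candidate scores")
    top = max(xcorrs)
    mu = statistics.fmean(xcorrs)
    sigma = statistics.pstdev(xcorrs)
    if sigma == 0.0:
        # All candidates scored identically, so the top hit is not separated.
        return 0.0
    return (top - mu) / sigma
```

Intuitively, the score rises both when the top XCorr is high and when it stands well apart from the other candidates (the gap that DeltaCN measures), which is one way to read the abstract's statement that the Z score combines the two. For example, `z_score_top_hit([1.0, 1.0, 1.0, 4.0])` is sqrt(3), about 1.73.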

Keywords: Bioinformatics; Identification; Mass spectrometry; ProLuCID; Proteomics; Sequest.

Figures

**Figure 1**
Distribution of the number of fragment ions matched to a tandem mass spectrum for all candidate peptides (blue line) from a protein database. The protein FASTA database contains the amino acid sequences of the 17 proteins, all *S. pombe* proteins, and the reversed copy of each protein (10006 entries in total). The fitted curve (pink line) is the binomial distribution B(22, 0.1391).
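
Figure 1 indicates that, across candidate peptides, the number of matched fragment ions follows a binomial distribution. A preliminary score in that spirit can be sketched as the probability of matching at least k of n fragment ions by chance, reported as -log10 so that larger values mean less likely to be random. This form of the score is an assumption for illustration, not ProLuCID's published formula; the function names are hypothetical, and n = 22, p = 0.1391 are borrowed from the Figure 1 fit only as example values.

```python
import math


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more random matches."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binomial_score(k: int, n: int, p: float) -> float:
    """Hypothetical preliminary score: -log10 of the upper-tail probability."""
    tail = binomial_tail(k, n, p)
    return -math.log10(tail) if tail > 0 else math.inf
```

With n = 22 and p = 0.1391, chance alone predicts about 22 x 0.1391, roughly 3 matched ions, so `binomial_score(10, 22, 0.1391)` is much larger than `binomial_score(3, 22, 0.1391)`. Because the score depends only on n, k and p for a given spectrum and peptide, it does not grow with the number of database entries searched.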

**Figure 2**
Number of correct spectrum assignments by the ProLuCID and SEQUEST XCorr and Sp scores. BC: both the XCorr rank and the Sp rank are correct; XC: the XCorr rank is correct and the Sp rank is incorrect; SPC: the Sp rank is correct and the XCorr rank is incorrect; FP: top hits on the reversed sequences of the 17 proteins. These results are based on a 6-step MudPIT analysis with 75866 spectra. ProLuCID XCorr outperforms SEQUEST XCorr in the number of correct spectrum assignments (7299 vs 6974); the ProLuCID Sp score (binomial probability score) works better than the SEQUEST Sp score (6353 vs 5338); and ProLuCID XCorr places more true hits at the top rank than ProLuCID Sp (7299 vs 6353).

**Figure 3**
Histograms of SEQUEST and ProLuCID XCorr scores, separated into true hits and reverse hits, showing that the XCorr scores generated by ProLuCID are more discriminative than those generated by SEQUEST because ProLuCID closely models fragment ion isotopic distributions.
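
SEQUEST's XCorr is commonly described as the dot product between a processed experimental spectrum and a theoretical spectrum at zero offset, minus the mean dot product over a range of offsets (often -75 to +75 bins) as a background correction. The sketch below follows that common description and shows where an isotope model would enter, by optionally adding a reduced-intensity peak one bin above each fragment's monoisotopic bin. Everything here is an assumption for illustration: the one-bin isotope offset (roughly 1 Da, singly charged fragments), the `isotope_weight` parameter and the function names are hypothetical, SEQUEST's spectrum preprocessing is omitted, and ProLuCID's actual isotope modeling is more detailed.

```python
def theoretical_spectrum(frag_bins, n_bins, isotope_weight=0.0):
    """Binned theoretical spectrum (hypothetical helper).

    Each fragment adds 1.0 at its monoisotopic bin. If isotope_weight > 0,
    a crude isotope model also adds that weight one bin higher.
    """
    spec = [0.0] * n_bins
    for b in frag_bins:
        if 0 <= b < n_bins:
            spec[b] += 1.0
        if isotope_weight > 0 and 0 <= b + 1 < n_bins:
            spec[b + 1] += isotope_weight
    return spec


def shifted_dot(obs, theo, tau):
    """Sum over i of theo[i] * obs[i + tau], skipping out-of-range indices."""
    n = len(obs)
    return sum(theo[i] * obs[i + tau] for i in range(len(theo)) if 0 <= i + tau < n)


def xcorr(obs, theo, max_offset=75):
    """Zero-offset dot product minus the mean over offsets -max..+max.

    This sketch includes offset 0 in the background mean; some
    implementations exclude it.
    """
    r0 = shifted_dot(obs, theo, 0)
    offsets = range(-max_offset, max_offset + 1)
    background = sum(shifted_dot(obs, theo, t) for t in offsets) / len(offsets)
    return r0 - background
```

A theoretical spectrum whose peaks line up with the observed peaks scores close to its raw dot product, while one whose peaks fall between them scores near or below zero, because the background term only rewards coincidental matches at shifted offsets.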

**Figure 4**
ROC curves of ProLuCID and SEQUEST scores. A. Typical ROC curves of SEQUEST XCorr, ProLuCID XCorr and ProLuCID Z score. B. Modified ROC curves showing the true positive fraction as a function of the false positive rate. C. Number of true hits plotted against the false positive fraction for SEQUEST XCorr, ProLuCID XCorr and ProLuCID Z score. D. Number of true hits plotted against the false positive fraction for the ProLuCID high-mass-accuracy probability score, low-mass-accuracy probability score and Z score.
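
Panels C and D plot the number of true hits against the false positive fraction as the score threshold is lowered. Given one score per spectrum and a label saying whether the top hit is true or false, such a curve can be sketched by sorting scores in descending order and accumulating counts. The function name, and defining the false positive fraction as false hits so far divided by all false hits, are assumptions for illustration.

```python
def true_hits_vs_fp_fraction(scores, is_true):
    """Return (fp_fraction, n_true_hits) points as the threshold sweeps down.

    scores: one score per spectrum (higher means more confident).
    is_true: parallel booleans, True for a correct hit, False for a false hit.
    Tied scores are not grouped; each spectrum contributes its own point.
    """
    n_false = sum(1 for t in is_true if not t)
    if n_false == 0:
        raise ValueError("need at least one false hit to define the fraction")
    ranked = sorted(zip(scores, is_true), key=lambda pair: pair[0], reverse=True)
    points, tp, fp = [], 0, 0
    for _, truth in ranked:
        if truth:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_false, tp))
    return points
```

A more discriminative score, such as the Z score in panel C, would trace a curve that climbs to more true hits before the false positive fraction grows.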

**Figure 5**
Histograms of ProLuCID Z scores of the true hits and decoy hits, showing good separation between the two, and that the Z score distributions of the decoy hits for charge +2 and +3 spectra are very similar.

**Figure 6**
Plot of the ProLuCID Z score as a function of the false positive rate on the 17-protein mixture dataset.
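
The abstract compares the two search engines at the same false discovery rate estimated with a target-decoy strategy, and this figure relates the Z score to the false positive rate. A common way to choose a score cutoff is to sweep thresholds from high to low and keep the lowest one whose estimated FDR (decoy hits divided by target hits at or above the cutoff) stays within a chosen limit. This is a minimal sketch of that generic procedure; the estimator (other variants use 2 x decoys / (targets + decoys)), the 1% default and the function name are assumptions, not details given in this record.

```python
def z_threshold_at_fdr(scores, is_decoy, max_fdr=0.01):
    """Lowest score cutoff whose estimated FDR (decoys / targets at or above
    the cutoff) does not exceed max_fdr. Returns None if none qualifies.
    """
    ranked = sorted(zip(scores, is_decoy), key=lambda pair: pair[0], reverse=True)
    targets = decoys = 0
    best = None
    for i, (score, decoy) in enumerate(ranked):
        if decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after the last entry with this score, so ties stay together.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best
```

Applying the same `max_fdr` to the score lists from two search engines and counting the targets above each cutoff gives a like-for-like comparison of the kind the abstract describes.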

**Figure 7**
An example of a high-precursor-charge (+4) peptide spectrum identified by ProLuCID.

Cited by

Loss of MAGEL2 in Prader-Willi syndrome leads to decreased secretory granule and neuropeptide production.
Chen H, Victor AK, Klein J, Tacer KF, Tai DJ, de Esch C, Nuttle A, Temirov J, Burnett LC, Rosenbaum M, Zhang Y, Ding L, Moresco JJ, Diedrich JK, Yates JR 3rd, Tillman HS, Leibel RL, Talkowski ME, Billadeau DD, Reiter LT, Potts PR. JCI Insight. 2020 Sep 3;5(17):e138576. doi: 10.1172/jci.insight.138576. PMID: 32879135.
OSBPL2 mutations impair autophagy and lead to hearing loss, potentially remedied by rapamycin.
Koh YI, Oh KS, Kim JA, Noh B, Choi HJ, Joo SY, Rim JH, Kim HY, Kim DY, Yu S, Kim DH, Lee SG, Jung J, Choi JY, Gee HY. Autophagy. 2022 Nov;18(11):2593-2614. doi: 10.1080/15548627.2022.2040891. Epub 2022 Mar 6. PMID: 35253614.
Protein turnover models for LC-MS data of heavy water metabolic labeling.
Sadygov RG. Brief Bioinform. 2022 Mar 10;23(2):bbab598. doi: 10.1093/bib/bbab598. PMID: 35062023.
Analysis of proteome-wide degradation dynamics in ALS SOD1 iPSC-derived patient neurons reveals disrupted VCP homeostasis.
Tsioras K, Smith KC, Edassery SL, Garjani M, Li Y, Williams C, McKenna ED, Guo W, Wilen AP, Hark TJ, Marklund SL, Ostrow LW, Gilthorpe JD, Ichida JK, Kalb RG, Savas JN, Kiskinis E. Cell Rep. 2023 Oct 31;42(10):113160. doi: 10.1016/j.celrep.2023.113160. Epub 2023 Sep 29. PMID: 37776851.
Identification of IMC43, a novel IMC protein that collaborates with IMC32 to form an essential daughter bud assembly complex in Toxoplasma gondii.
Pasquarelli RR, Back PS, Sha J, Wohlschlegel JA, Bradley PJ. PLoS Pathog. 2023 Oct 2;19(10):e1011707. doi: 10.1371/journal.ppat.1011707. eCollection 2023 Oct. PMID: 37782662.
