Proteomics. 2013 Mar;13(5):756-65.

doi: 10.1002/pmic.201100670. Epub 2013 Feb 4.

Extending the coverage of spectral libraries: a neighbor-based approach to predicting intensities of peptide fragmentation spectra

Chao Ji¹, Randy J Arnold, Kevin J Sokoloski, Richard W Hardy, Haixu Tang, Predrag Radivojac

Affiliation

¹ Department of Biology, Indiana University, Bloomington, IN 47405, USA.

PMID: 23303707
PMCID: PMC3733334
DOI: 10.1002/pmic.201100670

Abstract

Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well-studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor-based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K-nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20-60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.
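
The weighted K-nearest neighbor combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `predict_intensities`, the use of the predicted similarities directly as weights, and the final scaling to a base peak of 1 are all assumptions made for illustration. It also assumes that every neighbor's intensity vector is aligned to the same fragment ions as the target peptide.

```python
import numpy as np

def predict_intensities(neighbor_spectra, predicted_similarities, k=11):
    """Similarity-weighted average of the k most similar neighbors' spectra.

    neighbor_spectra: one row of fragment ion intensities per neighbor.
    predicted_similarities: one non-negative score per neighbor (assumed).
    """
    spectra = np.asarray(neighbor_spectra, dtype=float)
    sims = np.asarray(predicted_similarities, dtype=float)

    # Keep the k neighbors with the largest predicted similarity.
    top = np.argsort(sims)[::-1][:k]
    weights = sims[top]
    if weights.sum() <= 0:
        raise ValueError("need at least one neighbor with positive similarity")

    # Weighted average of the selected neighbors' intensities.
    combined = weights @ spectra[top] / weights.sum()

    # Hypothetical normalization: scale so the largest peak equals 1.
    peak = combined.max()
    return combined / peak if peak > 0 else combined
```

The default `k=11` mirrors the value of K used in Figure 2; any other choice works the same way.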

Figures

**Figure 1**
The spectral similarity of standard spectra corresponding to all pairs of doubly charged peptides of length 20. (A) Spectral similarity between standard spectra, averaged over pairs with Hamming distance d, where d ∈ {1, 2, …, 20}; (B) Histogram of all spectral similarity values.
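
The analysis in panel (A) can be sketched as follows. This is a rough illustration under stated assumptions: the caption does not say which spectral similarity measure was used, so the cosine (normalized dot product) of intensity vectors stands in for it, and the helper names `hamming`, `cosine_similarity`, and `mean_similarity_by_distance` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math

def hamming(p, q):
    """Number of positions at which two equal-length peptides differ."""
    if len(p) != len(q):
        raise ValueError("peptides must have equal length")
    return sum(a != b for a, b in zip(p, q))

def cosine_similarity(s, t):
    """Assumed similarity measure: cosine of two aligned intensity vectors."""
    dot = sum(x * y for x, y in zip(s, t))
    norm = math.sqrt(sum(x * x for x in s)) * math.sqrt(sum(y * y for y in t))
    return dot / norm if norm else 0.0

def mean_similarity_by_distance(library):
    """library maps peptide sequence -> intensity vector.

    Returns {Hamming distance d: mean similarity over all pairs at distance d}.
    """
    buckets = defaultdict(list)
    for (p, s), (q, t) in combinations(library.items(), 2):
        buckets[hamming(p, q)].append(cosine_similarity(s, t))
    return {d: sum(v) / len(v) for d, v in sorted(buckets.items())}
```

Collecting the individual values in `buckets` instead of averaging them would give the data for the histogram in panel (B).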

**Figure 2**
For each target peptide p (with standard spectrum s) of length 20, the similarity scores σ(p, *p_k*) between p and its top neighbors *p_k* (with standard spectra *s_k*) were predicted; k ∈ {1, 2, …, K}. We consider K = 11 and a precursor ion charge of +2. (A) The Hamming distance between p and *p_k*, averaged over all p’s, as a function of the rank of *p_k*’s predicted similarity. Note that because the majority of peptides do not have close neighbors, the average Hamming distance is generally high for all top-scoring peptides; (B) The number of times p and its nearest neighbor (with the largest predicted similarity) have an identical amino acid at position i (match count), where i ∈ {1, 2, …, 19}; position 20 is ignored because it is either K or R. Match count is summed over all p’s; (C) The spectral similarity between s and *s_k*, averaged over all p’s, as a function of the rank of *p_k*’s predicted similarity (solid line), or as a function of the rank of *p_k*’s Hamming distance in ascending order (dashed line).

**Figure 3**
Comparing sensitivity of spectral library search (NIST, NIST^KNN, and NIST^MA) and sequence database search (InsPecT). The number of positive identifications is plotted as a function of FDR for charge +2 (A) and +3 (B). Venn diagrams corresponding to the unique peptide identifications for charge +2 (C) and +3 (D).

**Figure 4**
Comparing sensitivity of spectral search on NIST^KNN and NIST^MA with equal search space corresponding to the 20% of the most confident predictions in NIST^KNN for (A) charge +2 and (B) charge +3 precursor ions. This resulted in 56,642 out of 283,122 peptide ions for charge +2 and 22,528 out of 112,564 peptide ions for charge +3. The same set of peptide ions was then selected from NIST^MA for the parallel spectral search. To minimize the impact of the decoy library (randomly shuffled peptide sequences) on small target libraries, we generated 50 decoy libraries using SpectraST and performed 50 independent searches in which the target library was appended with each decoy library. The numbers of positive IDs at each FDR cutoff were averaged over all 50 runs.
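
The averaging over decoy libraries can be sketched as below. The caption does not give the FDR estimator, so this sketch assumes the common target-decoy ratio (decoy hits divided by target hits above a score threshold). The function names and the `(score, is_decoy)` input format are hypothetical and are not SpectraST output.

```python
def ids_at_fdr(matches, fdr_cutoff):
    """Largest number of target hits whose estimated FDR is within the cutoff.

    matches: iterable of (score, is_decoy) pairs from one search.
    Estimated FDR at a threshold = decoy hits / target hits (an assumption).
    """
    best = 0
    targets = decoys = 0
    # Walk down the ranked list, lowering the score threshold one hit at a time.
    for _score, is_decoy in sorted(matches, key=lambda m: m[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr_cutoff:
            best = targets
    return best

def mean_ids_over_runs(runs, fdr_cutoff):
    """Average the positive-ID count over independent searches (one per decoy library)."""
    counts = [ids_at_fdr(run, fdr_cutoff) for run in runs]
    return sum(counts) / len(counts)
```

With 50 decoy libraries, `runs` would hold 50 result lists, and calling `mean_ids_over_runs` at a series of cutoffs would trace one curve of the kind shown in the figure.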

**Figure 5**
Comparison of spectral library searches with unequal search space. The numbers of positive identifications are plotted as a function of FDR for each of the seven libraries; the same types of lines indicate the same search space between groups of libraries. NIST and NIST^KNN are spectral libraries described in Section 2.3. Mosquito^KNN contained predicted spectra of all tryptic peptides (length 7–20 for charge +2 and 12–25 for +3) from the set of *A. aegypti* proteins in Swiss-Prot, and Mosquito^KNN-80% was a subset of Mosquito^KNN in which the predicted spectra had confidence scores greater than the 80th percentile threshold. Mosquito^MA and Mosquito^MA-80% were counterparts of Mosquito^KNN and Mosquito^KNN-80%, respectively, but the spectra were generated using MassAnalyzer. Hybrid was a library in which the NIST spectral library was combined with Mosquito^KNN; if a peptide sequence was present in both libraries, the spectrum from NIST was retained. In total, the set of *A. aegypti* peptides contained 395,213 tryptic peptides, of which 4,685 were present in NIST. Mosquito^KNN-80% and Mosquito^MA-80% each contained 65,346 peptides.
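
The two library constructions in this caption, the Hybrid merge and the 80th percentile subset, can be sketched as follows. The dictionary representation (peptide sequence mapped to a spectrum), the function names, and the use of `numpy.percentile` with its default interpolation are assumptions for illustration; the paper's actual file formats and percentile convention are not given here.

```python
import numpy as np

def build_hybrid(measured, predicted):
    """Merge two libraries; a measured spectrum replaces a predicted one
    whenever the same peptide sequence appears in both."""
    hybrid = dict(predicted)
    hybrid.update(measured)
    return hybrid

def top_confidence_subset(predicted, confidence, percentile=80.0):
    """Keep predicted spectra whose confidence exceeds the given percentile.

    predicted: peptide -> spectrum; confidence: peptide -> score.
    """
    threshold = np.percentile(list(confidence.values()), percentile)
    return {pep: spec for pep, spec in predicted.items()
            if confidence[pep] > threshold}
```

For example, `build_hybrid(nist, mosquito_knn)` would keep the NIST entry for each peptide present in both libraries, where `nist` and `mosquito_knn` are placeholder variable names.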

Cited by

Impact of Amidination on Peptide Fragmentation and Identification in Shotgun Proteomics.
Li S, Dabir A, Misal SA, Tang H, Radivojac P, Reilly JP. J Proteome Res. 2016 Oct 7;15(10):3656-3665. doi: 10.1021/acs.jproteome.6b00468. Epub 2016 Sep 27. PMID: 27615690
MS2CNN: predicting MS/MS spectrum based on protein sequence using deep convolutional neural networks.
Lin YM, Chen CT, Chang JM. BMC Genomics. 2019 Dec 24;20(Suppl 9):906. doi: 10.1186/s12864-019-6297-6. PMID: 31874640
Quantitative Comparison of Tandem Mass Spectra Obtained on Various Instruments.
Bazsó FL, Ozohanics O, Schlosser G, Ludányi K, Vékey K, Drahos L. J Am Soc Mass Spectrom. 2016 Aug;27(8):1357-65. doi: 10.1007/s13361-016-1408-y. Epub 2016 May 20. PMID: 27206510
