Proteomics. 2013 Mar;13(5):756-65.
doi: 10.1002/pmic.201100670. Epub 2013 Feb 4.

Extending the coverage of spectral libraries: a neighbor-based approach to predicting intensities of peptide fragmentation spectra

Chao Ji et al. Proteomics. 2013 Mar.

Abstract

Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well-studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor-based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K-nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20-60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.
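The weighted K-nearest neighbor combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting scheme, so the code below assumes weights proportional to each neighbor's predicted similarity and assumes every spectrum is already an equal-length vector of fragment ion intensities. The function name `weighted_knn_spectrum` and its inputs are hypothetical and are not taken from the paper.

```python
def weighted_knn_spectrum(neighbors, k):
    """Combine neighbor spectra into one predicted intensity vector.

    neighbors: list of (similarity, intensities) pairs, where intensities
               is a list of fragment ion intensities (all the same length).
    k:         number of top-scoring neighbors to combine.

    Assumption (not from the paper): each neighbor is weighted by its
    similarity score, normalized so the weights sum to 1.
    """
    if not neighbors or k < 1:
        raise ValueError("need at least one neighbor and k >= 1")

    # Keep the k neighbors with the highest similarity score.
    top = sorted(neighbors, key=lambda n: n[0], reverse=True)[:k]

    total = sum(weight for weight, _ in top)
    if total <= 0:
        raise ValueError("similarity weights must sum to a positive value")

    length = len(top[0][1])
    predicted = [0.0] * length
    for weight, intensities in top:
        if len(intensities) != length:
            raise ValueError("all spectra must have the same length")
        for i, value in enumerate(intensities):
            predicted[i] += weight * value / total
    return predicted


if __name__ == "__main__":
    toy_neighbors = [
        (0.75, [1.0, 0.0]),
        (0.25, [0.0, 1.0]),
        (0.10, [1.0, 1.0]),  # dropped when k = 2
    ]
    print(weighted_knn_spectrum(toy_neighbors, k=2))
```

Normalizing the weights keeps the predicted intensities on the same scale as the input spectra, which matters if the result is later compared against library spectra.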

Figures

Figure 1
The spectral similarity of standard spectra corresponding to all pairs of doubly charged peptides of length 20. (A) Spectral similarity between standard spectra, averaged over pairs with Hamming distance d, where d ∈ {1, 2, …, 20}; (B) Histogram of all spectral similarity values.
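The two quantities in this caption can be sketched as follows. The Hamming distance between equal-length peptide sequences is standard. The spectral similarity measure is not defined in this excerpt, so the sketch assumes cosine similarity (normalized dot product) between intensity vectors, which is a common choice but may differ from the measure the authors used. Both function names are hypothetical.

```python
import math


def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def cosine_similarity(spec_a, spec_b):
    """Normalized dot product of two intensity vectors.

    Assumption: this stands in for the paper's spectral similarity,
    which is not specified in the text available here.
    """
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must have the same length")
    dot = sum(a * b for a, b in zip(spec_a, spec_b))
    norm_a = math.sqrt(sum(a * a for a in spec_a))
    norm_b = math.sqrt(sum(b * b for b in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(hamming_distance("PEPTIDEK", "PEPTIDER"))
    print(cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0]))
```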
Figure 2
For each target peptide p (with standard spectrum s) of length 20, the similarity scores σ(p, pk) between p and its top neighbors pk (with standard spectra sk) were predicted; k ∈ {1, 2, …, K}. We consider K = 11 and a precursor ion charge of +2. (A) The Hamming distance between p and pk, averaged over all p’s, as a function of the rank of pk’s predicted similarity. Note that because the majority of peptides do not have close neighbors, the average Hamming distance is generally high for all top-scoring peptides; (B) The number of times p and its nearest neighbor (with largest predicted similarity) have an identical amino acid at position i (match count), where i ∈ {1, 2, …, 19}; position 20 is ignored because it is either K or R. Match count is summed over all p’s; (C) The spectral similarity between s and sk, averaged over all p’s, as a function of the rank of pk’s predicted similarity (solid line), or as a function of the rank of pk’s Hamming distance in ascending order (dashed line).
Figure 3
Comparing sensitivity of spectral library search (NIST, NISTKNN, and NISTMA) and sequence database search (InsPecT). The number of positive identifications is plotted as a function of FDR for charge +2 (A) and +3 (B). Venn diagrams corresponding to the unique peptide identifications for charge +2 (C) and +3 (D).
Figure 4
Comparing sensitivity of spectral search on NISTKNN and NISTMA with equal search space corresponding to the 20% of the most confident predictions in NISTKNN for (A) charge +2 and (B) charge +3 precursor ions. This resulted in 56,642 out of 283,122 peptide ions for charge +2 and 22,528 out of 112,564 peptide ions for charge +3. The same set of peptide ions was then selected from NISTMA for the parallel spectral search. To minimize the impact of the decoy library (randomly shuffled peptide sequences) on small target libraries, we generated 50 decoy libraries using SpectraST and performed 50 independent searches in which the target library was appended with each decoy library. The numbers of positive IDs at each FDR cutoff were averaged over all 50 runs.
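The averaging over decoy runs described in this caption can be sketched as below. The caption does not state how FDR was estimated, so the code assumes the common target-decoy estimate (decoy hits divided by target hits above a score threshold). It does not reproduce SpectraST's decoy generation or scoring; `ids_at_fdr` and `mean_ids_over_runs` are hypothetical names, and each run is assumed to be a list of `(score, is_decoy)` pairs.

```python
def ids_at_fdr(matches, fdr_cutoff):
    """Largest number of target identifications with estimated FDR <= cutoff.

    matches: list of (score, is_decoy) pairs for one search run.
    Assumption: FDR is estimated as (#decoy hits) / (#target hits)
    among all matches scoring at or above the current threshold.
    """
    targets = 0
    decoys = 0
    best = 0
    # Walk down the ranked list, lowering the score threshold one hit at a time.
    for _score, is_decoy in sorted(matches, key=lambda m: m[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= fdr_cutoff:
            best = targets
    return best


def mean_ids_over_runs(runs, fdr_cutoff):
    """Average the number of positive IDs at one FDR cutoff over several runs."""
    if not runs:
        raise ValueError("need at least one run")
    counts = [ids_at_fdr(run, fdr_cutoff) for run in runs]
    return sum(counts) / len(counts)


if __name__ == "__main__":
    run_1 = [(0.9, False), (0.8, False), (0.7, True), (0.6, False)]
    run_2 = [(0.9, True), (0.8, False)]
    print(mean_ids_over_runs([run_1, run_2], fdr_cutoff=0.5))
```

Averaging over many decoy libraries reduces the variance that a single random shuffle introduces when the target library is small.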
Figure 5
Comparison of spectral library searches with unequal search space. The numbers of positive identifications are plotted as a function of FDR for each of the seven libraries; the same types of lines indicate the same search space between groups of libraries. NIST and NISTKNN are spectral libraries described in Section 2.3. MosquitoKNN contained predicted spectra of all tryptic peptides (length 7–20 for charge +2 and 12–25 for +3) from the set of A. aegypti proteins in Swiss-Prot, and MosquitoKNN-80% was a subset of MosquitoKNN in which the predicted spectra had confidence scores greater than the 80th percentile threshold. MosquitoMA and MosquitoMA-80% were counterparts of MosquitoKNN and MosquitoKNN-80%, respectively, but the spectra were generated using MassAnalyzer. Hybrid was a library in which NIST spectral library was combined with MosquitoKNN; if a peptide sequence was present in both libraries, the spectrum from NIST was retained. In total, the set of A. aegypti peptides contained 395,213 tryptic peptides, of which 4,685 were present in NIST. MosquitoKNN-80% and MosquitoMA-80% each contained 65,346 peptides.
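The library constructions named in this caption can be illustrated with a small sketch: a Hybrid-style merge in which the experimental library's spectrum wins when a peptide appears in both sources, and an 80%-style subset that keeps predictions whose confidence exceeds a percentile threshold. The data layout (dictionaries keyed by peptide identifier) and the nearest-rank percentile definition are assumptions made here for illustration, not details from the paper.

```python
def build_hybrid(experimental, predicted):
    """Merge two libraries; experimental spectra override predicted ones.

    Both arguments are dicts mapping a peptide identifier (hypothetically
    a sequence plus charge) to a spectrum.
    """
    hybrid = dict(predicted)
    hybrid.update(experimental)  # entries present in both come from `experimental`
    return hybrid


def above_percentile(confidence, percentile=80):
    """Return the peptides whose confidence is strictly above the threshold.

    confidence: dict mapping peptide identifier to a confidence score.
    percentile: integer in 1..100.
    Assumption: the threshold uses the nearest-rank percentile definition.
    """
    if not confidence:
        return set()
    ordered = sorted(confidence.values())
    # Nearest-rank: smallest rank r with r >= percentile/100 * n (integer math).
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    threshold = ordered[rank - 1]
    return {pep for pep, score in confidence.items() if score > threshold}


if __name__ == "__main__":
    nist = {"PEPTIDEK/2": "measured"}
    knn = {"PEPTIDEK/2": "predicted", "SAMPLER/2": "predicted"}
    print(build_hybrid(nist, knn))

    scores = {f"pep{i}": float(i) for i in range(1, 11)}
    print(sorted(above_percentile(scores, 80)))
```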
