Nat Commun. 2019 Jul 30;10(1):3404.
doi: 10.1038/s41467-019-11337-z.

A high-speed search engine pLink 2 with systematic evaluation for proteome-scale identification of cross-linked peptides

Zhen-Lin Chen et al. Nat Commun. 2019.

Abstract

We describe pLink 2, a search engine with higher speed and reliability for proteome-scale identification of cross-linked peptides. With a two-stage open search strategy facilitated by fragment indexing, pLink 2 is ~40 times faster than pLink 1 and 3 to 10 times faster than Kojak. Furthermore, using simulated datasets, synthetic datasets, 15N metabolically labeled datasets, and entrapment databases, four analysis methods were designed to evaluate the credibility of ten state-of-the-art search engines. This systematic evaluation shows that pLink 2 outperforms these methods in precision and sensitivity, especially at proteome scales. Lastly, re-analysis of four published proteome-scale cross-linking datasets with pLink 2 required only a fraction of the time used by pLink 1, with up to 27% more cross-linked residue pairs identified. pLink 2 is therefore an efficient and reliable tool for cross-linking mass spectrometry analysis, and the systematic evaluation methods described here will be useful for future software development.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
pLink 2 workflow. a The general workflow. Step 1, MS1 scans are preprocessed by pParse to extract precursor candidates. Step 2, for each MS2 spectrum, α-peptide candidates are retrieved from the fragment index using query peaks generated from the spectrum. Step 3, β-peptide candidates are retrieved from the peptide index using the complementary masses of α-peptides. Step 4, α- and β-peptide candidates are paired and fine-scored with the MS2 spectrum. Step 5, all top-scored PSMs are re-ranked and filtered after FDR control. b The sub-workflow of α-peptide retrieval. For each MS2 spectrum, the peaks are converted into regular b and y ions to query the fragment index. Only those peptides with at least two matched ions are coarse-scored with the spectrum, and the top five coarse-scored α-peptide candidates are kept. c The sub-workflow of β-peptide retrieval. For each α-peptide candidate, the open mass is first calculated by subtracting the α-peptide mass and the cross-linker mass from the precursor mass, and this mass is used to retrieve β-peptide candidates from the peptide index. Then, each of the five α-peptide candidates is paired with each of its complementary β-peptide candidates, and these pairs are fine-scored with the spectrum. Finally, the highest fine-scored peptide pair is kept. d The re-ranking algorithm. PSMs are grouped into intra-protein, inter-protein, loop-linked, mono-linked, and regular groups, and a semi-supervised learning algorithm is used to re-score and re-rank them in each group.
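The two retrieval stages in panels b and c can be illustrated with a minimal Python sketch. It is not pLink 2's implementation: the function names, the binned dictionary used as a fragment index, the matched-ion count used as a coarse score, the tolerances, and the BS3 linker mass constant are all assumptions made for illustration, and every b and y ion of the linear peptide is indexed, which is a simplification.

```python
from bisect import bisect_left, bisect_right
from collections import Counter, defaultdict

# Monoisotopic residue masses (Da) for the few amino acids used in this toy example.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
           "L": 113.08406, "K": 128.09496, "E": 129.04259, "R": 156.10111}
WATER, PROTON = 18.01056, 1.00728
BS3_LINKER = 138.06808  # assumed cross-linker mass, illustrative only


def peptide_mass(seq):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def by_ions(seq):
    """Singly charged b and y ion m/z values of a linear peptide."""
    b = [sum(RESIDUE[aa] for aa in seq[:i]) + PROTON for i in range(1, len(seq))]
    y = [sum(RESIDUE[aa] for aa in seq[-i:]) + WATER + PROTON
         for i in range(1, len(seq))]
    return b + y


def build_fragment_index(peptides, bin_width=0.02):
    """Hypothetical fragment index: m/z bin -> set of peptides with an ion there."""
    index = defaultdict(set)
    for pep in peptides:
        for mz in by_ions(pep):
            index[round(mz / bin_width)].add(pep)
    return index


def retrieve_alpha(peaks, index, bin_width=0.02, min_matches=2, top_k=5):
    """Keep peptides with at least min_matches matched ions, then the top_k.

    The coarse score here is simply the matched-ion count (an assumption).
    """
    matches = Counter()
    for mz in peaks:
        key = round(mz / bin_width)
        hits = set()
        for k in (key - 1, key, key + 1):  # neighbouring bins act as a tolerance
            hits |= index.get(k, set())
        for pep in hits:
            matches[pep] += 1
    kept = [(n, pep) for pep, n in matches.items() if n >= min_matches]
    kept.sort(key=lambda t: (-t[0], t[1]))
    return [pep for _, pep in kept[:top_k]]


def retrieve_beta(precursor_mass, alpha, linker_mass, peptide_index, tol=0.01):
    """Open mass = precursor - alpha mass - linker mass; look it up by mass.

    peptide_index is a list of (mass, sequence) tuples sorted by mass.
    """
    open_mass = precursor_mass - peptide_mass(alpha) - linker_mass
    masses = [m for m, _ in peptide_index]
    lo = bisect_left(masses, open_mass - tol)
    hi = bisect_right(masses, open_mass + tol)
    return [seq for _, seq in peptide_index[lo:hi]]


# Toy usage: a made-up database and a made-up cross-linked pair.
peptides = ["GASK", "LEVKR", "AKSR", "VLKE"]
fragment_index = build_fragment_index(peptides)
peptide_index = sorted((peptide_mass(p), p) for p in peptides)
precursor = peptide_mass("LEVKR") + peptide_mass("AKSR") + BS3_LINKER
alphas = retrieve_alpha(by_ions("LEVKR")[:4], fragment_index)
betas = retrieve_beta(precursor, alphas[0], BS3_LINKER, peptide_index)
```

In this toy run the query peaks are four b ions of one peptide, so that peptide is the only candidate that passes the two-ion threshold, and the open mass then selects its partner from the mass-sorted peptide index.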
Fig. 2
Performance evaluation on the Synthetic-BS3 dataset. a Venn diagram for the results of Kojak, pLink 1, pLink 2, and the benchmark. A total of 904 PSMs were correctly and consistently identified by all three engines; these were used as a new, fair standard dataset. b The number of PSMs correctly identified by each search engine. c The percentage of correct α-peptides ranking in the top-k in the open search stage of pLink 2. d Similar to c, but for β-peptides. The “Original” database contains only the sequences of 38 synthetic peptides, the “+ E. coli” database contains sequences from the “Original” database and the E. coli whole proteome database, and “+ Worm” and “+ Human” are similar to “+ E. coli”.
Fig. 3
Performance evaluation on the E.coli-Leiker-15N dataset. a Experimental design of the E.coli-Leiker-15N dataset. The unlabeled and 15N metabolically labeled E. coli lysates were cross-linked separately, mixed at a 1:1 ratio, digested with trypsin, and analyzed by LC-MS/MS. The dataset was searched only for the unlabeled peptides using different search engines, and the identification results were passed to pQuant to quantify the intensity ratio of the 15N-labeled precursor to the unlabeled precursor. Lastly, the precision of identifications was investigated by checking the percentage of NaN-ratio PSMs and peptides. b Analyses of the identified cross-linked PSMs. c Analyses of the identified cross-linked peptide pairs. The histograms denote the total numbers of b PSMs or c peptide pairs identified by each search engine under separate FDR control of intra-protein and inter-protein results, and the curves denote the percentage of NaN-ratio b PSMs or c peptide pairs in the corresponding histograms. d For intra-protein PSMs, more results were reported under separate FDR control, and the percentage of NaN ratios was slightly higher than under global FDR control. e For inter-protein PSMs, far fewer results were reported under separate FDR control, and the percentage of NaN ratios decreased, especially for pLink 1.
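The NaN-ratio check described in this caption reduces to a simple count. The sketch below assumes each identified PSM carries one quantification ratio as a float, with NaN where no ratio could be computed; the function name and input layout are hypothetical and do not reflect pQuant's actual output format.

```python
import math


def nan_ratio_percentage(ratios):
    """Percentage of PSMs (or peptide pairs) whose quantification ratio is NaN.

    ratios: list of floats, one per identification (assumed input layout).
    """
    if not ratios:
        return 0.0
    n_nan = sum(1 for r in ratios if math.isnan(r))
    return 100.0 * n_nan / len(ratios)


# Toy usage with made-up ratios: one of four identifications has no ratio.
example = nan_ratio_percentage([1.0, float("nan"), 0.9, 1.1])
```

Computing this value separately for the intra-protein and inter-protein groups mirrors the separate FDR control compared in panels d and e.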
Fig. 4
Performance evaluation on the SCF(FBXL3)-BS3 dataset. a A real-world protein complex sample was searched using Kojak, pLink 1, and pLink 2. A total of 850 cross-linked PSMs were identified consistently by all three engines; these were used as a new, fair standard dataset. b The sensitivities and precisions of the three engines. The “+ E. coli” database contains sequences from 146 target proteins and the E. coli whole proteome database, and “+ Worm” and “+ Human” are similar to “+ E. coli”. pLink 1 did not finish searching against the worm or the human entrapment databases within 1 week on a single computer when five variable modifications were set.
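As a rough illustration of how sensitivity and precision can be computed against a consensus standard such as the 850 PSMs above, the sketch below uses the common set-based definitions. These definitions are an assumption; the caption does not state the exact formulas used in the paper, and the function name and PSM identifiers are hypothetical.

```python
def sensitivity_and_precision(identified, standard):
    """Set-based sensitivity and precision (assumed textbook definitions).

    identified: PSM identifiers reported by an engine on an entrapment search.
    standard:   PSM identifiers in the consensus standard dataset.
    """
    identified, standard = set(identified), set(standard)
    true_hits = identified & standard
    sensitivity = len(true_hits) / len(standard) if standard else 0.0
    precision = len(true_hits) / len(identified) if identified else 0.0
    return sensitivity, precision


# Toy usage with made-up identifiers: three of four standard PSMs recovered,
# plus one identification that is not in the standard.
sens, prec = sensitivity_and_precision(
    identified={"s1", "s2", "s3", "x1"},
    standard={"s1", "s2", "s3", "s4"},
)
```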
Fig. 5
Increased speed of pLink 2 over Kojak on the Synthetic-BS3 dataset. a pLink 2 achieved a 3.9-fold speed-up when searching against the E. coli entrapment database. The horizontal axis is the number of top-k scored single peptides kept in Kojak, starting from its default value of 250. Speed-up was measured when the sensitivity of Kojak remained steady. b, c Similar to a, but against b the worm and c the human entrapment database, respectively.
