Nat Commun. 2019 Jul 30;10(1):3404.

doi: 10.1038/s41467-019-11337-z.

A high-speed search engine pLink 2 with systematic evaluation for proteome-scale identification of cross-linked peptides

Zhen-Lin Chen¹ ², Jia-Ming Meng¹ ², Yong Cao³, Ji-Li Yin¹ ², Run-Qian Fang¹ ², Sheng-Bo Fan¹ ², Chao Liu¹ ², Wen-Feng Zeng¹ ², Yue-He Ding³, Dan Tan³, Long Wu¹ ², Wen-Jing Zhou¹ ², Hao Chi¹ ², Rui-Xiang Sun³, Meng-Qiu Dong⁴, Si-Min He⁵ ⁶

Affiliations

¹ Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China.
² University of Chinese Academy of Sciences, Beijing, 100049, China.
³ National Institute of Biological Sciences, Beijing, 102206, China.
⁴ National Institute of Biological Sciences, Beijing, 102206, China. dongmengqiu@nibs.ac.cn.
⁵ Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China. smhe@ict.ac.cn.
⁶ University of Chinese Academy of Sciences, Beijing, 100049, China. smhe@ict.ac.cn.

PMID: 31363125
PMCID: PMC6667459
DOI: 10.1038/s41467-019-11337-z

Zhen-Lin Chen et al. Nat Commun. 2019.

Abstract

We describe pLink 2, a search engine with higher speed and reliability for proteome-scale identification of cross-linked peptides. With a two-stage open search strategy facilitated by fragment indexing, pLink 2 is about 40 times faster than pLink 1 and 3 to 10 times faster than Kojak. Furthermore, using simulated datasets, synthetic datasets, ¹⁵N metabolically labeled datasets, and entrapment databases, four analysis methods were designed to evaluate the credibility of ten state-of-the-art search engines. This systematic evaluation shows that pLink 2 outperforms the other engines in precision and sensitivity, especially at proteome scales. Lastly, re-analysis of four published proteome-scale cross-linking datasets with pLink 2 required only a fraction of the time used by pLink 1, with up to 27% more cross-linked residue pairs identified. pLink 2 is therefore an efficient and reliable tool for cross-linking mass spectrometry analysis, and the systematic evaluation methods described here will be useful for future software development.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
pLink 2 workflow. a The general workflow. Step 1, MS1 scans are preprocessed by pParse to extract precursor candidates. Step 2, for each MS2 spectrum, α-peptide candidates are retrieved from the fragment index using query peaks generated from the spectrum. Step 3, β-peptide candidates are retrieved from the peptide index using the complementary masses of α-peptides. Step 4, α- and β-peptide candidates are paired and fine-scored with the MS2 spectrum. Step 5, all top-scored PSMs are re-ranked and filtered after FDR control. b The sub-workflow of α-peptide retrieval. For each MS2 spectrum, the peaks are converted into regular b and y ions to query the fragment index. Only peptides with at least two matched ions are coarse-scored with the spectrum, and the top five coarse-scored α-peptide candidates are kept. c The sub-workflow of β-peptide retrieval. For each α-peptide candidate, the open mass is first calculated by subtracting the α-peptide mass and the cross-linker mass from the precursor mass, and this mass is used to retrieve β-peptide candidates from the peptide index. Then, each of the five α-peptide candidates is paired with each of its complementary β-peptide candidates, and these pairs are fine-scored with the spectrum. Finally, the highest fine-scored peptide pair is kept. d The re-ranking algorithm. PSMs are grouped into intra-protein, inter-protein, loop-linked, mono-linked, and regular groups, and a semi-supervised learning algorithm is used to re-score and re-rank them in each group.
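
The two retrieval stages in panels b and c can be illustrated with a minimal Python sketch. It is not the pLink 2 implementation: the bin width `BIN`, the mass tolerance `tol`, the use of a plain matched-peak count as the coarse score, the BS3 cross-link mass constant, and all function names are assumptions made for illustration, and fine scoring, re-ranking, and FDR control are omitted.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Monoisotopic residue masses (Da) for the few amino acids used in the toy peptides.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
           "L": 113.08406, "K": 128.09496, "E": 129.04259, "R": 156.10111}
WATER, PROTON = 18.010565, 1.007276
BS3 = 138.06808   # assumed mass added by a BS3 cross-link (Da)
BIN = 0.02        # assumed fragment bin width (Da)

def residue_sum(seq):
    return sum(RESIDUE[aa] for aa in seq)

def peptide_mass(seq):
    """Neutral monoisotopic mass of a linear peptide."""
    return residue_sum(seq) + WATER

def by_ions(seq):
    """Singly charged regular b and y ion m/z values of a linear peptide."""
    total, run, ions = residue_sum(seq), 0.0, []
    for aa in seq[:-1]:
        run += RESIDUE[aa]
        ions.append(run + PROTON)                  # b ion
        ions.append(total - run + WATER + PROTON)  # complementary y ion
    return ions

def build_fragment_index(peptides):
    """Map each fragment m/z bin to the ids of peptides producing that fragment."""
    index = defaultdict(set)
    for pid, seq in enumerate(peptides):
        for mz in by_ions(seq):
            index[round(mz / BIN)].add(pid)
    return index

def alpha_candidates(peaks, index, min_matched=2, top_k=5):
    """Coarse-score peptides by matched peak count; keep the top_k with >= min_matched."""
    counts = defaultdict(int)
    for mz in peaks:
        b = round(mz / BIN)
        hits = set()
        for nb in (b - 1, b, b + 1):  # neighbouring bins absorb small mass errors
            hits |= index.get(nb, set())
        for pid in hits:
            counts[pid] += 1
    kept = sorted(((n, pid) for pid, n in counts.items() if n >= min_matched),
                  key=lambda t: (-t[0], t[1]))
    return [pid for _, pid in kept[:top_k]]

def beta_candidates(precursor_mass, alpha_mass, linker_mass, mass_index, tol=0.01):
    """Open mass = precursor - alpha - linker; look it up in a sorted (mass, id) list."""
    open_mass = precursor_mass - alpha_mass - linker_mass
    lo = bisect_left(mass_index, (open_mass - tol,))
    hi = bisect_right(mass_index, (open_mass + tol, float("inf")))
    return [pid for _, pid in mass_index[lo:hi]]

# Toy usage: a spectrum made of the regular ions of "GASK", cross-linked to "LEVR".
peptides = ["GASK", "LEVR", "AKLE", "SVGK"]
frag_index = build_fragment_index(peptides)
mass_index = sorted((peptide_mass(s), pid) for pid, s in enumerate(peptides))
precursor = peptide_mass("GASK") + peptide_mass("LEVR") + BS3
alphas = alpha_candidates(by_ions("GASK"), frag_index)
betas = beta_candidates(precursor, peptide_mass(peptides[alphas[0]]), BS3, mass_index)
```

In this toy example "SVGK" shares one fragment (the y1 ion of lysine) with "GASK", so it is dropped by the two-matched-ion requirement, and the open mass lookup returns only "LEVR".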

**Fig. 2**
Performance evaluation on the Synthetic-BS3 dataset. a Venn diagram for the results of Kojak, pLink 1, pLink 2, and the benchmark. A total of 904 PSMs were correctly and consistently identified by the three engines; these were used as a new and fair standard dataset. b The numbers of PSMs correctly identified by each search engine. c The percentage of correct α-peptides ranking in the top-k in the open search stage of pLink 2. d Similar to c, but for β-peptides. The “Original” database contains only the sequences of 38 synthetic peptides; the “+ *E. coli*” database contains the sequences from the “Original” database plus the *E. coli* whole proteome database; “+ Worm” and “+ Human” are constructed analogously.
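
The top-k percentages plotted in panels c and d are cumulative fractions. A minimal sketch of that arithmetic follows, assuming each spectrum comes with the rank at which its correct peptide appeared (or `None` if it was not retrieved). The function name and the example ranks are hypothetical and are not taken from the paper.

```python
def top_k_percent(ranks, max_k=5):
    """Percentage of spectra whose correct peptide is ranked within the top k."""
    n = len(ranks)
    return {k: 100.0 * sum(1 for r in ranks if r is not None and r <= k) / n
            for k in range(1, max_k + 1)}

# Hypothetical ranks for eight spectra; None means the correct peptide was missed.
example_ranks = [1, 1, 2, 5, None, 3, 1, 1]
percentages = top_k_percent(example_ranks)
```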

**Fig. 3**
Performance evaluation on the E.coli-Leiker-¹⁵N dataset. a Experimental design of the E.coli-Leiker-¹⁵N dataset. The unlabeled and ¹⁵N metabolically labeled *E. coli* lysates were cross-linked separately, mixed at a 1:1 ratio, digested with trypsin, and analyzed by LC-MS/MS. The dataset was searched only for the unlabeled peptides using different search engines, and the identification results were passed to pQuant to quantify the intensity ratio of the ¹⁵N-labeled precursor to the unlabeled precursor. Lastly, the precision of identifications was investigated by checking the percentage of NaN-ratio PSMs and peptides. b Analyses of the identified cross-linked PSMs. c Analyses of the identified cross-linked peptide pairs. The histograms denote the total numbers of b PSMs or c peptide pairs identified by each search engine under separate FDR control of intra-protein and inter-protein results, and the curves denote the percentage of NaN-ratio b PSMs or c peptide pairs in the corresponding histograms. d For intra-protein PSMs, more results were reported under separate FDR control, and the percentage of NaN ratios was slightly higher than under global FDR control. e For inter-protein PSMs, far fewer results were reported under separate FDR control, and the percentage of NaN ratios decreased, especially for pLink 1.
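
The NaN-ratio check described in panel a reduces to a per-group percentage. The sketch below assumes each identification is available as a `(group, ratio)` pair, where `ratio` is the quantified ¹⁵N-to-unlabeled intensity ratio and NaN marks a precursor for which no ratio could be computed. This input layout, the function name, and the example values are assumptions for illustration, not the pQuant output format.

```python
import math
from collections import defaultdict

def nan_ratio_percent(psms):
    """Percentage of NaN-ratio identifications within each group."""
    totals, nans = defaultdict(int), defaultdict(int)
    for group, ratio in psms:
        totals[group] += 1
        if math.isnan(ratio):
            nans[group] += 1
    return {g: 100.0 * nans[g] / totals[g] for g in totals}

# Hypothetical identifications, kept separate for intra- and inter-protein results.
example_psms = [("intra", 1.02), ("intra", float("nan")), ("intra", 0.97),
                ("intra", 1.10), ("inter", float("nan")), ("inter", 0.90)]
nan_percent = nan_ratio_percent(example_psms)
```

A higher NaN percentage in a group suggests a larger share of questionable identifications in that group, which is how the curves in panels b to e are read.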

**Fig. 4**
Performance evaluation on the SCF(FBXL3)-BS3 dataset. a A real-world protein complex sample was searched using Kojak, pLink 1, and pLink 2. A total of 850 cross-linked PSMs were identified consistently by the three engines; these were used as a new and fair standard dataset. b The sensitivities and precisions of the three engines. The “+ *E. coli*” database contains sequences from 146 target proteins and the *E. coli* whole proteome database; “+ Worm” and “+ Human” are constructed analogously. pLink 1 did not finish searching against the worm or the human entrapment databases within 1 week on a single computer when five variable modifications were set.

**Fig. 5**
Increased speed of pLink 2 over Kojak on the Synthetic-BS3 dataset. a pLink 2 achieved a 3.9-fold speed-up when searching against the *E. coli* entrapment database. The horizontal axis is the number of top-k scored single peptides kept in Kojak, starting from its default value of 250. Speed-up was measured when the sensitivity of Kojak remained steady. b, c Similar to a, but against b the worm and c the human entrapment database, respectively.
