Mol Cell Proteomics. 2020 Dec;19(12):2157-2168.
doi: 10.1074/mcp.TIR120.002186. Epub 2020 Oct 16.

OpenPepXL: An Open-Source Tool for Sensitive Identification of Cross-Linked Peptides in XL-MS

Eugen Netz et al. Mol Cell Proteomics. 2020 Dec.

Abstract

Cross-linking MS (XL-MS) has been recognized as an effective source of information about protein structures and interactions. In contrast to regular peptide identification, XL-MS has to deal with a quadratic search space, where peptides from every protein could potentially be cross-linked to any other protein. To cope with this search space, most tools apply different heuristics for search space reduction. We introduce a new open-source XL-MS database search algorithm, OpenPepXL, which offers increased sensitivity compared with other tools. OpenPepXL searches the full search space of an XL-MS experiment without using heuristics to reduce it. Because of efficient data structures and built-in parallelization, OpenPepXL achieves excellent runtimes and can also be deployed on large compute clusters and cloud services while maintaining a slim memory footprint. We compared OpenPepXL to several other commonly used tools for identification of noncleavable labeled and label-free cross-linkers on a diverse set of XL-MS experiments. In our first comparison, we used a data set from a fraction of a cell lysate with a protein database of 128 targets and 128 decoys. At 5% FDR, OpenPepXL finds from 7% to over 50% more unique residue pairs (URPs) than other tools. On data sets with available high-resolution structures for cross-link validation, OpenPepXL reports from 7% to over 40% more structurally validated URPs than other tools. Additionally, we used a synthetic peptide data set that allows objective validation of cross-links without relying on structural information and found that OpenPepXL reports at least 12% more validated URPs than other tools. OpenPepXL has been built as part of the OpenMS suite of tools and supports Windows, macOS, and Linux operating systems. It also supports the MzIdentML 1.2 format for XL-MS identification results. It is freely available under a three-clause BSD license at https://openms.org/openpepxl.

Keywords: Protein cross-linking; XL-MS; crosslinking; protein structure; structural biology; tandem mass spectrometry.

Conflict of interest statement

The authors declare that they have no conflicts of interest with the contents of this article.

Figures

Graphical abstract
Fig. 1.
Overview of peptide pair candidate enumeration and identification in OpenPepXL. After in silico digestion, a database of modified peptides sorted by mass is kept. For each MS2 spectrum, the precursor mass (1) is used to determine the mass range for α peptides (heavier). Iterating through this list (2), for each α peptide, the mass range for β peptides is determined (3) and a list of pairs is enumerated. For each candidate pair, theoretical spectra are generated and scored against one experimental MS2 spectrum (label-free experiments) or one linear-ion and one cross-linked ion spectrum (labeled cross-linker experiments).
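The enumeration steps in the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not OpenPepXL's implementation: the function name `enumerate_pairs`, the absolute mass tolerance `tol`, and the toy masses are assumptions made for illustration, and the real tool additionally handles modifications, linkable residues, and scoring.

```python
from bisect import bisect_left, bisect_right

def enumerate_pairs(sorted_masses, precursor_mass, linker_mass, tol):
    """Return (alpha_index, beta_index) pairs whose masses plus the linker
    match the precursor mass within +/- tol (hypothetical helper)."""
    if not sorted_masses:
        return []
    # Mass that must be shared between the alpha and beta peptides.
    target = precursor_mass - linker_mass
    lightest = sorted_masses[0]
    # Step 1: alpha is the heavier peptide, so it carries at least half of
    # the target and at most the target minus the lightest peptide.
    a_lo = bisect_left(sorted_masses, (target - tol) / 2)
    a_hi = bisect_right(sorted_masses, target + tol - lightest)
    pairs = []
    # Step 2: iterate over the alpha candidates.
    for a in range(a_lo, a_hi):
        remainder = target - sorted_masses[a]
        # Step 3: the beta mass range follows from the chosen alpha.
        b_lo = bisect_left(sorted_masses, remainder - tol)
        b_hi = bisect_right(sorted_masses, remainder + tol)
        # Beta must not be heavier than alpha: keep indices <= a only.
        for b in range(b_lo, min(b_hi, a + 1)):
            pairs.append((a, b))
    return pairs

# Toy values, not real peptide or linker masses.
masses = [500.0, 700.0, 1000.0, 1300.0, 1500.0]
candidates = enumerate_pairs(masses, precursor_mass=2138.0,
                             linker_mass=138.0, tol=0.01)
```

Because the peptide list is sorted, both mass ranges are located by binary search, so no pair outside the precursor tolerance is ever generated. In this sketch a peptide may also pair with itself.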
Fig. 2.
Preprocessing of experimental spectrum pairs for experiments with labeled linkers. DSS D0/D12 is used as an example. Two experimental spectra from the same peptide pair but with different linker masses are matched without a mass shift and with a mass shift of the label mass difference, considering multiple charges. The result is a linear ion spectrum with unknown charges and a cross-linked ion spectrum with known ion charges. This allows for a more constrained and targeted matching to theoretical peaks.
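The matching described in the Fig. 2 caption can be sketched roughly as follows. This is an illustrative simplification, not the tool's code: the function name `split_spectrum_pair`, the charge list, the absolute m/z tolerance, and the toy peak lists are all assumptions, and intensities are ignored. A peak found at the same m/z in both spectra is treated as a linear ion. A peak found again at m/z plus the label shift divided by a charge z is treated as a cross-linked ion of charge z.

```python
def split_spectrum_pair(light_mz, heavy_mz, label_shift,
                        charges=(1, 2, 3), tol=0.01):
    """Split light-linker peaks into linear ions (charge unknown) and
    cross-linked ions (charge inferred from the shift); hypothetical helper."""
    linear = []
    cross_linked = []
    for mz in light_mz:
        # Unshifted match: the fragment does not contain the linker.
        if any(abs(h - mz) <= tol for h in heavy_mz):
            linear.append(mz)
            continue
        # Shifted match: the fragment carries the linker, and the observed
        # m/z shift is the label mass difference divided by the charge.
        for z in charges:
            shifted = mz + label_shift / z
            if any(abs(h - shifted) <= tol for h in heavy_mz):
                cross_linked.append((mz, z))
                break
    return linear, cross_linked

# Toy peaks with a toy label shift of 12.0 (assumed value for illustration).
light = [200.0, 300.0, 400.0, 555.5]
heavy = [200.0, 312.0, 406.0, 600.0]
linear_ions, xl_ions = split_spectrum_pair(light, heavy, label_shift=12.0)
```

The charge found for each shifted peak is what makes the cross-linked ion spectrum one with "known ion charges", which narrows the later comparison against theoretical peaks.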
Fig. 3.
Results from the analysis of the ribosomal fraction data set. A, Numbers of identified unique residue pairs (URPs) in the ribosomal fraction data set with a target database of 128 proteins. OpenPepXL identified 110 URPs, pLink2 identified 102 URPs, Kojak 67 URPs and XiSearch 54 URPs. Structural verification of these cross-links is presented in supplemental Figs. S3 and S4. StavroX exceeded the available memory of 8 GB and could not finish the search. xQuest did not exceed the available memory, but the search was canceled because the projected runtime under these conditions was unreasonable. B, Runtimes in hours needed to analyze the ribosomal fraction data set with a database of 128 target and 128 decoy proteins using one CPU core. pLink2 only took 15 min. Kojak took 3 h, OpenPepXL 28 h and XiSearch 36 h.
Fig. 4.
Results from the analysis of the CRM complex and BSA data sets. A, Numbers of identified URPs in the CRM data set. Identified URPs that link residues covered by the PDB structure 3GJX were analyzed by TopoLink. The red bars are the proportion of URPs linking residues that are either not solvent accessible, or are farther away than 35 Å according to the SAS distance. The green bars are the proportion of URPs that were not covered by the structure. OpenPepXL identified 78 URPs. 44 URPs were validated and one link is inconsistent with the structure (IWS) with a distance of 37.4 Å between linked residues. Kojak identified 61 URPs of which 41 were validated. pLink2 identified 78 URPs of which 38 were validated and two are IWS, including the same 37.4 Å link as OpenPepXL and an additional IWS link with a 40.4 Å distance. StavroX identified 48 URPs of which 28 were validated and one is IWS. XiSearch found 36 URPs of which 24 were validated and xQuest found 16 of which 14 were validated. B, Numbers of identified URPs in the BSA data set. OpenPepXL and xQuest were compared on ion trap and orbitrap fragment spectra data with two different labeled linkers DSS-d0/d12 and PDH-d0/d10. Identified cross-links that link residues covered by the PDB structure 4F5S were analyzed by TopoLink. The red bars are the proportion of URPs linking residues that are either not solvent accessible, or are farther away than 35 Å according to the SAS distance. OpenPepXL identified a total of 65 URPs in the DSS orbitrap data set, including three IWS links, all of them below a distance of 40 Å. It identified 22 URPs in the PDH orbitrap data set, including one IWS link with a distance of 59.3 Å. xQuest identified 16 URPs in the PDH orbitrap data set, including 3 IWS links. It identified 21 URPs in the DSS orbitrap data set, including one IWS link. xQuest identified 9 URPs in the PDH ion trap data set and 57 URPs in the DSS ion trap data set, including one IWS link with a distance of 70.2 Å.
Fig. 5.
Cross-links mapped to a PDB structure of the CRM complex. Cross-links identified in the CRM data set with (A) OpenPepXL, (B) Kojak and (C) pLink2, mapped onto the PDB structure 3GJX. Cross-links spanning a Euclidean distance of more than 35 Å are colored red. Those spanning a smaller distance are colored blue.
Fig. 6.
Cross-links mapped to a PDB structure of BSA. Cross-links identified in the BSA data set and mapped onto chain A of PDB structure 4F5S. Cross-links spanning a Euclidean distance of more than 35 Å are colored red. Those spanning a smaller distance are colored blue. A, DSS URPs identified by OpenPepXL in the orbitrap data set. B, PDH URPs identified by OpenPepXL in the orbitrap data set. C, DSS URPs identified by xQuest in the ion trap data set. D, PDH URPs identified by xQuest in the ion trap data set.
Fig. 7.
Results from the analysis of the synthetic peptides data set at a 5% FDR cutoff. All three replicates R1, R2 and R3 are shown. The blue bars show the number of valid CSMs/cross-links and the red bars on the negative y axis show the number of false-positive identifications. All data except for OpenPepXL was taken from Beveridge et al. (25). xQuest was omitted because it was not considered in that publication. A, Number of reported CSMs. The exact numbers are in supplemental Table S4. B, Number of identified URPs. The exact numbers are in supplemental Table S5.
Fig. 8.
Visualization of spectra with annotated matched peaks and peptide sequence coverage in TOPPView. On the right side is the table of identifications containing a description of the identified species and several match quality metrics. On the left side is the annotated spectrum with a sequence coverage indicator. A one-sided arrow means the fragment starting at the marked residue and containing the rest of the peptide or peptide pair in the direction of the arrow was matched. A double arrow means fragments starting at the marked residue and containing the rest of the peptide or peptide pair in both directions were matched.
