Mol Cell Proteomics. 2021;20:100018.
doi: 10.1074/mcp.TIR120.002216. Epub 2020 Dec 11.

PTM-Shepherd: Analysis and Summarization of Post-Translational and Chemical Modifications From Open Search Results

Daniel J Geiszler et al. Mol Cell Proteomics. 2021.

Abstract

Open searching has proven to be an effective strategy for identifying both known and unknown modifications in shotgun proteomics experiments. Rather than being limited to a small set of user-specified modifications, open searches identify peptides with any mass shift that may correspond to a single modification or a combination of several modifications. Here we present PTM-Shepherd, a bioinformatics tool that automates characterization of post-translational modification profiles detected in open searches based on attributes, such as amino acid localization, fragmentation spectra similarity, retention time shifts, and relative modification rates. PTM-Shepherd can also perform multiexperiment comparisons for studying changes in modification profiles, e.g., in data generated in different laboratories or under different conditions. We demonstrate how PTM-Shepherd improves the analysis of data from formalin-fixed and paraffin-embedded samples, detects extreme underalkylation of cysteine in some data sets, discovers an artifactual modification introduced during peptide synthesis, and uncovers site-specific biases in sample preparation artifacts in a multicenter proteomics profiling study.

Keywords: Open searching, PTM, Post-translational modification, Mass-tolerant search, Localization, Spectral similarity, Retention time.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
PTM-Shepherd workflow. Data processing begins by aggregating the mass shifts across all data sets into a common histogram. Peaks are determined based on their prominence. The 500 most intense peaks in aggregate are then quantified for each data set and normalized to data set size. Peptides with each mass shift are iteratively rescored with the modification at each position, producing localization scores for each peptide and an aggregate localization enrichment for each mass shift. Finally, modified peptides are compared pairwise with their unmodified counterparts to calculate cosine spectral similarity and change in retention time.
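The spectral similarity step in this workflow can be illustrated with a minimal sketch. It assumes each spectrum is a list of (m/z, intensity) pairs and that intensities are summed into fixed-width m/z bins before comparison. The function names, the 1.0 bin width, and the binning scheme are illustrative assumptions, not PTM-Shepherd's actual implementation, and the sketch does not account for fragment ions shifted by the modification mass.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (mz, intensity) peaks into fixed-width m/z bins.

    bin_width is a hypothetical parameter chosen for illustration.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy spectra (made-up values, not real data).
unmodified = [(100.1, 10.0), (200.2, 5.0)]
modified = [(100.1, 10.0), (350.7, 5.0)]
print(cosine_similarity(unmodified, unmodified))
print(cosine_similarity(unmodified, modified))
```

A score near 1 indicates that the modified and unmodified spectra share most of their fragment intensity, while a score near 0 indicates little overlap.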
Fig. 2
Basic PTM-Shepherd applications. A, PTM-Shepherd identifies two peaks in close proximity for the four data sets of Tabb et al. All four data sets (Zimmerman, Nair, Nielsen, and Buthelezi) show a mixture of two Gaussian peaks near 28 Da. The consistently more intense peak is formylation at 27.9949 Da. Only in the Nielsen data set does dimethylation (28.0313 Da) approach formylation's intensity. B, PTM-Shepherd identifies more failed alkylation than other common modifications such as deamidation and not-Met oxidation. C, PTM-Shepherd modification decomposition identifies six times as much failed alkylation as is identifiable based on the −57 Da mass shift alone, in total accounting for a quarter of all Cys-containing peptides. PSMs, peptide-spectrum matches.
Fig. 3
Retention time (RT) profiles for peptides with losses of H2O and NH3. Modified peptides are compared with their homologous unmodified peptides, with multiple RT changes being collapsed to their median. The effect on RT for losses of H2O (top) and NH3 (bottom) is distributed bimodally. These mass shifts are known to correspond to both in-source losses and spontaneous conversions. In-source losses should not affect RT and are therefore expected to fall within a Gaussian distribution centered at zero.
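The median collapse described in this caption can be sketched as follows. The sketch assumes modified observations arrive as (peptide, RT) pairs and that unmodified RTs are grouped by peptide sequence. The function name, the data layout, and the use of the median unmodified RT as the reference point are assumptions made for illustration, not the tool's documented behavior.

```python
from collections import defaultdict
from statistics import median


def median_rt_shifts(modified, unmodified):
    """Return {peptide: median RT change} for modified vs. unmodified peptides.

    modified:   list of (peptide, rt) observations carrying the mass shift
    unmodified: dict mapping peptide -> list of RTs without the mass shift
    Peptides lacking an unmodified counterpart are skipped.
    """
    deltas = defaultdict(list)
    for peptide, rt in modified:
        reference_rts = unmodified.get(peptide)
        if not reference_rts:
            continue
        # Assumption: the median unmodified RT serves as the reference.
        deltas[peptide].append(rt - median(reference_rts))
    # Collapse multiple RT changes per peptide to their median.
    return {peptide: median(values) for peptide, values in deltas.items()}


# Toy values in minutes (made up for illustration).
modified = [("PEPTIDEK", 31.0), ("PEPTIDEK", 35.0), ("PEPTIDEK", 32.0),
            ("SAMPLER", 20.0), ("ORPHANK", 12.0)]
unmodified = {"PEPTIDEK": [30.0, 30.0], "SAMPLER": [20.5, 19.5]}
print(median_rt_shifts(modified, unmodified))
```

Under this sketch, in-source losses would produce shifts clustered around zero, whereas a true chemical conversion would produce a consistently nonzero shift.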
Fig. 4
Analytical profiles for losses of H2O and NH3. A and B, localization profiles reveal a nonhomogeneous landscape with specific residues showing enrichment. C and D, select modifications are distinguishable from background in-source decays in their effect on retention time. E and F, similarity scores show lower profiles for C-terminal modifications on Lys, whereas N-terminal modifications on Glu, Cys, and Gln have higher similarity.
Fig. 5
Clustered heat map representation of CPTAC3 quality control samples transposed from PTM-Shepherd output. Values shown are column-wise z-scores of spectral counts. Column clustering shows highly related modifications, and row clustering shows experiments clustering by processing location. Mass shift clusters discussed in the text are numbered, and their corresponding mass shifts are shown left to right in the bottom of the figure. Samples processed longitudinally throughout their respective studies are indicated using tumor type label (LUAD, UCEC, or CCRCC). Samples with no tumor type label were processed as part of the CPTAC harmonization study. Mass shifts in cluster 2 correspond to negative mass shifts. ∗This annotation was constructed manually. ∗∗−14 Da can correspond to a large number of modifications and single-residue mutations. BI, Broad Institute; CCRCC, clear cell renal cell carcinoma; JHU, Johns Hopkins University; LUAD, lung adenocarcinoma; PNNL, Pacific Northwest National Laboratory; TMT, tandem mass tag; UCEC, uterine corpus endometrial carcinoma.
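The column-wise z-scores used in this heat map can be sketched as below. The sketch assumes rows are experiments and columns are mass shifts, and it uses the population standard deviation. Both the function name and the choice of population rather than sample standard deviation are assumptions, since the caption does not specify them.

```python
from statistics import mean, pstdev


def columnwise_zscores(matrix):
    """Z-score each column of a row-major matrix of spectral counts.

    Rows are experiments and columns are mass shifts (an assumed layout).
    Columns with zero variance are returned as all zeros.
    """
    if not matrix:
        return []
    columns = list(zip(*matrix))
    scaled_columns = []
    for column in columns:
        mu = mean(column)
        sigma = pstdev(column)  # assumption: population standard deviation
        if sigma == 0:
            scaled_columns.append([0.0] * len(column))
        else:
            scaled_columns.append([(value - mu) / sigma for value in column])
    # Transpose back to rows = experiments.
    return [list(row) for row in zip(*scaled_columns)]


# Toy spectral counts (made up): two experiments, two mass shifts.
counts = [[1, 10],
          [3, 10]]
print(columnwise_zscores(counts))
```

Scaling within each column puts abundant and rare mass shifts on a common scale, so clustering reflects relative differences between experiments rather than absolute counts.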
