Mol Cell Proteomics. 2019 Feb;18(2):391-405.
doi: 10.1074/mcp.RA118.000812. Epub 2018 Nov 12.

PTMiner: Localization and Quality Control of Protein Modifications Detected in an Open Search and Its Application to Comprehensive Post-translational Modification Characterization in Human Proteome

Zhiwu An et al. Mol Cell Proteomics. 2019 Feb.

Abstract

The open (mass-tolerant) search of tandem mass spectra of peptides shows great potential for the comprehensive detection of post-translational modifications (PTMs) in shotgun proteomics. However, this search strategy has not been widely adopted by the community; one bottleneck is the lack of appropriate algorithms for automated and reliable post-processing of the coarse and error-prone search results. Here we present PTMiner, a software tool for confident filtering and localization of modifications (mass shifts) detected in an open search. After mass-shift-grouped false discovery rate (FDR) control of peptide-spectrum matches (PSMs), PTMiner uses an empirical Bayesian method to localize modifications through iterative learning of the prior probabilities of each type of modification occurring on different amino acids. The performance of PTMiner was evaluated on three data sets: simulated data, chemically synthesized peptide library data, and modified-peptide spiked-in proteome data. The results showed that PTMiner can effectively control the PSM FDR and accurately localize the modification sites. At 1% real false localization rate (FLR), PTMiner localized 93%, 84%, and 83% of the modification sites in the three data sets, respectively, far higher than the two open search engines we used and an extended version of the Ascore localization algorithm. We then used PTMiner to analyze a draft map of the human proteome containing 25 million spectra from 30 tissues and confidently identified over 1.7 million modified PSMs at 1% FDR and 1% FLR, which provided a system-wide view of both known and unknown PTMs in the human proteome.
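
The idea of iteratively learning per-amino-acid priors can be illustrated with a small sketch. This is not PTMiner's implementation: the abstract does not specify the likelihood model or the update rule, so the code below assumes a simple EM-style loop, a hypothetical function name (learn_site_priors), and made-up per-site likelihood scores. For one mass-shift type, each candidate site's posterior is taken as prior times likelihood, and the priors are then re-estimated from the posteriors summed over all PSMs.

```python
from collections import defaultdict


def learn_site_priors(psms, n_iter=50, tol=1e-9):
    """Iteratively estimate per-amino-acid priors for ONE mass-shift type.

    psms: list of (peptide, likelihoods) pairs, where likelihoods[j] is a
    hypothetical non-negative score for placing the shift on residue j.
    Returns (prior, posteriors); posteriors[i][j] is the site probability
    for residue j of PSM i under the prior used in the final pass.
    """
    residues = sorted({aa for peptide, _ in psms for aa in peptide})
    # Start from a flat prior: no amino acid is favored.
    prior = {aa: 1.0 / len(residues) for aa in residues}
    posteriors = []
    for _ in range(n_iter):
        expected = defaultdict(float)
        posteriors = []
        for peptide, likelihoods in psms:
            # Bayes' rule per candidate site: prior times likelihood.
            weights = [prior[aa] * lk for aa, lk in zip(peptide, likelihoods)]
            total = sum(weights)
            if total > 0:
                post = [w / total for w in weights]
            else:
                # No evidence at all: fall back to a uniform posterior.
                post = [1.0 / len(peptide)] * len(peptide)
            posteriors.append(post)
            for aa, p in zip(peptide, post):
                expected[aa] += p
        # Each PSM contributes total probability 1, so the new prior sums to 1.
        new_prior = {aa: expected[aa] / len(psms) for aa in residues}
        change = max(abs(new_prior[aa] - prior[aa]) for aa in residues)
        prior = new_prior
        if change < tol:
            break
    return prior, posteriors


# Toy input (invented values): three PSMs carrying the same mass shift.
toy_psms = [
    ("AKS", [0.1, 0.8, 0.1]),
    ("GKT", [0.2, 0.6, 0.2]),
    ("SKA", [0.4, 0.4, 0.2]),  # ambiguous between S and K on its own
]
prior, posteriors = learn_site_priors(toy_psms)
best_site = max(range(3), key=lambda j: posteriors[2][j])
print(max(prior, key=prior.get), best_site)
```

In this toy input the third PSM cannot distinguish S from K by its own scores, but the prior learned from the other two PSMs shifts its posterior toward K. That is the kind of information sharing across PSMs that the abstract attributes to the learned priors.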

Keywords: Algorithms; Bioinformatics Software; False Discovery Rate; False Localization Rate; Human Proteome Map; Modification Site Localization; Open Search; Post-translational Modifications; Quality Control and Metrics; Statistics; Tandem Mass Spectrometry.

Conflict of interest statement

We declare no conflict of interest.

Figures

Graphical abstract
Fig. 1.
Algorithm workflow of PTMiner. The processes in the red block constitute the localization algorithm, and the red arrows within it represent iterative updating of the prior probabilities.
Fig. 2.
PTMiner confidently localized the mass shifts on the simulated data. A–B, PTMiner increased the localization proportions greatly compared with the search engines (A for pFind and B for MODa) and the extended Ascore at the same real FLR thresholds. C–D, The real and estimated FLRs were fitted using linear regressions (C for pFind and D for MODa).
Fig. 3.
PTMiner confidently localized the mass shifts on the chemically synthesized peptide data. A–B, PTMiner increased the localization proportions greatly compared with the search engines (A for pFind and B for MODa) and the extended Ascore at the same real FLR thresholds.
Fig. 4.
Comparison of PTMiner with pFind and Ascore on the modified-peptide spiked-in proteome data. “All” indicates correct PSMs (both peptide sequence and mass shift were correct) obtained at 1% transfer FDR. “PTMiner,” “Ascore,” and “pFind” indicate correct localization results given by PTMiner, Ascore and pFind at 1% FLR, respectively.
Fig. 5.
Summary of unrestrictive modification identification in an open search and modification localization by PTMiner for the draft map of the human proteome. A, The numbers of total MS/MS spectra, PSMs with 1% FDR, modified PSMs among them, and modified PSMs with 1% FLR, as well as the proportions of fully annotated (by both mass and location specificity), partially annotated (by mass only), and unannotated PSMs. B, Histogram of 1,755,278 mass shifts with 1% FDR and 1% FLR.
Fig. 6.
Modification analysis of the human proteome data. A, Modification specificity distributions on amino acids and protein/peptide termini for the fully annotated modifications with 1% FDR and 1% FLR. Normalization was performed so that the modification specificities of each modification sum to 1. Only fully annotated modifications with more than 1,000 PSMs are shown. B, Modification distributions across the 30 human tissues. For each modification, the number of modified PSMs from each tissue was divided by the total number of spectra from that tissue, and the resulting ratios were then normalized across all tissues so that they sum to 1 for this modification. As in A, only fully annotated modifications with more than 1,000 PSMs are shown. C–D, KEGG enrichment analysis results of deamidation and succinylation modifications that were differentially observed in the tissues.
Fig. 7.
The mass shift of 12.0054 Da was localized to the peptide N-terminus. MS/MS spectra of the modified (top) and unmodified (bottom) forms of the peptide are shown and compared. Red arrows between the two spectra indicate the shift of fragment ion peaks. The two spectra have very similar peptide fragmentation patterns, which supports the reliability of the identification and localization.
Fig. 8.
The mass shift of 34.0061 Da was localized to His. MS/MS spectra of the modified (top) and unmodified (bottom) forms of the peptide are shown and compared. Red arrows between the two spectra indicate the shift of fragment ion peaks. The two spectra have very similar peptide fragmentation patterns, which supports the reliability of the identification and localization.
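
The two-step normalization described in the caption of Fig. 6B can be written out as a short sketch. The function name (tissue_profile), the tissue names, and the counts below are illustrative assumptions, not values from the paper. For one modification, each tissue's modified-PSM count is first divided by that tissue's total spectrum count, and the resulting ratios are then rescaled to sum to 1 across tissues.

```python
def tissue_profile(modified_psms, total_spectra):
    """Normalize one modification's PSM counts across tissues.

    modified_psms: {tissue: number of modified PSMs for this modification}
    total_spectra: {tissue: total number of spectra from that tissue}
    Returns {tissue: share}, where the shares sum to 1 (or are all 0.0
    if the modification was not observed in any tissue).
    """
    # Step 1: correct for tissues that simply contributed more spectra.
    ratios = {
        tissue: modified_psms.get(tissue, 0) / n_spectra
        for tissue, n_spectra in total_spectra.items()
    }
    # Step 2: rescale so the ratios for this modification sum to 1.
    ratio_sum = sum(ratios.values())
    if ratio_sum == 0:
        return {tissue: 0.0 for tissue in ratios}
    return {tissue: r / ratio_sum for tissue, r in ratios.items()}


# Invented counts: "liver" has twice as many modified PSMs as "heart"
# but also four times as many spectra overall.
profile = tissue_profile(
    {"liver": 200, "heart": 100},
    {"liver": 4000, "heart": 1000},
)
print(profile)
```

With these invented counts, heart receives the larger share (2/3 versus 1/3) even though liver has more modified PSMs in absolute terms, because the first step removes the effect of sampling depth per tissue.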
