Bioinformatics. 2017 May 1;33(9):1309-1316.

doi: 10.1093/bioinformatics/btw806.

A mass graph-based approach for the identification of modified proteoforms using top-down tandem mass spectra

Qiang Kou¹, Si Wu², Nikola Tolic³, Ljiljana Paša-Tolic³, Yunlong Liu⁴,⁵, Xiaowen Liu¹,⁵

Affiliations

¹ Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA.
² Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019, USA.
³ Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99354, USA.
⁴ Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
⁵ Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.

PMID: 28453668
PMCID: PMC5860502
DOI: 10.1093/bioinformatics/btw806

Qiang Kou et al. Bioinformatics. 2017.

Abstract

Motivation: Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a 'bird's eye view' of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem.

Results: We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry datasets showed that TopMG outperformed existing methods in identifying complex proteoforms.

Availability and implementation: http://proteomics.informatics.iupui.edu/software/topmg/.

Contact: xwliu@iupui.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1**
Comparison of a complex proteoform and its corresponding reference protein sequence in the database. The proteoform has an N-terminal truncation ‘MTTSE’, an amino acid mutation from ‘R’ to ‘K’, an insertion of ‘AA’, one phosphorylated serine residue, and two modified cysteine residues with carbamidomethylation.

**Fig. 2**
Construction of mass graphs. (a) An illustration of the construction of a proteoform mass graph from a protein ARKTDAR and four variable PTMs: acetylation on K and the first R, methylation on R and K, phosphorylation on T, and dimethylation on K. Each node corresponds to a peptide bond, or to the N- or C-terminus of the protein; each edge corresponds to an amino acid residue (red edges correspond to modified amino acid residues). The weight of each edge is the mass of its corresponding unmodified or modified residue (a scaling factor of 1 is used to convert weights to integers). (b) An illustration of the construction of a spectral mass graph from a prefix residue mass spectrum $0, 156, 198, 326, 340, 425, 521, 707$. The spectrum is generated from a proteoform of RKTDA with an acetylation on the R, a methylation on the K, and a phosphorylation on the T. To simplify the mass graph, masses corresponding to proteoform suffixes (C-terminal fragment masses) are not shown. The full path from the start node y₀ to the end node y₇ is aligned with the bold path from node x₁ to node x₆. The path from y₀ to y₆ and the red bold path from x₁ to x₄ are consistent.
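The construction described in the Fig. 2 caption can be illustrated with a small Python sketch. It is not the TopMG implementation: the nominal integer residue masses and PTM mass shifts, the helper names, the reading of "methylation on R" as applying to both R residues, and the all-pairs edge rule for the spectral mass graph are assumptions made for this example. The brute-force path enumeration stands in for the paper's mass graph alignment algorithms only to show which path from x₁ to x₆ explains the spectrum.

```python
from itertools import combinations

# Nominal integer masses (scaling factor 1); illustrative values only.
RESIDUE_MASS = {"A": 71, "R": 156, "K": 128, "T": 101, "D": 115}
PTM_SHIFT = {"acetyl": 42, "methyl": 14, "dimethyl": 28, "phospho": 80}


def proteoform_mass_graph(sequence, variable_ptms):
    """Return edges (u, v, weight, label) of a proteoform mass graph.

    Node i is the boundary before residue i (node 0 is the N-terminus,
    node len(sequence) is the C-terminus). variable_ptms maps a 0-based
    residue position to the PTM names allowed at that position.
    """
    edges = []
    for i, aa in enumerate(sequence):
        base = RESIDUE_MASS[aa]
        edges.append((i, i + 1, base, aa))  # unmodified residue
        for ptm in variable_ptms.get(i, []):  # one extra edge per PTM
            edges.append((i, i + 1, base + PTM_SHIFT[ptm], f"{aa}+{ptm}"))
    return edges


def spectral_mass_graph(masses):
    """Edges (i, j, weight) for every peak pair i < j (assumed rule)."""
    return [(i, j, masses[j] - masses[i])
            for i, j in combinations(range(len(masses)), 2)]


def paths(edges, start, end):
    """Enumerate all paths from start to end as lists of edges."""
    outgoing = {}
    for e in edges:
        outgoing.setdefault(e[0], []).append(e)

    def walk(node):
        if node == end:
            yield []
            return
        for e in outgoing.get(node, []):
            for rest in walk(e[1]):
                yield [e] + rest

    return list(walk(start))


def prefix_masses(path):
    """Cumulative edge weights along a path."""
    total, out = 0, []
    for _, _, weight, _ in path:
        total += weight
        out.append(total)
    return out


sequence = "ARKTDAR"
# Positions: A0 R1 K2 T3 D4 A5 R6 (our reading of the caption).
variable_ptms = {1: ["acetyl", "methyl"],
                 2: ["acetyl", "methyl", "dimethyl"],
                 3: ["phospho"],
                 6: ["methyl"]}
spectrum = [0, 156, 198, 326, 340, 425, 521, 707]

x_edges = proteoform_mass_graph(sequence, variable_ptms)
y_edges = spectral_mass_graph(spectrum)

# Brute force: paths from x1 to x6 whose total mass equals the last peak,
# ranked by how many of their prefix masses appear in the spectrum.
peaks = set(spectrum)
candidates = [p for p in paths(x_edges, 1, 6)
              if sum(e[2] for e in p) == spectrum[-1]]
best = max(candidates,
           key=lambda p: sum(m in peaks for m in prefix_masses(p)))
best_labels = [e[3] for e in best]
matched = sum(m in peaks for m in prefix_masses(best))
print(best_labels, matched)
```

Under these assumptions, two paths from x₁ to x₆ have total mass 707, and the one carrying acetylation on R, methylation on K, and phosphorylation on T matches the most peaks. Its first three edges have weights 198, 142, and 181, the same weight sequence as the spectral path y₀ → y₂ → y₄ → y₆, which is the consistency noted in the caption.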

**Fig. 3**
The algorithm for computing all the r-distance sets of a proteoform mass graph.

**Fig. 4**
The running time and percentages of correctly identified PrSMs for the 11505 test PrSMs with 5 variable PTMs each when the parameter L is set to $10, 20, \dots, 100$.

**Fig. 5**
The percentages of correctly identified PrSMs for the test PrSMs with various numbers of variable PTMs.

**Fig. 6**
Histograms for the PrSMs reported from the first histone dataset by TopMG with L = 40 and MS-Align-E. (a) the number of matched fragment ions; (b) the number of variable PTM sites.

Cited by

In situ mass spectrometry analysis of intact proteins and protein complexes from biological substrates.
Hale OJ, Cooper HJ. Biochem Soc Trans. 2020 Feb 28;48(1):317-326. doi: 10.1042/BST20190793. PMID: 32010951. Review.
Characterization of Proteoform Post-Translational Modifications by Top-Down and Bottom-Up Mass Spectrometry in Conjunction with Annotations.
Chen W, Ding Z, Zang Y, Liu X. J Proteome Res. 2023 Oct 6;22(10):3178-3189. doi: 10.1021/acs.jproteome.3c00207. Epub 2023 Sep 20. PMID: 37728997.
Identification of Splice Variants and Isoforms in Transcriptomics and Proteomics.
Su T, Hollas MAR, Fellers RT, Kelleher NL. Annu Rev Biomed Data Sci. 2023 Aug 10;6:357-376. doi: 10.1146/annurev-biodatasci-020722-044021. PMID: 37561601. Review.
A graph-based filtering method for top-down mass spectral identification.
Yang R, Zhu D. BMC Genomics. 2018 Sep 24;19(Suppl 7):666. doi: 10.1186/s12864-018-5026-x. PMID: 30255788.
Identification and Quantification of Proteoforms by Mass Spectrometry.
Schaffer LV, Millikin RJ, Miller RM, Anderson LC, Fellers RT, Ge Y, Kelleher NL, LeDuc RD, Liu X, Payne SH, Sun L, Thomas PM, Tucholski T, Wang Z, Wu S, Wu Z, Yu D, Shortreed MR, Smith LM. Proteomics. 2019 May;19(10):e1800361. doi: 10.1002/pmic.201800361. PMID: 31050378. Review.

Grants and funding

R01 GM118470/GM/NIGMS NIH HHS/United States
