BMC Genomics. 2018 Sep 24;19(Suppl 7):666.
doi: 10.1186/s12864-018-5026-x.

A graph-based filtering method for top-down mass spectral identification

Runmin Yang et al. BMC Genomics.

Abstract

Background: Database search has been the main approach for proteoform identification by top-down tandem mass spectrometry. However, when the target proteoform that produced the spectrum contains post-translational modifications (PTMs) and/or mutations, aligning a query spectrum against all protein sequences in a large database, which contain no PTMs or mutations, is time-consuming. Consequently, efficient and sensitive filtering algorithms are essential for speeding up database search.

Results: In this paper, we propose a spectrum graph matching (SGM) based protein sequence filtering method for top-down mass spectral identification. It uses the subspectra of a query spectrum to generate spectrum graphs and searches them against a protein database to report the best candidates. Whereas the sequence tag and gapped tag approaches require a preprocessing step to extract and select tags, the SGM filtering method bypasses this step, which simplifies data processing. We evaluated the filtration efficiency of the SGM filtering method with various parameter settings on an Escherichia coli (EC) top-down mass spectrometry data set, and compared the performance of the SGM filtering method with that of two tag-based filtering methods on a data set of MCF-7 cells.
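To make the filtering step concrete, here is a minimal Python sketch of one way to rank database proteins by how many blocked patterns (mass sequences read off spectrum-graph paths) they contain. It is not the SGM scoring from the paper: the one-point-per-matched-pattern score, the function names `prefix_masses`, `pattern_occurs` and `filter_proteins`, and the toy database are hypothetical. Only the error tolerance ε = 0.02 Da is taken from the figure captions.

```python
from bisect import bisect_left

# Monoisotopic residue masses in Da. I is omitted because it has the same mass as L.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def prefix_masses(seq):
    """Cumulative residue masses: prefix[k] is the mass of seq[:k]."""
    prefix = [0.0]
    for aa in seq:
        # I and L are isobaric, so I is looked up as L.
        prefix.append(prefix[-1] + RESIDUE_MASS["L" if aa == "I" else aa])
    return prefix


def find_index(prefix, target, eps):
    """Index of a prefix mass within +/- eps of target, or None."""
    i = bisect_left(prefix, target - eps)
    if i < len(prefix) and prefix[i] <= target + eps:
        return i
    return None


def pattern_occurs(pattern, prefix, eps):
    """True if consecutive blocks of the sequence have the pattern's masses."""
    for start in range(len(prefix)):
        pos = start
        for gap in pattern:
            pos = find_index(prefix, prefix[pos] + gap, eps)
            if pos is None:
                break  # this start position fails; try the next one
        else:
            return True  # every gap in the pattern was matched in order
    return False


def filter_proteins(patterns, database, top_k, eps=0.02):
    """Hypothetical filter: keep the top_k proteins matching the most patterns."""
    scored = []
    for name, seq in database.items():
        prefix = prefix_masses(seq)
        score = sum(pattern_occurs(p, prefix, eps) for p in patterns)
        scored.append((score, name))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for score, name in scored[:top_k] if score > 0]


# Toy example: the blocked pattern of Fig. 1c against two made-up sequences.
database = {"P1": "MKLNRVSGA", "P2": "MKTAYIAKQR"}
patterns = [[114.04, 255.17, 87.03]]
print(filter_proteins(patterns, database, top_k=5))  # prints ['P1']
```

Because every residue weighs at least 57 Da, prefix masses are strictly increasing, so a binary search per mass gap is enough to walk a pattern along a sequence.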

Conclusions: Experimental results on the data sets show that the SGM filtering method achieves high sensitivity in protein sequence filtration. When coupled with a spectral alignment algorithm, the SGM filtering method significantly increases the number of identified proteoform spectrum-matches compared with the tag-based methods in top-down mass spectrometry data analysis.

Keywords: Filtering algorithm; Mass spectrometry; Spectrum graph.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that there are no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Spectrum graph generation. Illustration of spectrum graph generation using an example deconvoluted MS/MS spectrum of the protein LNRVSG. a In the spectrum, the mass of the N-terminal fragment LNR is missing, and there is a noise mass peak (bold) between the fragment masses of LNR and LNRV. b In the spectrum graph, each node corresponds to a peak in the spectrum. Two nodes are connected by a directed edge if the difference between their corresponding masses matches the residue mass of one amino acid; the edge is labeled with the amino acid. The sequence tag NVRS extracted from the spectrum is incorrect because of the noise mass peak and its node v2. c In the spectrum graph, each node corresponds to a peak in the spectrum. Two nodes are connected by a directed edge if the difference between their corresponding masses is less than 400 Da and matches the residue mass of one or several amino acids; the edge is labeled with the mass difference. The mass sequence of a path is a blocked pattern of the spectrum. For example, the bold path v0, v1, v3, v4 corresponds to the blocked pattern 114.04, 255.17, 87.03, which matches the correct sequence tag NRVS because 255.17 is the sum of the mass 156.10 of R and the mass 99.07 of V
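The edge rule of panel c can be sketched in Python as follows. This is a minimal sketch, assuming the input peaks are neutral prefix masses and that residue masses are monoisotopic; the 400 Da gap limit comes from the caption and ε = 0.02 Da from the Fig. 2 caption. The peak list is hypothetical: it reproduces LNRVSG with the LNR fragment missing, plus a made-up noise peak at 326.1954 Da placed so that it yields the incorrect tag NVRS of panel b. The function names are illustrative only.

```python
from bisect import bisect_left
from itertools import combinations

# Monoisotopic residue masses in Da. I is omitted because it has the same mass as L.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_sums(max_mass):
    """Sorted sums of one or more residue masses that are below max_mass."""
    masses = sorted(RESIDUE_MASS.values())
    sums = []

    def extend(start, total):
        for i in range(start, len(masses)):
            new_total = total + masses[i]
            if new_total >= max_mass:
                break  # masses are sorted, so every later residue overshoots too
            sums.append(new_total)
            extend(i, new_total)  # reuse index i: each multiset is generated once

    extend(0, 0.0)
    return sorted(sums)


def matches_residue_sum(diff, sums, eps):
    """True if diff equals some residue-mass sum within +/- eps."""
    i = bisect_left(sums, diff - eps)
    return i < len(sums) and sums[i] <= diff + eps


def spectrum_graph(peaks, max_gap=400.0, eps=0.02):
    """Directed edges (i, j, mass gap) following the rule of Fig. 1c."""
    peaks = sorted(peaks)
    sums = residue_sums(max_gap + eps)
    edges = []
    for i, j in combinations(range(len(peaks)), 2):
        diff = peaks[j] - peaks[i]
        if diff < max_gap and matches_residue_sum(diff, sums, eps):
            edges.append((i, j, round(diff, 2)))
    return edges


# Hypothetical prefix masses: L, LN, a noise peak, LNRV, LNRVS (LNR is missing).
peaks = [113.08406, 227.12699, 326.19540, 482.29651, 569.32854]
edges = spectrum_graph(peaks)
for i, j, gap in edges:
    print(f"v{i} -> v{j}: {gap} Da")
```

The path v0 → v1 → v3 → v4 yields the blocked pattern 114.04, 255.17, 87.03 from the caption. The noise node v2 adds extra edges (for example v1 → v2 at 99.07 Da, a single V) but does not break this path, which is why blocked patterns tolerate the missing LNR peak.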
Fig. 2
Influence of the parameter δ on the SGM filtering algorithm. The filtration efficiency and running time of the SGM filtering algorithm are compared on the EC evaluation data set with δ varied from 500 to 1400 Da and the other parameters fixed: α = 300 Da, β = 200 Da, λ set so that no masses are removed in the spectral preprocessing, and error tolerance ε = 0.02 Da
Fig. 3
Influence of the parameter β on the SGM filtering algorithm. The filtration efficiency and running time of the SGM filtering algorithm are compared on the EC evaluation data set with β varied from 0 to 400 Da and the other parameters fixed: δ = 900 Da, α = 300 Da, λ set so that no masses are removed in the spectral preprocessing, and error tolerance ε = 0.02 Da
Fig. 4
Influence of the parameter α on the SGM filtering algorithm. The filtration efficiency and running time of the SGM filtering algorithm are compared on the EC evaluation data set with α varied from 200 to 500 Da and the other parameters fixed: δ = 900 Da, β = 250 Da, λ set so that no masses are removed in the spectral preprocessing, and error tolerance ε = 0.02 Da
Fig. 5
Comparison of the performance of TAG-1 with various settings of the tag length l. At a 1% spectrum-level FDR, the numbers of proteoform spectrum-matches identified by the TAG-1 filtering algorithm with various settings of the tag length l are compared on the MCF-7 data set. The TAG-1 filtering algorithm is coupled with the spectral alignment algorithm in TopPIC for spectral identification
Fig. 6
Comparison of the numbers of identifications of SGM, TAG-1, and TAG-2. At a 1% spectrum-level FDR, the numbers of proteoform spectrum-matches identified by the SGM, TAG-1, and TAG-2 filtering algorithms are compared on the MCF-7 data set. All three algorithms are coupled with the spectral alignment algorithm in TopPIC for spectral identification
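The captions do not say how the 1% spectrum-level FDR is estimated. Assuming the target-decoy approach commonly used by top-down search tools such as TopPIC, the cutoff can be sketched in Python as below. The input format (one best `(score, is_decoy)` pair per spectrum, higher scores better), the function name `targets_at_fdr`, and the toy data are hypothetical, and score ties are ignored for simplicity.

```python
def targets_at_fdr(psms, max_fdr=0.01):
    """Count target PSMs accepted at a spectrum-level FDR threshold.

    psms: list of (score, is_decoy) pairs, one best match per spectrum;
    higher scores are better.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PSMs accepted
    for k, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among the top k PSMs: decoys / targets.
        if decoys / max(targets, 1) <= max_fdr:
            cutoff = k  # keep the deepest cutoff that meets the threshold
    return sum(1 for _, is_decoy in ranked[:cutoff] if not is_decoy)


# Toy data: 200 targets, 1 decoy, 50 targets, then 10 decoys, by descending score.
psms = (
    [(1000 - i, False) for i in range(200)]
    + [(800, True)]
    + [(799 - i, False) for i in range(50)]
    + [(749 - i, True) for i in range(10)]
)
print(targets_at_fdr(psms))  # prints 250
```

Taking the deepest cutoff whose estimated FDR is at most 1% is equivalent to thresholding q-values, so a single early decoy does not end the accepted list prematurely.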
