Bioinformatics. 2012 Jun 15;28(12):i265-73.

doi: 10.1093/bioinformatics/bts207.

Fast alignment of fragmentation trees

Franziska Hufsky¹, Kai Dührkop, Florian Rasche, Markus Chimani, Sebastian Böcker

PMID: 22689771
PMCID: PMC3371839
DOI: 10.1093/bioinformatics/bts207

Franziska Hufsky et al. Bioinformatics. 2012.

Affiliation

¹ Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany.


Abstract

Motivation: Mass spectrometry allows sensitive, automated and high-throughput analysis of small molecules such as metabolites. One major bottleneck in metabolomics is the identification of 'unknown' small molecules not in any database. Recently, fragmentation tree alignments have been introduced for the automated comparison of the fragmentation patterns of small molecules. Fragmentation pattern similarities are strongly correlated with the chemical similarity of the molecules, and allow us to cluster compounds based solely on their fragmentation patterns.

Results: Aligning fragmentation trees is computationally hard. Nevertheless, we present three exact algorithms for the problem: a dynamic programming (DP) algorithm, a sparse variant of the DP, and an Integer Linear Program (ILP). Evaluation of our methods on three different datasets showed that thousands of alignments can be computed in a matter of minutes using DP, even for 'challenging' instances. Running times of the sparse DP were an order of magnitude better than for the classical DP. The ILP was clearly outperformed by both DP approaches. We also found that for both DP algorithms, computing the 1% slowest alignments required as much time as computing the 99% fastest.

Figures

**Fig. 1.**
(a) Optimal fragmentation tree alignment for cystine (11 losses) and methionine (6 losses) from the *Orbitrap* dataset. (b) Fragmentation mass spectra of cystine and methionine. The mass spectra do not share peaks. Molecular structures of cystine (c) and methionine (d). The molecular structures are not known to the alignment method. The alignment detects the common fragmentation path of formic acid–ammonia–ethylene losses and the separate ammonia branch. Additionally, it finds the methylthiol loss, which occurs at a later stage in cystine.

**Fig. 2.**
Two alignments of fragmentation trees based on edge similarities. Nodes represent molecular formulas of the fragments, edges represent molecular formulas of the losses. (a) A gap (−) is introduced for the missing CO loss in the left tree (dashed edge and node). Losses CO and CH₃ are aligned by a mismatch (dotted edges). (b) In the left tree, the fragment after losing H₃N is missing (dashed edges and node), whereas the fragment after further loss of C₂H₂ is observed. To account for missing fragments, we introduce the join operation. It allows the two successive losses H₃N and C₂H₂ in the right tree to be aligned to a single loss C₂H₅N in the left tree (dotted edges). Fragments may be missing, for example, because the corresponding peak was not detected.
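The join operation in panel (b) relies on successive losses adding up: losing H₃N and then C₂H₂ removes the same atoms as a single loss of C₂H₅N. The following is a minimal Python sketch of that bookkeeping, assuming formulas are written as plain strings such as `"H3N"`. The helper names `parse_formula` and `join_losses` are illustrative and do not come from the paper.

```python
import re
from collections import Counter

def parse_formula(formula: str) -> Counter:
    """Parse a simple molecular formula such as 'C2H5N' into element counts.

    Handles element symbols followed by an optional count;
    brackets and charges are not supported in this sketch.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def join_losses(first: str, second: str) -> Counter:
    """Combine two successive losses into the single loss they are equivalent to."""
    return parse_formula(first) + parse_formula(second)

# The two successive losses from panel (b) and the single loss they are aligned to.
joined = join_losses("H3N", "C2H2")
matches_single_loss = joined == parse_formula("C2H5N")
```

Here `matches_single_loss` is true because both sides contain two carbons, five hydrogens and one nitrogen.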

**Fig. 3.**
Representation of the match and the deleteL recurrences of the DP algorithm. (a) match_u,v[A, B] is the best score of matching edge ua on edge vb, such that at most the children A of u and B of v are used. (b) deleteL_u,v[A, B] is the best score for deleting edge ua, such that at most the children A of u and B of v are used. A subset B′⊆B of the children of v can now be matched to the children of a.
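The tables in this caption are indexed by sets of children, which suggests a dynamic program over child subsets. The sketch below is a deliberately simplified illustration of that idea and not the authors' algorithm: it scores only matched edges, lets unmatched edges cost nothing, and models neither the delete nor the join operation. The tree encoding, the function name `align` and the toy scoring function are all assumptions made for this example.

```python
from functools import lru_cache

# Assumed encoding: a tree is a dict mapping node -> list of (loss_label, child).
# Leaves may simply be absent from the dict.

def align(tree1, root1, tree2, root2, sim):
    """Best score for matching edges below root1 to edges below root2.

    Simplification: each child edge of u is matched to at most one unused
    child edge of v; unmatched edges contribute a score of 0.
    """
    @lru_cache(maxsize=None)
    def score(u, v):
        kids_u = tree1.get(u, [])
        kids_v = tree2.get(v, [])

        @lru_cache(maxsize=None)
        def best(i, used):
            # i: index of the next child edge of u to place
            # used: bitmask of the child edges of v that are already matched
            if i == len(kids_u):
                return 0
            result = best(i + 1, used)  # leave edge i of u unmatched
            loss_a, a = kids_u[i]
            for j, (loss_b, b) in enumerate(kids_v):
                if not used & (1 << j):
                    gain = sim(loss_a, loss_b) + score(a, b)
                    result = max(result, gain + best(i + 1, used | (1 << j)))
            return result

        return best(0, 0)

    return score(root1, root2)

# Toy trees and a toy similarity: +1 for identical losses, -1 otherwise.
left = {"r": [("CH2O2", "a"), ("H3N", "b")], "a": [("H3N", "c")]}
right = {"s": [("H3N", "x"), ("CH2O2", "y")], "y": [("H3N", "z")]}
toy_score = align(left, "r", right, "s", lambda p, q: 1 if p == q else -1)
```

Because the bitmask ranges over all subsets of the children of v, the work per node pair grows exponentially with the out-degree in this sketch.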

**Fig. 4.**
Running times for the *Hill* dataset with 5151 individual alignments. (a) Total running times when instances are sorted by individual running times. For any fraction x%, we calculate the total running time of the x% of instances for which the alignment was computed faster than for any of the remaining instances. For example, at 50% one can find the running time that was needed to compute the 50% fastest instances. For each algorithm, instances were sorted separately. Note the logarithmic y-axis. (b) Individual running times for the 200 slowest instances of the classical DP algorithm. Instances are sorted by their running time for the classical DP algorithm. One can see that the sparse DP outperforms the classical DP.
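The curve in panel (a) can be reproduced from a list of per-instance running times by sorting them and taking running totals. The sketch below shows one way to do this in Python. The function names and the timings are made up for illustration and are not measurements from the paper.

```python
from itertools import accumulate

def cumulative_times(times):
    """Sort per-instance running times ascending and return the running totals."""
    return list(accumulate(sorted(times)))

def time_for_fastest(times, fraction):
    """Total time spent on the fastest `fraction` (between 0 and 1) of instances."""
    totals = cumulative_times(times)
    count = int(len(totals) * fraction)
    return totals[count - 1] if count > 0 else 0

# Made-up timings in milliseconds, one per alignment instance.
toy_times = [5, 1, 2, 80, 3, 1]
fastest_half = time_for_fastest(toy_times, 0.5)
all_instances = time_for_fastest(toy_times, 1.0)
```

In this toy list a single slow instance accounts for most of the total, which is the kind of skew the abstract describes when it notes that the 1% slowest alignments took as long as the 99% fastest.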

Cited by

Using fragmentation trees and mass spectral trees for identifying unknown compounds in metabolomics.
Vaniya A, Fiehn O. Trends Analyt Chem. 2015 Jun 1;69:52-61. doi: 10.1016/j.trac.2015.04.002. PMID: 26213431. Free PMC article.
Mass spectrometry tools and workflows for revealing microbial chemistry.
Luzzatto-Knaan T, Melnik AV, Dorrestein PC. Analyst. 2015 Aug 7;140(15):4949-66. doi: 10.1039/c5an00171d. PMID: 25996313. Free PMC article. Review.
Computational mass spectrometry for small molecules.
Scheubert K, Hufsky F, Böcker S. J Cheminform. 2013 Mar 1;5(1):12. doi: 10.1186/1758-2946-5-12. PMID: 23453222. Free PMC article.
Searching molecular structure databases with tandem mass spectra using CSI:FingerID.
Dührkop K, Shen H, Meusel M, Rousu J, Böcker S. Proc Natl Acad Sci U S A. 2015 Oct 13;112(41):12580-5. doi: 10.1073/pnas.1509788112. Epub 2015 Sep 21. PMID: 26392543. Free PMC article.
