Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2012 Jun 15;28(12):i265-73.
doi: 10.1093/bioinformatics/bts207.

Fast alignment of fragmentation trees

Affiliations

Fast alignment of fragmentation trees

Franziska Hufsky et al. Bioinformatics. .

Abstract

Motivation: Mass spectrometry allows sensitive, automated and high-throughput analysis of small molecules such as metabolites. One major bottleneck in metabolomics is the identification of 'unknown' small molecules not in any database. Recently, fragmentation tree alignments have been introduced for the automated comparison of the fragmentation patterns of small molecules. Fragmentation pattern similarities are strongly correlated with the chemical similarity of the molecules, and allow us to cluster compounds based solely on their fragmentation patterns.

Results: Aligning fragmentation trees is computationally hard. Nevertheless, we present three exact algorithms for the problem: a dynamic programming (DP) algorithm, a sparse variant of the DP, and an Integer Linear Program (ILP). Evaluation of our methods on three different datasets showed that thousands of alignments can be computed in a matter of minutes using DP, even for 'challenging' instances. Running times of the sparse DP were an order of magnitude better than for the classical DP. The ILP was clearly outperformed by both DP approaches. We also found that for both DP algorithms, computing the 1% slowest alignments required as much time as computing the 99% fastest.

PubMed Disclaimer

Figures

Fig. 1.
Fig. 1.
Optimal fragmentation tree alignment for cystine (11 losses) and methionine (6 losses) from the Orbitrap dataset (a). (b) Fragmentation mass spectra of cystine and methionine. The mass spectra do not share peaks. Molecular structures of cystine (c) and methionine (d). The molecular structures are not known to the alignment method. The alignment detects the common fragmentation path of formic acid–ammonia–ethylene losses and the separate ammonia branch. Additionally, it finds the methylthiol loss, which occurs at a later stage in cystine
Fig. 2.
Fig. 2.
Two alignments of fragmentation trees based on edge similarities. Nodes represent molecular formulas of the fragments, edges represent molecular formulas of the losses. (a) A gap (−) is introduced for the missing CO loss in the left tree (dashed edge and node). Losses CO and CH3 are aligned by a mismatch (dotted edges). (b) In the left tree, the fragment after loosing H3N is missing (dashed edges and node), whereas the fragment after further loss of C2H2 is observed. To account for missing fragments, we introduce the join operation. It allows to align the two successive losses H3N and C2H2 in the right tree to a single loss C2H5N in the left tree (dotted edges). Fragments may be missing because the corresponding peak was not detected, for example
Fig. 3.
Fig. 3.
Representation of the match and the deleteL recurrences of the DP algorithm. (a) matchu,v[A, B] is the best score of matching edge ua on edge vb, such that maximally the children A of u and B of v are used. (b) deleteLu,v[A, B] is the best score for deleting edge ua, such that maximally the children A of u and B of v are used. A subset B′⊆B of the children of v can now be matched to the children of a
Fig. 4.
Fig. 4.
Running times for the Hill dataset with 5151 individual alignments. (a) Total running times when instances are sorted by individual running times. For any fraction x%, we calculate the total running time of the x%, instances for which the alignment was computed faster than for any of the remaining instances. For example at 50% one can find the running time that was needed to compute the 50% fastest instances. For each algorithm, instances were sorted separately. Note the logarithmic y-axis. (b) Individual running times for the 200 slowest instances of the classical DP algorithm. Instances are sorted by their running time for the classical DP algorithm. One can see that running times of the classical DP are outperformed by that of the sparse DP

Similar articles

Cited by

References

    1. Arora S., et al. Proof verification and the hardness of approximation problems. J. ACM. 1998;45:501–555.
    1. Backofen R., et al. Sparse RNA folding: time and space efficient algorithms. J. Discrete Algorithms. 2011;9:12–31.
    1. Björklund A., et al. Proceedings of ACM Symposium on Theory of Computing (STOC 2007) New York: ACM Press; 2007. Fourier meets Möbius: fast subset convolution; pp. 67–74.
    1. Böcker S., Rasche F. Towards de novo identification of metabolites by analyzing tandem mass spectra. Bioinformatics. 2008;24:I49–I55. [Proceedings of European Conference on Computational Biology(ECCB 2008)] - PubMed
    1. Canzar S., et al. Proceedings of International Conference on Automata, Languages and Programming (ICALP 2011) Vol. 6755. Berlin: Springer; 2011. On tree-constrained matchings and generalizations; pp. 98–109.

Publication types