Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2012 Mar 21;13 Suppl 3(Suppl 3):S2.
doi: 10.1186/1471-2105-13-S3-S2.

Metabolic network alignment in large scale by network compression

Affiliations

Metabolic network alignment in large scale by network compression

Ferhat Ay et al. BMC Bioinformatics. .

Abstract

Metabolic network alignment is a system scale comparative analysis that discovers important similarities and differences across different metabolisms and organisms. Although the problem of aligning metabolic networks has been considered in the past, the computational complexity of the existing solutions has so far limited their use to moderately sized networks. In this paper, we address the problem of aligning two metabolic networks, particularly when both of them are too large to be dealt with using existing methods. We develop a generic framework that can significantly improve the scale of the networks that can be aligned in practical time. Our framework has three major phases, namely the compression phase, the alignment phase and the refinement phase. For the first phase, we develop an algorithm which transforms the given networks to a compressed domain where they are summarized using fewer nodes, termed supernodes, and interactions. In the second phase, we carry out the alignment in the compressed domain using an existing network alignment method as our base algorithm. This alignment results in supernode mappings in the compressed domain, each of which are smaller instances of network alignment problem. In the third phase, we solve each of the instances using the base alignment algorithm to refine the alignment results. We provide a user defined parameter to control the number of compression levels which generally determines the tradeoff between the quality of the alignment versus how fast the algorithm runs. Our experiments on the networks from KEGG pathway database demonstrate that the compression method we propose reduces the sizes of metabolic networks by almost half at each compression level which provides an expected speedup of more than an order of magnitude. We also observe that the alignments obtained by only one level of compression capture the original alignment results with high accuracy. Together, these suggest that our framework results in alignments that are comparable to existing algorithms and can do this with practical resource utilization for large scale networks that existing algorithms could not handle. As an example of our method's performance in practice, the alignment of organism-wide metabolic networks of human (1615 reactions) and mouse (1600 reactions) was performed under three minutes by only using a single level of compression.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Aligning two metabolic networks with and without compression. Top figures (a-c) illustrate the steps of alignment without compression. Bottom figures (d-g) demonstrate different phases of alignment with compression using our framework. (a) Two hypothetical metabolic networks with 5 and 4 reactions respectively. Directed edges represent the neighborhood relations between the reactions. (b) Support matrix of size 20×20 needed for the alignment if compression is not used. We only show the non-zero entries of a single row that corresponds to topological support given by b - b' mapping to possible mappings of its backward and forward neighbors. Five such mappings supported equally are denoted by 15s in the matrix, namely a - a' mapping for the backward neighbors and c - c', c - d', d - c' and d - d' mappings for the forward neighbors. (c) The resulting reaction mappings of alignment without compression. (d) Query networks shown in (a) in compressed domain after one level of compression. (e) Support matrix of size 6×6 needed for the alignment with compression. We only show the entries for the mappings supported by the a, b - a', b' mapping. (f) The resulting mappings from the alignment in compressed domain. (g) The resulting reaction mappings after refinement phase of our framework.
Figure 2
Figure 2
Shift of out-degree distributions from power law to uniform. Changes in the out-degree distributions of ten organism-wide metabolic networks with increasing levels of compression. We calculate the frequencies of each out-degree in the range [2,40] for c ∈ {0, 1, 2, 3, 4} and plot them together for each of the ten organisms in our dataset. Out-degree distributions for organism-wide metabolic networks of (a) Arabidopsis thaliana (thale cress), (b) Caenorhabditis elegans (nematode), (c) Drosophila melanogaster (fruit fly), (d) Escherichia coli K-12 MG1655, (e) Homo sapiens (human), (f) Mus musculus (mouse), (g) Pseudomonas aeruginosa PAO1, (h) Rattus norvegicus (rat), (i) Staphylococcus aureus COL (MRSA), (j) Saccharomyces cerevisiae (budding yeast).
Figure 3
Figure 3
Resource utilization of our framework. The average (a) running time and (b) memory utilization of our framework when each query network in our large scale dataset is aligned with all the networks (including itself) in the same dataset. x-axis is the query size which is calculated as the product of the sizes (i.e., number of reactions) of the metabolic networks aligned. c = 0 denote the alignments performed with no compression. c ∈ {1, 2, 3} denote the results of our framework that compresses both of the query networks by c levels before aligning them.
Figure 4
Figure 4
Gain/Loss in running time. Gain/Loss in running time of alignment by using our framework with respect to the base alignment method (x-axis) versus the ratio of the number of all possible subnetwork mappings in compressed domain to this number in the original domain. The blue vertical line shows when the two methods take exact same amount of time or when both methods take very short amount of time in the case of small query networks. Points on the right (left) handside of this line means gain (loss) in the running time. The dashed line is our decision criteria for predicting whether there will be gain or loss before doing the alignment.
Figure 5
Figure 5
One compression step of the MDS method. Small circles represent reactions and big circles represent supernodes that result from earlier steps of compression. A solid arrow represents an edge between two non-compressed nodes in the current compression level. A dashed arrow denotes an edge between a supernode and another node in the network. While calculating the degrees of the non-compressed nodes, only the solid arrows are taken into account. (a) The state of network P during compression level x before the ith intermediate step (i.e., Pi-1x). The node with the minimum degree is denoted with va and its first neighbor is denoted with vb. (b) The state of this network after the ith compression step (i.e., Pix). We denote the node resulted from the compression at this step with vab.

Similar articles

Cited by

References

    1. Navlakha S, Schatz M, Kingsford C. Revealing biological modules via graph summarization. J Comput Biol. 2009;16(2):253–264. doi: 10.1089/cmb.2008.11TT. - DOI - PubMed
    1. Segal E, Pe'er D, Regev A, Koller D, Friedman N. Learning module networks. Journal of Machine Learning Research. 2005;6:557–88.
    1. Ay F, Dinh T, Thai M, Kahveci T. Dynamic modular structure of regulatory networks. IEEE International Conference on Bioinformatics and Bioengineering (BIBE) 2010. pp. 136–143.
    1. Dutkowski J, Tiuryn J. Identification of functional modules from conserved ancestral protein protein interactions. Bioinformatics. 2007;23(13):i149–i158. doi: 10.1093/bioinformatics/btm194. - DOI - PubMed
    1. Dandekar T, Schuster S, Snel B, Huynen M, Bork P. Pathway alignment: application to the comparative analysis of glycolytic enzymes. Biochem J. 1999;343 Pt 1:115–124. doi: 10.1042/0264-6021:3430115. - DOI - PMC - PubMed

Publication types