Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2013 Jul 12;8(7):e67995.
doi: 10.1371/journal.pone.0067995. Print 2013.

SMETANA: accurate and scalable algorithm for probabilistic alignment of large-scale biological networks

Affiliations

SMETANA: accurate and scalable algorithm for probabilistic alignment of large-scale biological networks

Sayed Mohammad Ebrahim Sahraeian et al. PLoS One. .

Abstract

In this paper we introduce an efficient algorithm for alignment of multiple large-scale biological networks. In this scheme, we first compute a probabilistic similarity measure between nodes that belong to different networks using a semi-Markov random walk model. The estimated probabilities are further enhanced by incorporating the local and the cross-species network similarity information through the use of two different types of probabilistic consistency transformations. The transformed alignment probabilities are used to predict the alignment of multiple networks based on a greedy approach. We demonstrate that the proposed algorithm, called SMETANA, outperforms many state-of-the-art network alignment techniques, in terms of computational efficiency, alignment accuracy, and scalability. Our experiments show that SMETANA can easily align tens of genome-scale networks with thousands of nodes on a personal computer without any difficulty. The source code of SMETANA is available upon request. The source code of SMETANA can be downloaded from http://www.ece.tamu.edu/~bjyoon/SMETANA/.

PubMed Disclaimer

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. Performance of various network alignment algorithms.
(A) Equivalence class coverage: Number of equivalence classes in the 5-way alignment experiment that contain nodes from formula image species (formula image). (B) Node Coverage: Number of nodes (i.e. proteins) that belong to equivalence classes that contain nodes from formula image species in the 5-way alignment.
Figure 2
Figure 2. Performance of various network alignment algorithms.
(A) Equivalence class coverage: Number of equivalence classes in the 8-way alignment experiment that contain nodes from formula image species (formula image). (B) Node Coverage: Number of nodes (i.e. proteins) that belong to equivalence classes that contain nodes from formula image species in the 8-way alignment.
Figure 3
Figure 3. Number of conserved interactions (CI) and conserved orthologous interactions (COI) for different alignments.
CI reports the total number of conserved edges between any of the equivalence classes in the alignment. COI reports the total number of conserved edges between “correct” equivalence classes that consist of orthologous nodes.
Figure 4
Figure 4. Effects of node similarity on the performance of different network alignment algorithms.
The alignment performance has been estimated at several different levels of separation between the similarity score distribution for orthologous node pairs and that for non-orthologs pairs. Increasing the bias formula image increases the separation between the two score distributions, which increases the discriminative power of the node similarity score for predicting potential orthologs. Measures reported: specificity (SPE), number of correct nodes (CN) (which reflects the sensitivity), mean normalized entropy (MNE), number of conserved interactions (CI), number of conserved orthologous interactions (COI).
Figure 5
Figure 5. Total CPU time for aligning the networks.
The total CPU time for the pairwise, 5-way, and 8-way alignments. CPU time has been averaged over DMC, DMR, and CG datasets (measured in seconds).
Figure 6
Figure 6. Performance on real networks.
The trend of change in sensitivity (SEN) and specificity (SP) as the number of networks in the alignment increases for different multiple network alignment algorithms.
Figure 7
Figure 7. Conserved subnetwork regions in the 3-way alignment of D. melanogaster, H. sapiens, and S. cerevisiae using the proposed method.
(A) Transcription factor. (B) Replication factor C. (C) RNA polymerase. (D) DNA replication. (Aligned nodes are placed in the same row of alignment and connected with yellow dashed lines. In each network, the interaction in the IsoBase dataset is shown in solid lines. For D. melanogaster some edges which are missed in IsoBase but are present in STRING protein interaction database is shown in dotted gray lines).

References

    1. Zhang A (2009) Protein Interaction Networks: Computational Analysis. New York, NY, USA: Cambridge University Press, 1st edition.
    1. Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113. - PubMed
    1. Cusick ME, Klitgord N, Vidal M, Hill DE (2005) Interactome: gateway into systems biology. Hum Mol Genet 14 Spec No. 2: R171–181. - PubMed
    1. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, et al. (2000) A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403: 623–627. - PubMed
    1. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, et al. (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415: 180–183. - PubMed

Publication types

LinkOut - more resources