Using graph theory to analyze biological networks

Georgios A Pavlopoulos¹, Maria Secrier, Charalampos N Moschopoulos, Theodoros G Soldatos, Sophia Kossida, Jan Aerts, Reinhard Schneider, Pantelis G Bagos

PMID: 21527005
PMCID: PMC3101653
DOI: 10.1186/1756-0381-4-10

BioData Min. 2011 Apr 28;4:10. doi: 10.1186/1756-0381-4-10.

Affiliation

¹ Department of Computer Science and Biomedical Informatics, University of Central Greece, Lamia, 35100, Greece. pavlopou@embl.de.

Abstract

Understanding complex systems often requires a bottom-up analysis towards a systems biology approach. A system must be investigated not only as a set of individual components but also as a whole. This can be done by examining the elementary constituents individually and then examining how they are connected. The many components of a system and their interactions are best characterized as networks, which are mainly represented as graphs in which thousands of nodes are connected by thousands of edges. In this article we demonstrate approaches, models and methods from graph theory and discuss ways in which they can be used to reveal hidden properties and features of a network. This network profiling, combined with knowledge extraction, will help us to better understand the biological significance of the system.

Figures

**Figure 1**
**Undirected, Directed, Weighted, Bipartite graphs**. A. Undirected Graph: *V = {V₁, V₂, V₃, V₄}, |V| = 4, E = {(V₁, V₂), (V₂, V₃), (V₂, V₄), (V₄, V₁)}, |E| = 4*. B. Directed Graph: *V = {V₁, V₂, V₃, V₄}, |V| = 4, E = {(V₁, V₂), (V₂, V₃), (V₂, V₄), (V₄, V₁), (V₄, V₂)}, |E| = 5*. C. Weighted Graph: *V = {V₁, V₂, V₃, V₄}, |V| = 4, E = {(V₁, V₂, 4), (V₂, V₃, 2), (V₂, V₄, 9), (V₄, V₁, 8), (V₄, V₂, 6)}, |E| = 5*, where the third element of each triple is the edge weight. D. Bipartite graph: *V = {U₁, U₂, U₃, U₄, V₁, V₂, V₃}, |V| = 7, E = {(U₁, V₁), (U₂, V₁), (U₂, V₂), (U₂, V₃), (U₃, V₂), (U₄, V₂)}, |E| = 6*.

**Figure 2**
**Data structures**. A. A Directed Graph: A random graph consisting of five nodes and six directed edges. B. Adjacency List: The data structure which represents the directed graph using lists. C. Adjacency Matrix: The data structure which represents the directed graph using a 2D matrix. The zeros represent the absence of the connection whereas the ones represent the existence of the connection between two nodes. The matrix is not symmetric since the graph is directed.
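The two data structures can be sketched in a few lines of Python. The caption of Figure 2 does not list the edges of its five-node graph, so this sketch uses the directed graph of Figure 1B instead; the function and variable names are illustrative assumptions, not part of the article.

```python
# Directed graph from Figure 1B (the Figure 2 edge set is not given in the caption).
nodes = ["V1", "V2", "V3", "V4"]
edges = [("V1", "V2"), ("V2", "V3"), ("V2", "V4"), ("V4", "V1"), ("V4", "V2")]


def adjacency_list(nodes, edges):
    """Map every node to the list of nodes its outgoing edges point to."""
    adj = {n: [] for n in nodes}
    for src, dst in edges:
        adj[src].append(dst)
    return adj


def adjacency_matrix(nodes, edges):
    """Return a |V| x |V| matrix with 1 where an edge src -> dst exists, else 0."""
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix


adj_list = adjacency_list(nodes, edges)
adj_matrix = adjacency_matrix(nodes, edges)

# A directed graph generally yields a non-symmetric matrix.
is_symmetric = all(
    adj_matrix[i][j] == adj_matrix[j][i]
    for i in range(len(nodes))
    for j in range(len(nodes))
)

print(adj_list)
for row in adj_matrix:
    print(row)
print("symmetric:", is_symmetric)
```

The list form stores only existing edges, which suits sparse networks, while the matrix form answers "is there an edge from i to j?" in constant time at the cost of |V|² cells.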

**Figure 3**
**Graph Isomorphism**. *V = {V₁, V₂, V₃, V₄}, |V| = 4, E = {(V₁, V₂), (V₁, V₃), (V₁, V₄), (V₂, V₃), (V₂, V₄), (V₃, V₄)}, |E| = 6*. Graphs A and B are drawn differently but are isomorphic: they have the same nodes and the same edges. The graph is fully connected: every node is connected to every other node, so the graph forms a clique.

**Figure 4**
**Walks, simple paths, trails and cycles in graphs**. A *walk* is a sequence of nodes in which consecutive nodes are joined by an edge, e.g. (*V₂, V₃, V₆, V₅, V₃*). A *simple path* is a walk with no repeated nodes, e.g. (*V₁, V₄, V₅, V₂, V₃*). A *trail* is a walk where no edges are repeated, e.g. (*V₁, V₂, V₃, V₆*). A *cycle* is a walk (*V₁, V₂,..., V_L*) where *V₁ = V_L*, with no other nodes repeated and L > 3, e.g. (*V₁, V₂, V₅, V₄, V₁*).

**Figure 5**
**Clustering Coefficient**. A) Node V behaves like a hub but it has clustering coefficient *C = 0*. B) Node V comes with a high clustering coefficient. The maximum number of potential connections between the neighbors is given by *E_max = |V|(|V|-1)/2*, where *|V| = 5* is the number of neighbors of node V, thus *E_max = 10*. The neighbors of node V are connected to each other by *E_V = 7* edges, *E = {(V₁, V₂), (V₂, V₃), (V₃, V₄), (V₄, V₅), (V₅, V₁), (V₁, V₃), (V₁, V₄)}*. The clustering coefficient of node V is *C = E_V/E_max = 7/10 = 0.7*.
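The arithmetic in this caption can be reproduced with a short Python sketch. The edge set among the neighbors is taken from panel B; for panel A the caption gives no neighbor count, so five unconnected neighbors are assumed. The names `build_adjacency` and `clustering_coefficient` are illustrative.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def clustering_coefficient(adj, node):
    """Edges among the neighbors of `node` divided by the maximum k(k-1)/2."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


# Node V is linked to each of its five neighbors in both panels.
spokes = [("V", f"V{i}") for i in range(1, 6)]

# Panel A (assumed size): a hub whose neighbors share no edges.
panel_a = build_adjacency(spokes)

# Panel B: the seven edges among the neighbors listed in the caption.
neighbor_edges = [
    ("V1", "V2"), ("V2", "V3"), ("V3", "V4"), ("V4", "V5"),
    ("V5", "V1"), ("V1", "V3"), ("V1", "V4"),
]
panel_b = build_adjacency(spokes + neighbor_edges)

c_a = clustering_coefficient(panel_a, "V")
c_b = clustering_coefficient(panel_b, "V")
print(c_a, c_b)
```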

**Figure 6**
**Network Motifs**. Some common network motifs. A) *Feed-forward loop*. Type of networks: protein, neuron, electronic. B) *Three chain*. Type of network: food webs. C) *Four node feedback*. Type of network: gene regulatory, electronic. D) *Three node feedback*. Type of network: gene regulatory, electronic. E) *Bi-parallel*. Type of network: gene regulatory, biochemical. F) *Bi-Fan*. Type of networks: protein, neuron, electronic [74].

**Figure 7**
**Closeness and Betweenness centralities**. **Closeness centrality**. V₁: d₁ = 4 × 1 + 1 × 2 + 1 × 3 = 9, *C_clo(1)* = 6/9. V₁ accesses 4 nodes *(V₂, V₅, V₆, V₇)* with step 1, 1 node *(V₃)* with step 2 and 1 node *(V₄)* with step 3. 6 nodes can be accessed in total by V₁. V₂: d₂ = 2 × 1 + 4 × 2 = 10 > d₁, *C_clo(2)* = 6/10. V₂ accesses 2 nodes *(V₁, V₃)* with step 1 and 4 nodes *(V₄, V₅, V₆, V₇)* with step 2. 6 nodes can also be accessed in total by V₂. As a result, V₁ is more central than V₂, since d₁ < d₂ and therefore *C_clo(1)* > *C_clo(2)*. **Betweenness centrality**. *N_p(1)* = 12 shortest paths pass through node V₁. The paths, given by their starting and ending nodes, are *{V₂-V₅, V₂-V₆, V₂-V₇, V₃-V₅, V₃-V₆, V₃-V₇, V₄-V₅, V₄-V₆, V₄-V₇, V₅-V₆, V₅-V₇, V₆-V₇}*. *N_p(2)* = 8 shortest paths pass through node V₂. The paths are *{V₁-V₃, V₁-V₄, V₃-V₅, V₃-V₆, V₃-V₇, V₄-V₅, V₄-V₆, V₄-V₇}*. *N_p(3)* = 5: *{V₁-V₄, V₂-V₄, V₄-V₅, V₄-V₆, V₄-V₇}*. *N_p(4)* = *N_p(5)* = *N_p(6)* = *N_p(7)* = 0. *N_p* = 25 is the total number of shortest paths that pass through the nodes, thus *N_p = N_p(1)+N_p(2)+N_p(3)+N_p(4)+N_p(5)+N_p(6)+N_p(7)*. The centralities are *C_b(1) = 12/25 = 0.48, C_b(2) = 8/25 = 0.32, C_b(3) = 5/25 = 0.20, C_b(4) = C_b(5) = C_b(6) = C_b(7) = 0*, thus node V₁ is the most central.
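The numbers in this caption can be checked with a minimal Python sketch. The edge list below is not stated in the caption; it is inferred from the step-1 neighbors it reports (V₁ is adjacent to V₂, V₅, V₆, V₇; V₂ to V₃; V₃ to V₄). Betweenness is normalized by the total count *N_p*, following the caption rather than the more common pair-based normalization, and counting a pair once is only exact when shortest paths are unique, as they are in this tree.

```python
from collections import deque
from itertools import combinations

# Edge list inferred from the caption (an assumption, see lead-in).
EDGES = [(1, 2), (1, 5), (1, 6), (1, 7), (2, 3), (3, 4)]


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Shortest-path length (in steps) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


adj = build_adjacency(EDGES)
nodes = sorted(adj)
dist = {v: bfs_distances(adj, v) for v in nodes}

# Closeness: number of reachable nodes divided by the summed distances d_i.
total_distance = {v: sum(dist[v].values()) for v in nodes}
closeness = {v: (len(nodes) - 1) / total_distance[v] for v in nodes}


def paths_through(v):
    """Count node pairs (s, t) whose shortest path has v as an interior node."""
    return sum(
        1
        for s, t in combinations(nodes, 2)
        if v not in (s, t) and dist[s][v] + dist[v][t] == dist[s][t]
    )


n_p = {v: paths_through(v) for v in nodes}
n_p_total = sum(n_p.values())
betweenness = {v: n_p[v] / n_p_total for v in nodes}

print(total_distance[1], total_distance[2])
print(n_p, n_p_total)
print(betweenness)
```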

**Figure 8**
**Eccentricity Centrality**. V₁: 4 × 1, 2 × 2; V₁ accesses 4 nodes *(V₂, V₃, V₅, V₆)* with step 1 and 2 nodes *(V₄, V₇)* with step 2. The step represents the shortest path. The maximum shortest path *d_max = 2*. V₂: 3 × 1, 3 × 2; similarly V₂ accesses 3 nodes *(V₄, V₇, V₁)* with step 1 and 3 nodes *(V₃, V₅, V₆)* with step 2. The maximum shortest path *d_max = 2*. V₃: 2 × 1, 3 × 2, 1 × 3; similarly V₃ accesses 2 nodes *(V₁, V₄)* with step 1, 3 nodes *(V₂, V₅, V₆)* with step 2 and one node *(V₇)* with step 3. The maximum shortest path *d_max = 3*. V₄: 2 × 1, 2 × 2, 2 × 3; the maximum shortest path *d_max = 3*. V₅: 1 × 1, 3 × 2, 2 × 3; the maximum shortest path *d_max = 3*. V₆: 1 × 1, 3 × 2, 2 × 3; the maximum shortest path *d_max = 3*. V₇: 1 × 1, 2 × 2, 3 × 3; the maximum shortest path *d_max = 3*. As a result, the ordering of the nodes according to *C_ecc* is (V₁, V₂), (V₃, V₄, V₅, V₆, V₇).
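A short Python sketch reproduces the step counts and *d_max* values above. The edge list is inferred from the step-1 neighbors reported for V₁, V₂ and V₃ and is therefore an assumption, as is taking *C_ecc* as the reciprocal of *d_max*, which is a common definition that the caption itself does not spell out.

```python
from collections import Counter, deque

# Edge list inferred from the step-1 neighbors in the caption (assumption).
EDGES = [(1, 2), (1, 3), (1, 5), (1, 6), (2, 4), (2, 7), (3, 4)]


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Shortest-path length (in steps) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


adj = build_adjacency(EDGES)
step_profile = {}   # {node: {step: number of nodes reached at that step}}
eccentricity = {}   # {node: d_max}
for v in sorted(adj):
    dist = bfs_distances(adj, v)
    del dist[v]  # ignore the zero distance to the node itself
    step_profile[v] = dict(Counter(dist.values()))
    eccentricity[v] = max(dist.values())

# Assumed definition: a smaller d_max means a more central node.
c_ecc = {v: 1 / d for v, d in eccentricity.items()}

print(step_profile)
print(eccentricity)
```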

**Figure 9**
**Matching Index**. V₁ is connected with 5 nodes *(V₃, V₄, V₆, V₇, V₈)*. V₂ is connected with 4 nodes *(V₃, V₄, V₅, V₈)*. V₃ is connected with 2 nodes *(V₁, V₂)*. V₄ is connected with 2 nodes *(V₁, V₂)*. V₅ is connected with 1 node *(V₂)*. V₆ is connected with 1 node *(V₁)*. V₇ is connected with 1 node *(V₁)*. V₈ is connected with 2 nodes *(V₁, V₂)*. Nodes V₁ and V₂ share 3 common neighbors *(V₃, V₄, V₈)* and have 6 distinct neighbors in total *(V₃, V₄, V₈, V₅, V₆, V₇)*. The matching index is therefore *M_1,2 = 3/6 = 0.5*, thus V₁ and V₂ are functionally similar even though they are not connected.
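As a minimal sketch, the matching index of this example is the number of common neighbors divided by the number of distinct neighbors of the two nodes. The edge list below is built from the neighbor lists the caption gives for V₁ and V₂; the function name `matching_index` is an illustrative choice.

```python
# Edges taken from the neighbor lists of V1 and V2 in the caption.
EDGES = [
    (1, 3), (1, 4), (1, 6), (1, 7), (1, 8),
    (2, 3), (2, 4), (2, 5), (2, 8),
]


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def matching_index(adj, a, b):
    """Shared neighbors of a and b divided by their distinct neighbors."""
    # The two nodes themselves are excluded in case they are directly linked.
    neighbors_a = adj[a] - {b}
    neighbors_b = adj[b] - {a}
    union = neighbors_a | neighbors_b
    if not union:
        return 0.0
    return len(neighbors_a & neighbors_b) / len(union)


adj = build_adjacency(EDGES)
common = adj[1] & adj[2]
m_12 = matching_index(adj, 1, 2)
print(sorted(common), m_12)
```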

**Figure 10**
**Average linkage hierarchical clustering example**. The expression of 44 genes was measured in 4 experiments (E₁, E₂, E₃, E₄). The genes were classified according to their coexpression levels. The Pearson Correlation Coefficient (r-value) was used to analyze gene set signal values. Genes were clustered according to the r-value correlation matrix using the Average Linkage Hierarchical clustering method. The tree on the left clusters the expression profiles of the genes, whereas the tree on top of the figure clusters the profiles of the experiments. Thus experiments E₂ and E₃ are similar and closely related.
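The procedure described in this caption can be sketched in plain Python. The four expression profiles below are hypothetical values chosen only for illustration (they are not the 44 genes of the figure), and using 1 − r as the distance between two genes is an assumption, since the caption only says that the r-value matrix was clustered.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient r of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges as (cluster, cluster, distance)."""
    # Assumed distance between two genes: 1 - r.
    dist = {
        frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def linkage(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical expression values over four experiments (E1..E4).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}

merges = average_linkage(profiles)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

Clustering the experiments instead of the genes only requires transposing the input, so that each experiment maps to its vector of gene values.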

**Figure 11**
**Predicting protein complexes from PPI networks**. Protein complexes predicted by applying the spectral clustering algorithm to a yeast protein-protein interaction dataset [12] and filtering the results, using the jClust application [146]. The highlighted budding yeast Arp2/3 complex was successfully predicted.

References

1. Pellegrini M, Haynor D, Johnson JM. Protein interaction networks. Expert Rev Proteomics. 2004;1(2).
2. Vikis HG, Guan KL. Glutathione-S-transferase-fusion based assays for studying protein-protein interactions. Methods Mol Biol. 2004;261:175–186.
3. Puig O, Caspary F, Rigaut G, Rutz B, Bouveret E, Bragado-Nilsson E, Wilm M, Seraphin B. The tandem affinity purification (TAP) method: a general procedure of protein complex purification. Methods. 2001;24(3):218–229.
4. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001;98(8):4569–4574.
5. Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415(6868):141–147.
