Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2009 Nov 16:10:376.
doi: 10.1186/1471-2105-10-376.

Algorithms for effective querying of compound graph-based pathway databases

Affiliations

Algorithms for effective querying of compound graph-based pathway databases

Ugur Dogrusoz et al. BMC Bioinformatics. .

Abstract

Background: Graph-based pathway ontologies and databases are widely used to represent data about cellular processes. This representation makes it possible to programmatically integrate cellular networks and to investigate them using the well-understood concepts of graph theory in order to predict their structural and dynamic properties. An extension of this graph representation, namely hierarchically structured or compound graphs, in which a member of a biological network may recursively contain a sub-network of a somehow logically similar group of biological objects, provides many additional benefits for analysis of biological pathways, including reduction of complexity by decomposition into distinct components or modules. In this regard, it is essential to effectively query such integrated large compound networks to extract the sub-networks of interest with the help of efficient algorithms and software tools.

Results: Towards this goal, we developed a querying framework, along with a number of graph-theoretic algorithms from simple neighborhood queries to shortest paths to feedback loops, that is applicable to all sorts of graph-based pathway databases, from PPIs (protein-protein interactions) to metabolic and signaling pathways. The framework is unique in that it can account for compound or nested structures and ubiquitous entities present in the pathway data. In addition, the queries may be related to each other through "AND" and "OR" operators, and can be recursively organized into a tree, in which the result of one query might be a source and/or target for another, to form more complex queries. The algorithms were implemented within the querying component of a new version of the software tool PATIKAweb (Pathway Analysis Tool for Integration and Knowledge Acquisition) and have proven useful for answering a number of biologically significant questions for large graph-based pathway databases.

Conclusion: The PATIKA Project Web site is http://www.patika.org. PATIKAweb version 2.1 is available at http://web.patika.org.

PubMed Disclaimer

Figures

Figure 1
Figure 1
How pathways are integrated. Conceptual illustration of how pathways are integrated in a knowledge base (each pathway is colored distinctly), which is typically on disk, and how a sub-network of interest (parts of three different original pathways) may be extracted and displayed as a result of a query.
Figure 2
Figure 2
Example compound graph. An example compound graph (V (G) = {n1, n2, n3, n4, n5, c1}, E(G) = {n2c1, n3n2, n3n5, n4n1}, and I = {c1n4, c1n5}). The order, in which nodes are traversed, depends on the relationship between a compound structure and its members and on the relationship between members of a compound structure.
Figure 3
Figure 3
Sample pathways represented by PATIKA ontology. (left) Canonical wnt pathway containing examples of compound structures, such as regular abstractions (e.g., "protein degradation"), homology abstractions (e.g., 5 wnt genes), and molecular complexes (e.g., APC:Axin) [4]. (right) Partial human interaction networks containing PPIs and the transcriptional regulation interactions of proteins CRK and SP1, respectively.
Figure 4
Figure 4
Sample query. Sample query tree to find the union of 1-neighborhood of the objects on the shortest path from states whose name starts with "Fas" to states whose name starts with "RB" with the shortest path from states whose name starts with "Fas" to states whose name starts with "JNK1" [18].
Figure 5
Figure 5
Sample query result. Mechanistic view of the result of the following sample query: paths-of-interest (yellow) with source of all mechanistic nodes whose names contain "caspase-8" (green) and target for those whose names contain "bax" (cyan) with limit 8; Highlight Legend Dialog for this query is shown on the right.
Figure 6
Figure 6
Query parameters vs. execution time. (top) Result set size (|NB(S, k)|) vs. execution time for GoI algorithm. (middle) Distance (k) vs. execution time for GoI algorithm. (bottom) Shortest length vs. execution time of shortest-path algorithm.
Figure 7
Figure 7
Example traversals of complexes and ubiquitous molecules. (left) The traversal reaching complex "c1" from the left transition will continue to the right transition only if the "Link Members of Complex" option is true. (right) Whether or not protein states "a" and "b" are in the 4-neighborhood (yellow) of state "c" (green) depends on whether traversal over ubiquitous molecules ("ubique X") is allowed. In this case, it was allowed.
Figure 8
Figure 8
Neighborhood. 2-neighborhood (yellow) of phosphorylated Emi1 (Ser 182) (green) in a partial pathway in nucleus. Compound nodes with dashed borders represent homologies, whereas compound nodes with solid borders represent molecular complexes.
Figure 9
Figure 9
Upstream. (left) Up (green) and down (cyan) stream of protein "a" (yellow) in a partial mechanistic pathway. (right) Unambiguous positive upstream of node "a" (yellow) contains node "c" (green) only, as node "b" is on both positive and negative paths leading to node "a".
Figure 10
Figure 10
Common regulator. The common upstream, with a path limit of 2, of small molecules containing the word "lauro" in their name (cyan) in this partial mechanistic pathway turns out to be a single node representing a molecular complex (green). The paths from the potential common regulator to the target nodes are highlighted (yellow).
Figure 11
Figure 11
Graph of interest. (left) A PPI network with proteins of interest CRK and CRKL (green); (middle) Graph of interest (yellow) formed by using paths of length up to 3 (k = 3) between nodes of interest (green); (right) Graph of interest with k = 2 (k = 1 returns no results).
Figure 12
Figure 12
Shortest paths. Shortest paths (yellow) between bioentities whose names start with "PPA" (cyan) and those whose names contain "ESR" (green) with (left) d = 0 and (right) d = 2. Notice that the length of a shortest path between these two node sets is 1.
Figure 13
Figure 13
Shortest paths. Shortest path (yellow) between Cdc25C (green) and CKI (cyan) in nucleus. Compound nodes with dashed borders represent homologies, whereas compound nodes with solid borders represent molecular complexes.
Figure 14
Figure 14
Positive feedback. Positive feedback (yellow) of a specified Citrate state in mitochondria (green) with up to length 10. The result contains two metabolic cycles; one in mitochondria (of length 10) and one through cytoplasm (of length 8).

Similar articles

Cited by

References

    1. Matthews L, Gopinath G, Gillespie M, Caudy M, Croft D, de Bono B, Garapati P, Hemish J, Hermjakob H, Jassal B, Kanapin A, Lewis S, Mahajan S, May B, Schmidt E, Vastrik I, Wu G, Birney E, Stein L, D'Eustachio P. Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 2009. pp. D619–22. - DOI - PMC - PubMed
    1. Caspi R, Foerster H, Fulcher CA, Kaipa P, Krummenacker M, Latendresse M, Paley S, Rhee SY, Shearer AG, Tissier C, Walk TC, Zhang P, Karp PD. The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 2008. pp. D623–31. - PMC - PubMed
    1. Okuda S, Yamada T, Hamajima M, Itoh M, Katayama T, Bork P, Goto S, Kanehisa M. KEGG Atlas mapping for global analysis of metabolic pathways. Nucleic Acids Res. 2008. pp. W423–6. - DOI - PMC - PubMed
    1. Demir E, Babur O, Dogrusoz U, Gursoy A, Ayaz A, Gulesir G, Nisanci G, Cetin-Atalay R. An Ontology for Collaborative Construction and Analysis of Cellular Pathways. Bioinformatics. 2004;20(3):349–356. doi: 10.1093/bioinformatics/btg416. - DOI - PubMed
    1. Bondy JA, Murty USR. Graph Theory with Applications. Great Britain: The McMillan Press Ltd; 1976.

Publication types

LinkOut - more resources