Brief Bioinform. 2014 Nov;15(6):973-83.
doi: 10.1093/bib/bbt058. Epub 2013 Aug 14.

Extracting reaction networks from databases-opening Pandora's box

Liam G Fearnley et al. Brief Bioinform. 2014 Nov.

Abstract

Large quantities of information describing the mechanisms of biological pathways continue to be collected in publicly available databases. At the same time, experiments have increased in scale, and biologists increasingly use pathways defined in online databases to interpret the results of experiments and generate hypotheses. Emerging computational techniques that exploit the rich biological information captured in reaction systems require formal standardized descriptions of pathways to extract these reaction networks and avoid the alternative: time-consuming and largely manual literature-based network reconstruction. Here, we systematically evaluate the effects of commonly used knowledge representations on the seemingly simple task of extracting a reaction network describing signal transduction from a pathway database. We show that this process is in fact surprisingly difficult, and that the pathway representations adopted by various knowledge bases have dramatic consequences for reaction network extraction, connectivity, capture of pathway crosstalk and the modelling of cell-cell interactions. Researchers constructing computational models built from automatically extracted reaction networks must therefore consider the issues we outline in this review to maximize the value of existing pathway knowledge.

Keywords: databases; modelling; reaction networks; signal transduction.

Figures

Figure 1:
Duplication of entities decreases network connectivity. It is essential that each entity in a given cellular location is represented with a single entry in the underlying database. In this visualization, each node in the network refers to a unique database entry. In (A), the entity represented by a star has been duplicated (solid and dashed outlines). This significantly reduces the connectivity and complexity of the network described by the data. (B) shows a network consisting of multiple signal transduction pathways implicated in prostate cancer, visualized from data originally sourced from the PANTHER Pathways database and analysed in [17]. This network has duplication of 28 entities. Correcting these duplications, as illustrated in (C), yields the network shown in (D), with an attendant increase in connectivity and complexity.
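The effect of duplication described in this caption can be illustrated with a minimal Python sketch. It is not the method used in the review: the entry IDs, the `(name, location)` identity key, and the function names `merge_duplicates` and `count_components` are all hypothetical, and the sketch assumes that two database entries with the same name and cellular location denote the same entity.

```python
def merge_duplicates(edges, identity):
    """Collapse entries sharing the same (name, location) identity.

    `identity` maps each entry ID to a (name, location) tuple; the first
    entry seen for an identity becomes its canonical node (an assumption).
    """
    canonical = {}
    merged = set()
    for a, b in edges:
        ca = canonical.setdefault(identity[a], a)
        cb = canonical.setdefault(identity[b], b)
        if ca != cb:  # drop self-loops created by merging
            merged.add((ca, cb))
    return merged


def count_components(edges):
    """Count connected components of an undirected graph via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(x) for x in parent})


# Hypothetical toy data: the "star" entity is stored twice (e2 and e3).
edges = [("e1", "e2"), ("e3", "e4")]
identity = {
    "e1": ("A", "cytosol"),
    "e2": ("star", "cytosol"),
    "e3": ("star", "cytosol"),
    "e4": ("B", "cytosol"),
}

before = count_components(edges)
after = count_components(merge_duplicates(edges, identity))
```

In this toy case the duplicated entry splits the network into two fragments, and merging on the identity key joins them into one. Real databases may need a stricter identity key (for example, one that also includes modification state).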
Figure 2:
Bucketing of entities has a significant effect on networks. In (A), three entities have been grouped into a meta-entity (dashed circle), which interacts with the species described by the star. One of the entities has a number of distinct activities outside of this group. The network depicted in (B) is sourced from Reactome's ‘Mitotic G1-G1/S phases' pathway (REACT_21267.3). The BioPAX Level 3 representation of this pathway contains 27 of these meta-entities. Removing the meta-entity, as illustrated in (C), results in significant changes to the network shape. Restoration of connectivity lost owing to meta-entity use generates the network shown in (D), significantly changing network topology.
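One way to picture the restoration step in this caption is to replace each meta-entity with its members and rewire its interactions to every member. The Python sketch below is illustrative only: the node names, the `groups` mapping and the function `expand_meta_entities` are hypothetical, nested groups are not handled, and it assumes every member takes part in every interaction of its group, which may over-connect a real network.

```python
def expand_meta_entities(edges, groups):
    """Rewire edges that touch a meta-entity to each of its members.

    `groups` maps a meta-entity ID to the list of its member IDs.
    """
    def members(node):
        return groups.get(node, [node])

    expanded = set()
    for a, b in edges:
        for ma in members(a):
            for mb in members(b):
                if ma != mb:  # skip self-loops
                    expanded.add((ma, mb))
    return expanded


# Hypothetical toy data mirroring panel (A): meta-entity G groups X, Y, Z
# and interacts with "star"; X also has two activities outside the group.
groups = {"G": ["X", "Y", "Z"]}
edges = [("G", "star"), ("X", "P"), ("X", "Q")]

expanded = expand_meta_entities(edges, groups)
```

Before expansion, X has no path to the star entity because only G is linked to it. After expansion, X is linked directly, so its other activities become part of the same connected network.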
Figure 3:
Multicellular interactions present problems in the absence of a defined cellular frame of reference. (A) shows an example system with cellular locations defined solely with respect to the cytosol, cell membrane and extracellular region of an unspecified cell. This representation generates ambiguity and is misleading when describing multicellular interactions: the same set of reactions can lead to significantly different functional capabilities of the interaction network when the cellular frame of reference is accurately represented (C). The example in (B) is sourced from Reactome’s ‘Latent infection of H. sapiens with M. tuberculosis’ pathway (REACT_121237.2). In the version of the network described in the database, the ‘cell wall’, ‘periplasmic space’, and ‘plasma membrane’ locations can be assigned to Mycobacterium (green) and ‘phagocytic vesicle membrane’ and ‘late endosome membrane’ to H. sapiens (blue). The more generic ‘cytosol’ is ambiguous (orange), and reactions assigned to this location could belong to either species. Fixing these assignments (using the graphical representation of the pathway) yields the unambiguous representation shown in (D). A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
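The location assignments listed in this caption can be written as a simple lookup that qualifies each reaction with an organism and flags the ones that remain ambiguous. This is a sketch under stated assumptions: the location-to-organism table follows the caption and applies only to this pathway, while the reaction IDs and the function `qualify_locations` are hypothetical. Ambiguous reactions would still need manual resolution, as the caption describes.

```python
# Location-to-organism assignments as stated in the Figure 3 caption;
# valid for this pathway only, not a general rule.
LOCATION_OWNER = {
    "cell wall": "Mycobacterium",
    "periplasmic space": "Mycobacterium",
    "plasma membrane": "Mycobacterium",
    "phagocytic vesicle membrane": "H. sapiens",
    "late endosome membrane": "H. sapiens",
}


def qualify_locations(reactions):
    """Split reactions into organism-qualified and ambiguous ones.

    `reactions` maps a reaction ID to its annotated cellular location.
    """
    qualified, ambiguous = {}, []
    for reaction_id, location in reactions.items():
        owner = LOCATION_OWNER.get(location)
        if owner is None:
            ambiguous.append(reaction_id)  # e.g. the generic 'cytosol'
        else:
            qualified[reaction_id] = (owner, location)
    return qualified, ambiguous


# Hypothetical reaction IDs for illustration.
reactions = {
    "r1": "cell wall",
    "r2": "cytosol",
    "r3": "late endosome membrane",
}

qualified, ambiguous = qualify_locations(reactions)
```

Keeping the ambiguous reactions in a separate list, instead of guessing an organism for them, makes the unresolved part of the network explicit.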

References

    1. Karr JR, Sanghvi JC, Macklin DN, et al. A whole-cell computational model predicts phenotype from genotype. Cell. 2012;150(2):389–401.
    2. Fraser CM, Gocayne JD, White O, et al. The minimal gene complement of Mycoplasma genitalium. Science. 1995;270(5235):397–403.
    3. Thiele I, Swainston N, Fleming RM, et al. A community-driven global reconstruction of human metabolism. Nat Biotechnol. 2013;31:419–25.
    4. Li C, Donizelli M, Rodriguez N, et al. BioModels Database: an enhanced, curated and annotated resource for published quantitative kinetic models. BMC Syst Biol. 2010;4:92.
    5. Matthews L, Gopinath G, Gillespie M, et al. Reactome knowledge base of human biological pathways and processes. Nucleic Acids Res. 2009;37:D619–22.