Review
Comput Struct Biotechnol J. 2024 Nov 4;23:3999-4010.
doi: 10.1016/j.csbj.2024.10.045. eCollection 2024 Dec.

BioPAX in 2024: Where we are and where we are heading

Cécile Beust et al. Comput Struct Biotechnol J.

Abstract

In systems biology, the study of biological pathways plays a central role in understanding the complexity of biological systems. The growing volume of pathway data made available by numerous online databases in recent years has created a pressing need to standardize these data. The BioPAX format (Biological Pathway Exchange) emerged in 2010 as a solution for standardizing and exchanging pathway data across databases. BioPAX is a Semantic Web format associated with an ontology. It is highly expressive, allowing biological pathways to be described in fine detail at the molecular and cellular levels, but its intrinsic complexity may be an obstacle to widespread adoption. Here, we report on the use of the BioPAX format in 2024. We compare how the different pathway databases use BioPAX to standardize their data and point out possible avenues for improvement to make full use of its potential. We also report on the various tools and software that have been developed to work with BioPAX data. Finally, we present a new concept of abstraction on BioPAX graphs that would make it possible to target the specific areas of a BioPAX graph needed for a given analysis, thus distinguishing the format suited for representation from the abstraction suited for contextual analysis.

Keywords: BioPAX; Biological pathways; Databases; Systems Biology.

Conflict of interest statement

The authors have no conflict of interest to declare.

Figures

Graphical abstract
Fig. 1
Simplified BioPAX data schema adapted from Demir et al. The hierarchy of the main classes of the BioPAX ontology is represented as colored boxes. The main properties associated with each class are detailed at the bottom of the boxes. The type of resource each property points to is indicated next to it. The properties originating from a class are either class-specific or inherited from its ancestor classes. The Entity class (in gray) is the root of the BioPAX ontology. PhysicalEntity (in blue) describes the biological entities involved in pathways, such as proteins, complexes, small chemical molecules, RNA or DNA. Interaction (in green) encompasses several types of biological interactions, including reactions as well as regulation processes, that typically involve one or several physical entities. Pathway (in red) is used to characterize biological pathways, including the reactions and pathway steps (the sequence of reactions) composing them.
Fig. 2
Example of a pathway represented in BioPAX. (Panel A) The pathway represented is the ‘Formation of the RNA Pol II elongation complex’ pathway (R-HSA-112382) from the Reactome database (version 90, September 2024). It is described using the classes and properties of the BioPAX ontology. Each arrow represents a relation (also called a predicate) from a subject to an object, the whole making up an RDF triple (Panel B, see Supplementary File S-2 for the complete list of RDF triples). Resources are represented with their identifiers in the BioPAX export of Reactome (version 87). Instances are represented by ellipses and classes by boxes (node color legend from Fig. 1). The instantiation relationships describing the membership of resources to a class are represented by red arrows. Instantiation relationships were omitted for the PhysicalEntity instances. The pathway of interest (in red) is composed of two direct pathway steps (in purple), linked by a bp3:nextStep relationship indicating the sequence of the steps. The first pathway step (PathwayStep10617) is composed of BiochemicalReaction8726 (‘Hyperphosphorylation (Ser2) of RNA Pol II CTD by P-TEFb complex’), which consumes Complex5463 (via the bp3:left relation) and produces Complex10863 (via the bp3:right relation). BiochemicalReaction8726 is controlled (activated) by Catalysis3256. The second pathway step (PathwayStep10618) is composed of BiochemicalReaction8727 (‘Recruitment of elongation factors to form elongation complex’), which consumes the product of the previous pathway step and produces Complex10864.
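The triple structure described in this caption can be sketched in a few lines of Python. This is only an illustration with plain tuples, not the Reactome export itself: the instance names and the bp3:nextStep, bp3:left and bp3:right relations come from the caption, whereas bp3:pathwayOrder, bp3:stepProcess and bp3:controlled are assumed here from the BioPAX Level 3 ontology, and `PathwayOfInterest` is a hypothetical placeholder because the caption does not give the pathway's instance identifier.

```python
# Toy subset of the triples described in the caption: (subject, predicate, object).
# "PathwayOfInterest" is a hypothetical placeholder name; bp3:pathwayOrder,
# bp3:stepProcess and bp3:controlled are assumptions, not quoted from the caption.
TRIPLES = [
    ("PathwayOfInterest", "bp3:pathwayOrder", "PathwayStep10617"),
    ("PathwayOfInterest", "bp3:pathwayOrder", "PathwayStep10618"),
    ("PathwayStep10617", "bp3:nextStep", "PathwayStep10618"),
    ("PathwayStep10617", "bp3:stepProcess", "BiochemicalReaction8726"),
    ("PathwayStep10618", "bp3:stepProcess", "BiochemicalReaction8727"),
    ("BiochemicalReaction8726", "bp3:left", "Complex5463"),
    ("BiochemicalReaction8726", "bp3:right", "Complex10863"),
    ("Catalysis3256", "bp3:controlled", "BiochemicalReaction8726"),
    ("BiochemicalReaction8727", "bp3:left", "Complex10863"),
    ("BiochemicalReaction8727", "bp3:right", "Complex10864"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o such that (subject, predicate, o) is a triple."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def ordered_steps(triples=TRIPLES):
    """Order the pathway steps by following bp3:nextStep from the first step."""
    steps = {o for _, p, o in triples if p == "bp3:pathwayOrder"}
    has_predecessor = {o for _, p, o in triples if p == "bp3:nextStep"}
    starts = sorted(steps - has_predecessor)
    order = []
    current = starts[0] if starts else None
    # Stop at the end of the chain, or if a cycle brings us back to a seen step.
    while current is not None and current not in order:
        order.append(current)
        following = objects(current, "bp3:nextStep", triples)
        current = following[0] if following else None
    return order
```

Following bp3:nextStep recovers the step order, and comparing the bp3:right of one reaction with the bp3:left of the next shows that the product of the first step is consumed by the second.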
Fig. 3
Comparison of the contents of BioPAX exports of the main biological pathway databases. The BioPAX exports of five pathway databases were extracted from the PathwayCommons database (Reactome, PANTHER Pathway, PathBank, HumanCyc, and KEGG Pathway), and the standalone BioPAX exports of Reactome (version 90, September 2024) and PANTHER Pathway have been added. For each database, the number of instances of each of the following BioPAX classes is represented on the radar plots: BiochemicalReaction, PathwayStep, Pathway, Interaction, SmallMolecule, Protein, Dna, Rna. The values are calculated as percentages of the maximum value of each class. The same comparison, extended to the nine pathway databases available on PathwayCommons, is available in Supplementary Figure S-4. For more specific information on each pathway database, a detailed table is available as Supplementary Table S-1.
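The scaling used for the radar plots can be illustrated with a minimal sketch. It assumes that the maximum of each class is taken across the compared databases, which is one reading of the caption; the function name `percent_of_max` and the input layout are hypothetical and not taken from the article.

```python
def percent_of_max(counts):
    """Scale instance counts to percentages of the per-class maximum.

    counts: {database_name: {biopax_class: number_of_instances}}
    Returns the same nested shape with values between 0 and 100.
    Assumption: the maximum is taken over all databases being compared.
    """
    classes = {cls for per_db in counts.values() for cls in per_db}
    max_per_class = {
        cls: max(per_db.get(cls, 0) for per_db in counts.values())
        for cls in classes
    }
    scaled = {}
    for db, per_db in counts.items():
        scaled[db] = {}
        for cls in classes:
            top = max_per_class[cls]
            # A class with no instance anywhere is reported as 0 rather than dividing by zero.
            scaled[db][cls] = 100.0 * per_db.get(cls, 0) / top if top else 0.0
    return scaled
```

With this scaling, each axis of a radar plot reaches 100 for the database that has the most instances of that class, so the plots compare relative coverage and not absolute counts.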
Fig. 4
Mappings of BioPAX instances to UniProtKB in the BioPAX exports of the main pathway databases (Reactome standalone BioPAX export and from PathwayCommons, PANTHER Pathway standalone BioPAX export and from PathwayCommons (filtered for human-only UniProt protein mappings), HumanCyc from PathwayCommons, PathBank from PathwayCommons and KEGG Pathway from PathwayCommons). The top panel represents the mapping of BioPAX proteins to UniProtKB. Each instance of Protein (P) points to a ProteinReference (PR) that can be linked to a UniProtKB identifier. On the bottom panel, the number of mappings from the BioPAX instances of P to UniProtKB is detailed, as well as the number of unique UniProtKB identifiers that the database points to. The number of instances of P that lack an association with a PR is also reported, as well as the number of P whose PR does not point to a UniProtKB identifier. A comparative analysis for the nine pathway databases available on PathwayCommons is provided as Supplementary Figure S-6.
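The four quantities reported in the bottom panel can be sketched as a simple count over the Protein → ProteinReference → UniProtKB chain. This is a hedged illustration, not the authors' procedure: the function name, the dictionary-based inputs and the accession placeholders used in any call are assumptions, and a real analysis would extract these links from the BioPAX export.

```python
def uniprot_mapping_stats(protein_to_ref, ref_to_uniprot):
    """Count how Protein instances map to UniProtKB through their ProteinReference.

    protein_to_ref: {protein_id: protein_reference_id, or None if the link is missing}
    ref_to_uniprot: {protein_reference_id: set of UniProtKB identifiers}
    Both input structures are hypothetical simplifications of a BioPAX export.
    """
    mapped = 0                # Protein instances reaching at least one UniProtKB identifier
    without_reference = 0     # Protein instances with no ProteinReference
    reference_unmapped = 0    # Protein instances whose ProteinReference has no UniProtKB identifier
    unique_ids = set()
    for reference in protein_to_ref.values():
        if reference is None:
            without_reference += 1
            continue
        identifiers = ref_to_uniprot.get(reference, set())
        if identifiers:
            mapped += 1
            unique_ids |= identifiers
        else:
            reference_unmapped += 1
    return {
        "mapped_proteins": mapped,
        "unique_uniprot_ids": len(unique_ids),
        "proteins_without_reference": without_reference,
        "proteins_with_unmapped_reference": reference_unmapped,
    }
```

Several Protein instances can share one ProteinReference (for example the same protein in different states or locations), which is why the number of mapped instances and the number of unique identifiers are reported separately.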
Fig. 5
Mappings of BioPAX instances to ChEBI in the BioPAX exports of the main pathway databases (Reactome standalone BioPAX export and from PathwayCommons, PANTHER Pathway standalone BioPAX export and from PathwayCommons, HumanCyc from PathwayCommons, PathBank from PathwayCommons and KEGG Pathway from PathwayCommons). The top panel represents the mapping of BioPAX small molecules to ChEBI. Each instance of SmallMolecule (SM) points to a SmallMoleculeReference (SMR) that can be linked to a ChEBI identifier. On the bottom panel, the number of mappings from the BioPAX instances of SM to ChEBI is detailed, as well as the number of unique ChEBI identifiers that the database points to. The number of instances of SM that lack an association with an SMR is also reported, as well as the number of SM whose SMR does not point to a ChEBI identifier. A comparative analysis for the nine pathway databases available on PathwayCommons is provided as Supplementary Figure S-7.
Fig. 6
Simplified BioPAX representation of the pathway ‘Signaling by EGFR’ (R-HSA-177929) from Reactome (BioPAX export, version 90, September 2024). The instances of some BioPAX classes are represented as colored nodes (Pathway, PathwayStep, BiochemicalReaction, Complex, Protein, SmallMolecule, Stoichiometry, Catalysis, Control). The BioPAX properties between two resources are detailed on the edges. The root pathway ‘Signaling by EGFR’ and its five associated subpathways are framed with dashed boxes. Each of these pathways may be composed of different pathway steps (visualized as purple nodes) that are linked to processes, which can be biochemical reactions (green nodes) or catalysis events that control the reactions (light green nodes). Red edges represent the bp3:nextStep edges between pathway steps. Each biochemical reaction involves entities as reactants or products, which can be complexes (dark blue nodes), proteins (light blue nodes) or small molecules (orange nodes). Biochemical reactions are also characterized by a stoichiometry, represented by pink nodes. The following BioPAX properties are not represented: bp3:dataSource, bp3:comment, bp3:xref, bp3:controlType, bp3:conversionDirection, bp3:evidence, bp3:organism, bp3:availability, bp3:entityReference, bp3:cellularLocation. Visualization using Cytoscape (Shannon et al. [34]).
Fig. 7
Example of abstraction of the ‘Signaling by EGFR’ pathway (R-HSA-177929) from Reactome (version 90, September 2024) shown in Fig. 6. The pathway was abstracted in order to show the sequence of biochemical reactions (green nodes) of the pathway and their interconnections with other sub-pathways (red nodes). On the original graph of Fig. 6, if two instances of PathwayStep are linked by a bp3:nextStep property, their associated biochemical reactions (green nodes) are linked by a new abs:NextStepBiochemicalReaction property (plain green edge) on the abstracted graph. If any biochemical reaction of the pathway is followed by a biochemical reaction belonging to another sub-pathway, an abs:NextStepPathway property (dashed gray edge) is added between the biochemical reaction node (green node) and the sub-pathway node (red node). If any biochemical reaction of a sub-pathway is followed by a biochemical reaction of another sub-pathway, an abs:NextStepPathway property (dashed red edge) is added between the two sub-pathways. Finally, we removed the pathway steps (purple nodes). This abstraction reveals the high-level sequences between the direct components of a pathway of interest.
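The three rules of this caption can be read as a small graph rewriting procedure. The sketch below is one possible interpretation, not the authors' implementation: it assumes that abs:NextStepBiochemicalReaction is only created between reactions that belong directly to the pathway of interest, it leaves out cases the caption does not describe (such as a sub-pathway reaction followed by a reaction of the root pathway), and the function name and input structures are hypothetical.

```python
def abstract_pathway(next_steps, step_reactions, reaction_owner, root):
    """Derive abstracted edges from bp3:nextStep links between pathway steps.

    next_steps:      iterable of (step, following_step) pairs (bp3:nextStep)
    step_reactions:  {step: [biochemical reactions attached to that step]}
    reaction_owner:  {reaction: pathway or sub-pathway it directly belongs to}
    root:            identifier of the pathway of interest
    Returns a set of (source, property, target) edges; pathway steps themselves
    do not appear in the result, which corresponds to their removal.
    """
    edges = set()
    for step, following_step in next_steps:
        for reaction in step_reactions.get(step, []):
            for next_reaction in step_reactions.get(following_step, []):
                owner = reaction_owner[reaction]
                next_owner = reaction_owner[next_reaction]
                if owner == root and next_owner == root:
                    # Rule 1: consecutive reactions of the pathway of interest.
                    edges.add((reaction, "abs:NextStepBiochemicalReaction", next_reaction))
                elif owner == root:
                    # Rule 2: a reaction of the pathway followed by a sub-pathway reaction.
                    edges.add((reaction, "abs:NextStepPathway", next_owner))
                elif next_owner != root and owner != next_owner:
                    # Rule 3: a reaction of one sub-pathway followed by another sub-pathway.
                    edges.add((owner, "abs:NextStepPathway", next_owner))
    return edges
```

Because the result only contains reactions and sub-pathways, an analysis that needs the order of events can work on this smaller graph while the full BioPAX graph remains the reference representation.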

References

    1. Jassal B., Matthews L., Viteri G., Gong C., Lorente P., Fabregat A., et al. The reactome pathway knowledgebase. Nucleic Acids Res. 2020;48:D498–D503. doi: 10.1093/nar/gkz1031.
    2. Milacic M., Beavers D., Conley P., Gong C., Gillespie M., Griss J., et al. The reactome pathway knowledgebase 2024. Nucleic Acids Res. 2024;52:D672–D678. doi: 10.1093/nar/gkad1025.
    3. Pundir S., Martin M.J., O'Donovan C. UniProt protein knowledgebase. Methods Mol Biol (Clifton, N.J.). 2017;1558:41–55. doi: 10.1007/978-1-4939-6783-4_2.
    4. The UniProt Consortium. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res. 2023;51:D523–D531. doi: 10.1093/nar/gkac1052.
    5. Hastings J., Owen G., Dekker A., Ennis M., Kale N., Muthukrishnan V., et al. ChEBI in 2016: improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016;44. doi: 10.1093/nar/gkv1031.
