Bioinformatics. 2025 Mar 29;41(4):btaf160.
doi: 10.1093/bioinformatics/btaf160.

GeOKG: geometry-aware knowledge graph embedding for Gene Ontology and genes

Chang-Uk Jeong et al. Bioinformatics. 2025.

Abstract

Motivation: Leveraging deep learning for the representation learning of Gene Ontology (GO) and Gene Ontology Annotation (GOA) holds significant promise for enhancing downstream biological tasks such as protein-protein interaction prediction. Prior approaches have predominantly used text- and graph-based methods, embedding GO and GOA in a single geometric space (e.g. Euclidean or hyperbolic). However, since the GO graph exhibits a complex and nonmonotonic hierarchy, single-space embeddings are insufficient to fully capture its structural nuances.

Results: In this study, we address this limitation by exploiting geometric interaction to better reflect the intricate hierarchical structure of GO. Our proposed method, Geometry-Aware Knowledge Graph Embeddings for GO and Genes (GeOKG), leverages interactions among various geometric representations during training, thereby modeling the complex hierarchy of GO more effectively. Experiments at the GO level demonstrate the benefits of incorporating these geometric interactions, while gene-level tests reveal that GeOKG outperforms existing methods in protein-protein interaction prediction. These findings highlight the potential of using geometric interaction for embedding heterogeneous biomedical networks.

Availability and implementation: https://github.com/ukjung21/GeOKG.

Figures

Figure 1.
Forman-Ricci curvature of the Gene Ontology graph. Each dot represents a node of the graph, with larger dots indicating more connections. Directed edges appear as arrows. This compact visualization underscores the graph’s structural and geometric nuances: it contains a broad range of curvature values. A detailed explanation of Forman-Ricci curvature is provided in Supplementary Section S1.
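To make the notion of edge curvature concrete, the sketch below computes the simplest combinatorial form of Forman-Ricci curvature, F(u, v) = 4 - deg(u) - deg(v), on a toy graph. This is an illustration only: it assumes an unweighted, undirected graph without self-loops or duplicate edges, and the exact variant used by the authors (Supplementary Section S1) may differ. The function name and the toy edges are hypothetical.

```python
from collections import defaultdict


def forman_curvature(edges):
    """Return {(u, v): 4 - deg(u) - deg(v)} for each edge.

    Assumption: unweighted, undirected graph with no self-loops or
    duplicate edges. Triangle and weight terms are ignored.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {(u, v): 4 - degree[u] - degree[v] for u, v in edges}


# Hypothetical toy hierarchy: a hub "root" with three children,
# one of which ("c") has a child of its own.
toy_edges = [("root", "a"), ("root", "b"), ("root", "c"), ("c", "d")]
curvature = forman_curvature(toy_edges)

for edge, value in curvature.items():
    print(edge, value)
```

Edges attached to high-degree hubs receive more negative values, so a graph mixing hubs, chains, and leaves shows the kind of broad curvature range the caption describes.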
Figure 2.
Schematic of GeOKG-H, the hyperbolic-space embedding model for GO. e_h and e_t are the final embeddings of the head and tail entities, respectively. The approach uses the exponential and logarithmic maps, exp(·) and log(·) (Supplementary Section S2.1), to move between Euclidean and hyperbolic spaces. In each interaction space, the rotational transformation Rot(·) is applied, and the results are then mapped back into Euclidean space. The attention mechanism att(·) aggregates the geometric information from all interaction spaces. The learning objective in hyperbolic space, with learnable curvature c, is to optimize the entity and relation embeddings by minimizing the hyperbolic distance d_c(H_h^c, e_t).
Figure 3.
Overview of GeOKG. (a) GeOKG-H: Embeds the Gene Ontology (GO) graph in the hyperbolic space of a Poincaré ball (B^{d,c}) to preserve its hierarchical structure. Its effectiveness is validated through three GO-level tasks. (b) GeOKG-E: Targets the Gene Ontology Annotation (GOA) graph, which comprises both GO terms and genes, using Euclidean space (R^d). This module uses a two-phase training process: it is first pre-trained on the GO graph to obtain baseline GO term embeddings and then fine-tuned on the GOA graph. Only genes corresponding to proteins in the STRING PPI network are selected, and the resulting protein embeddings are evaluated on three protein–protein interaction prediction tasks. Here, N_GO and N_Genes denote the number of GO terms and genes, respectively.
Figure 4.
UMAP visualizations of gene embeddings. The UniProt ID of the gene product is labeled for New_genes and Prev_genes. Compared to Random_genes, Prev_genes are significantly closer to New_genes (P-values < 8.0e−30, Mann–Whitney test). The UMAP “init” parameter was set to “random.”
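The comparison in this caption rests on a Mann–Whitney test between two sets of distances. The sketch below shows the idea with a hand-written one-sided test that uses the normal approximation, with no tie or continuity correction. It is an assumption-laden illustration: the caption does not say whether the authors' test was one- or two-sided or which software they used, the distance values are made up, and in practice a library routine such as `scipy.stats.mannwhitneyu` would be the usual choice.

```python
import math


def mann_whitney_less(a, b):
    """One-sided test that values in `a` tend to be smaller than those in `b`.

    Returns (U, p), where U counts pairs with a-value > b-value (ties count
    one half) and p comes from the normal approximation without tie or
    continuity correction.
    """
    n1, n2 = len(a), len(b)
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z
    return u, p


# Hypothetical distances to New_genes (not data from the paper).
prev_dist = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7]
random_dist = [2.4, 3.1, 1.9, 2.8, 3.5, 2.2]

u_stat, p_value = mann_whitney_less(prev_dist, random_dist)
print(u_stat, p_value)
```

A small p-value here would indicate that the first group of distances tends to be smaller than the second, which is the sense in which Prev_genes are described as closer to New_genes.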
