GePI: large-scale text mining, customized retrieval and flexible filtering of gene/protein interactions
- PMID: 37224532
- PMCID: PMC10320155
- DOI: 10.1093/nar/gkad445
GePI: large-scale text mining, customized retrieval and flexible filtering of gene/protein interactions
Abstract
We present GePI, a novel Web server for large-scale text mining of molecular interactions from the scientific biomedical literature. GePI leverages natural language processing techniques to identify genes and related entities, interactions between those entities and biomolecular events involving them. GePI supports rapid retrieval of interactions based on powerful search options to contextualize queries targeting (lists of) genes of interest. Contextualization is enabled by full-text filters constraining the search for interactions to either sentences or paragraphs, with or without pre-defined gene lists. Our knowledge graph is updated several times a week ensuring the most recent information to be available at all times. The result page provides an overview of the outcome of a search, with accompanying interaction statistics and visualizations. A table (downloadable in Excel format) gives direct access to the retrieved interaction pairs, together with information about the molecular entities, the factual certainty of the interactions (as verbatim expressed by the authors), and a text snippet from the original document that verbalizes each interaction. In summary, our Web application offers free, easy-to-use, and up-to-date monitoring of gene and protein interaction information, in company with flexible query formulation and filtering options. GePI is available at https://gepi.coling.uni-jena.de/.
© The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
Figures
References
-
- Szklarczyk D., Kirsch R., Koutrouli M., Nastou K.C., Mehryary F., Hachilif R., Gable A.L., Fang T., Doncheva N.T., Pyysalo S.et al. .. The String database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023; 51:D638–D646. - PMC - PubMed
-
- Szklarczyk D., Gable A.L., Nastou K.C., Lyon D., Kirsch R., Pyysalo S., Doncheva N.T., Legeay M., Fang T., Bork P.et al. .. The String database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021; 49:D605–D612. - PMC - PubMed
Publication types
MeSH terms
LinkOut - more resources
Full Text Sources
Molecular Biology Databases
