A platform for the biomedical application of large language models

Sebastian Lobentanzer¹,², Shaohong Feng³, Noah Bruderer⁴, Andreas Maier⁵; BioChatter Consortium; Cankun Wang³, Jan Baumbach⁵,⁶, Jorge Abreu-Vicente⁷, Nils Krehl⁸, Qin Ma³, Thomas Lemberger⁷, Julio Saez-Rodriguez⁹,¹⁰

Affiliations

¹ Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany. sebastian.lobentanzer@gmail.com.
² Open Targets, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK. sebastian.lobentanzer@gmail.com.
³ Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA.
⁴ Michael Sars Centre, University of Bergen, Bergen, Norway.
⁵ Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany.
⁶ Computational Biomedicine Lab, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.
⁷ EMBO, Heidelberg, Germany.
⁸ Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany.
⁹ Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany. saezlab@ebi.ac.uk.
¹⁰ European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK. saezlab@ebi.ac.uk.

PMID: 39843580
PMCID: PMC12216031
DOI: 10.1038/s41587-024-02534-3

Nat Biotechnol. 2025 Feb;43(2):166-169. doi: 10.1038/s41587-024-02534-3.

No abstract available

Conflict of interest statement

Competing interests: J.S.-R. reports funding from GSK, Pfizer and Sanofi and fees or honoraria from Travere Therapeutics, Stadapharm, Pfizer, Grunenthal, Owkin, Moderna and Astex Pharmaceuticals.

Figures

**Fig. 1 | The modular BioChatter platform architecture.**
a, BioChatter provides a selection of diverse APIs for various use cases (Python, REST) and two graphical user interfaces (the Python-based “Light” for rapid prototyping and the more full-featured JavaScript app “Next”). b, BioChatter facilitates the creation of custom deployments on a spectrum of tradeoff between simplicity/economy (left) and security (right). c, BioChatter harmonizes the APIs of open-source LLM deployment tools and proprietary LLM providers (brown), knowledge management systems such as knowledge graphs and vector databases (purple), public APIs (red) of databases (such as OncoKB), and software (such as BLAST). In addition, the LLM can be specifically instructed according to the user’s context via customizable system prompts (green). Each use case is then an individualized combination of these components, combined by either manual or semiautomated agentic workflows, and adapted to the user’s needs, including use-case-specific validation for robustness. API, application programming interface; KG, knowledge graph; LLM, large language model; REST, representational state transfer.

**Fig. 2 | Benchmarking, monitoring, and outlook.**
a, The workflow of introducing use-case-specific tests into the BioChatter benchmarking framework facilitates continuous monitoring. Dedicated benchmarks are run across a combination of models and other parameters. b, Comparison of two benchmark tasks for knowledge graph query generation shows that BioChatter’s prompt engine achieves much higher accuracy than the naive approach (measured as the number of correct query components among all tested). The BioChatter variant involves a multistep procedure of constructing the query, while the “LLM only” variant receives the complete schema definition of a BioCypher knowledge graph (which BioChatter also uses as a basis for the prompt engine). The general instructions for both variants are otherwise the same (Supplementary Note: Benchmarking). The test includes all models, sizes and quantization levels (N = 150), and the performance is measured as the mean accuracy for the two tasks (0.486 ± 0.12 vs 0.844 ± 0.11, unpaired t-test P < 0.001, t = 18.65). Center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers. c, The BioCypher ecosystem will cover the process of knowledge management from extraction through representation to application. We will develop BioGather (denoted by dashed lines as ongoing work) to integrate natively with the BioCypher and BioChatter frameworks to allow flexible extraction of information from diverse resources using a unified API. This will achieve two pairs of bidirectional synergies: knowledge graphs to extraction pipelines and knowledge graphs to LLMs, respectively.
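The summary statistics quoted in panel b can be sanity-checked with a short calculation. The sketch below computes an unpaired t statistic from group means, spreads, and sizes. It rests on two assumptions that the caption does not confirm: that the ± values are standard deviations, and that N = 150 splits evenly into 75 observations per variant. The function name `t_from_summary` is hypothetical and is not part of BioChatter.

```python
import math


def t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Unpaired t statistic from per-group summary statistics.

    Uses the unpooled (Welch) standard error; with equal group sizes this
    gives the same statistic as the pooled Student's t-test.
    """
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_b - mean_a) / standard_error


# Values from the caption: "LLM only" 0.486 +/- 0.12, BioChatter 0.844 +/- 0.11.
# ASSUMPTION: +/- denotes standard deviation and each variant has 75 runs.
t_stat = t_from_summary(0.486, 0.12, 75, 0.844, 0.11, 75)
print(f"t = {t_stat:.2f}")
```

Under these assumptions the statistic comes out at roughly 19, the same order as the reported t = 18.65. The remaining gap may reflect rounding of the published means and spreads, or a group split other than the one assumed here.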

Grants and funding

U54 AG075931/AG/NIA NIH HHS/United States
