Review
Brief Bioinform. 2025 Jul 2;26(4):bbaf357. doi: 10.1093/bib/bbaf357.

Bridging artificial intelligence and biological sciences: a comprehensive review of large language models in bioinformatics

Anqi Lin et al.

Abstract

Large language models (LLMs) represent a breakthrough in artificial intelligence and have demonstrated substantial application value and development potential in bioinformatics, particularly in the processing and analysis of complex biological data. This review systematically examines the development and applications of LLMs in bioinformatics, with emphasis on their advances in protein and nucleic acid structure prediction, omics analysis, drug design and screening, and biomedical literature mining. It highlights the distinctive capabilities of LLMs in end-to-end learning and knowledge transfer, and discusses the major challenges confronting current applications, including model interpretability and data bias. The review further explores the potential of LLMs in cross-modal learning and interdisciplinary development. In conclusion, this paper summarizes the current state of LLM research in bioinformatics, objectively evaluates its advantages and limitations, and offers recommendations for future research directions, positioning LLMs as essential tools in bioinformatics research and fostering innovation in the biomedical field.

Keywords: LLMs; artificial intelligence; bioinformatics; large language models.


Figures

Figure 1
Contemporary applications and advances of LLMs in bioinformatics. This figure categorizes computational tools and models in bioinformatics and biomedicine into five main domains:
1. DNA/RNA sequence analysis, functional and structure prediction: includes tools for sequence analysis (e.g. HyenaDNA, DNAGPT), functional prediction (e.g. BERT-enhancer, DNABERT), and structure-focused methods (e.g. RNABERT, GeoBoost2).
2. Protein sequence analysis, functional and structure prediction: covers protein sequence modeling (e.g. ESM, ProtGPT), post-translational modification prediction (e.g. EpiBERTope, TransPPMP), and structural analysis (e.g. ProteinBERT, MSA Transformer).
3. Multi-omics data analysis: features tools for genomics (e.g. scGPT, iDNA-ABT), epigenomics (e.g. scELMo, Mulan-Methyl), and integrative omics approaches (e.g. DeepGene Transformer, POOE).
4. Computational drug discovery and design: includes models for molecular design (e.g. MolGPT, ChemBERTa), drug–target interaction (e.g. DTI-BERT, TransDTI), and pharmaceutical applications (e.g. PharmBERT).
5. Biomedical literature mining: lists NLP models for biomedical text analysis (e.g. BioBERT, ClinicalBERT, Galactica).
This figure was created based on the tools provided by Biorender.com (accessed on 15 May 2025).
Figure 2
Key advantages of large language models (LLMs) in bioinformatics research. LLMs demonstrate several distinct advantages: the capability to process extended sequences and high-dimensional data, capture complex semantic and contextual information, perform cross-modal learning and knowledge transfer, reduce manual feature engineering through end-to-end learning, and leverage massive unlabeled data through self-supervised learning.

Processing of extended sequences and high-dimensional data is facilitated by advanced sequence tokenization techniques, integrated dimensionality reduction technologies, autoencoder architectures, and multi-head attention mechanisms. Through self-supervised learning and transformer-based architectures, particularly bidirectional encoder representations from transformers (BERT), LLMs capture intricate semantic relationships and contextual information. LLMs also integrate and process multimodal data, including textual, visual, and audio inputs, while achieving efficient cross-corpus transfer. The self-supervised learning paradigm leverages vast quantities of unlabeled data through multilayer neural network architectures to automatically process complex biological data. Finally, end-to-end learning significantly reduces the need for manual feature engineering, addressing traditional supervised learning's dependence on manually annotated data, particularly in applications such as protein sequence prediction and nucleotide sequence analysis.

This figure was created based on the tools provided by Biorender.com (accessed on 15 May 2025). LLMs, large language models; BERT, bidirectional encoder representations from transformers.
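The sequence tokenization and self-supervised learning described in this caption can be illustrated with a minimal sketch: overlapping k-mer tokenization of a DNA sequence (the scheme popularized by DNABERT-style models) followed by random masking for a BERT-style masked-language-model objective. The function names and parameters here are illustrative, not taken from any specific tool in the review.

```python
import random

def kmer_tokenize(seq, k=3):
    """Split a DNA sequence into overlapping k-mers, a common
    tokenization scheme for DNA language models."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Randomly mask a fraction of tokens, producing (inputs, labels)
    for a masked-language-model objective: labels hold the original
    token at masked positions and None elsewhere."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

# An 8-base sequence yields six overlapping 3-mers.
tokens = kmer_tokenize("ATGCGTAC", k=3)
masked, labels = mask_tokens(tokens, mask_rate=0.3)
```

During pretraining, a model would be trained to recover the original k-mer at each masked position from its surrounding context, which is what lets LLMs exploit massive unlabeled sequence collections without manual annotation.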
Figure 3
Future development directions of large language models (LLMs) in bioinformatics. Key areas include multimodal fusion learning, knowledge-guided architectural design, model optimization for lightweight deployment and efficient inference, development of explainable artificial intelligence (AI) systems, deep integration with experimental biology, enhancement of ethical and privacy protection mechanisms, and promotion of interdisciplinary collaboration and open science.

Multimodal fusion learning can be advanced through multidimensional deep analysis, systematic integration of multi-omics data, and enhancement of model generalization. Knowledge-guided architectural design calls for integrating biomedical ontologies and knowledge graphs into model frameworks, alongside knowledge distillation techniques for building adaptive learning systems with automatic knowledge base updating. Model optimization and efficient inference can be achieved through specialized attention mechanisms, model distillation, and federated learning strategies. For explainable AI, advanced visualization techniques and counterfactual explanation methods warrant investigation, coupled with interactive interpretation systems that enhance model transparency. Deep integration between LLMs and experimental biology will require intelligent experimental design systems incorporating experimental feedback mechanisms and evaluation frameworks that bridge computational predictions with experimental outcomes.

To strengthen ethical and privacy protection, technologies including federated learning, bias mitigation, and differential privacy should be explored, together with robust ethical review systems and standardized regulatory frameworks at the institutional level. For interdisciplinary collaboration and open science, priorities include cross-disciplinary research tools, open-access biological databases, standardized evaluation benchmarks, and comprehensive open-source platforms. This figure was created based on the tools provided by Biorender.com (accessed on 15 May 2025).
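Of the model-compression techniques named in the Figure 3 caption, knowledge distillation has a compact mathematical core: a small student model is trained to match the temperature-softened output distribution of a large teacher. The sketch below shows that loss term only (KL divergence between softened distributions); it is a minimal illustration with hypothetical function names, not the implementation of any model cited in the review.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature
    softens the distribution, exposing 'dark knowledge' in the
    teacher's near-miss classes."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence KL(p_teacher || p_student) between the
    temperature-softened distributions -- the core objective used
    to compress large models for lightweight deployment."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge; in practice it is combined with a standard supervised loss on hard labels.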
