Biological databases in the age of generative artificial intelligence
- PMID: 40177265
- PMCID: PMC11964588
- DOI: 10.1093/bioadv/vbaf044
Abstract
Summary: Modern biological research critically depends on public databases. The introduction and propagation of errors within and across databases can lead to wasted resources as scientists are led astray by bad data or have to conduct expensive validation experiments. The emergence of generative artificial intelligence systems threatens to compound this problem owing to the ease with which massive volumes of synthetic data can be generated. We provide an overview of several key issues that occur within the biological data ecosystem and make recommendations aimed at reducing data errors and their propagation. We specifically highlight the critical importance of improved educational programs aimed at biologists and life scientists that emphasize best practices in data engineering. We also argue for increased theoretical and empirical research on data provenance, error propagation, and the impact of errors on analytic pipelines. Furthermore, we recommend enhanced funding for the stewardship and maintenance of public biological databases.
Availability and implementation: Not applicable.
© The Author(s) 2025. Published by Oxford University Press.
Conflict of interest statement
None declared.