Editorial
Bioinform Adv. 2025 Mar 20;5(1):vbaf044.
doi: 10.1093/bioadv/vbaf044. eCollection 2025.

Biological databases in the age of generative artificial intelligence

Mihai Pop et al. Bioinform Adv.

Abstract

Summary: Modern biological research critically depends on public databases. The introduction and propagation of errors within and across databases can lead to wasted resources as scientists are led astray by bad data or have to conduct expensive validation experiments. The emergence of generative artificial intelligence systems threatens to compound this problem owing to the ease with which massive volumes of synthetic data can be generated. We provide an overview of several key issues that occur within the biological data ecosystem and make several recommendations aimed at reducing data errors and their propagation. We specifically highlight the critical importance of improved educational programs aimed at biologists and life scientists that emphasize best practices in data engineering. We also argue for increased theoretical and empirical research on data provenance, error propagation, and on understanding the impact of errors on analytic pipelines. Furthermore, we recommend enhanced funding for the stewardship and maintenance of public biological databases.

Availability and implementation: Not applicable.

Conflict of interest statement

None declared.
