This is a preprint.
BioBricks.ai: A Versioned Data Registry for Life Sciences Data Assets
- PMID: 39253636
- PMCID: PMC11383443
Update in:
- BioBricks.ai: a versioned data registry for life sciences data assets. Front Artif Intell. 2025 Aug 13;8:1599412. doi: 10.3389/frai.2025.1599412. PMID: 40880880.
Abstract
Researchers in biomedicine, public health, and the life sciences often spend weeks or months discovering, accessing, curating, and integrating data from disparate sources, which significantly delays the start of actual analysis and innovation. Instead of countless developers building redundant and inconsistent data pipelines, BioBricks.ai offers a centralized data repository and a suite of developer-friendly tools that simplify access to scientific data. BioBricks.ai currently delivers over ninety biological and chemical datasets. It provides a package manager-like system for installing and managing dependencies on data sources. Each 'brick' is a Data Version Control (DVC) git repository containing an updateable pipeline for extracting, transforming, and loading data into the BioBricks.ai backend at https://biobricks.ai. Use cases include accelerating data science workflows and creating novel data assets by integrating multiple datasets into unified, harmonized resources. In conclusion, BioBricks.ai offers an opportunity to accelerate access to and use of public data through a single open platform.
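The package-manager analogy can be illustrated with a minimal sketch of dependency resolution, assuming each brick declares the bricks it is built from and is pinned to a version (for example a git tag or commit). The `Brick` and `BrickRegistry` classes, the brick names, and the version strings below are hypothetical and are not the BioBricks.ai client API; the sketch only shows how a harmonized asset's upstream sources could be installed in dependency order and pinned.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Brick:
    """Hypothetical record for one versioned data asset (not the real API)."""
    name: str
    version: str                    # assumed: a git tag or commit identifier
    depends_on: Tuple[str, ...] = ()


class BrickRegistry:
    """Hypothetical in-memory registry illustrating dependency resolution."""

    def __init__(self) -> None:
        self._bricks: Dict[str, Brick] = {}

    def publish(self, brick: Brick) -> None:
        # Later publications of the same name replace the earlier record.
        self._bricks[brick.name] = brick

    def install_order(self, name: str) -> List[Brick]:
        """Depth-first topological sort: dependencies precede dependents."""
        order: List[Brick] = []
        visiting: Set[str] = set()
        done: Set[str] = set()

        def visit(n: str) -> None:
            if n in done:
                return
            if n in visiting:
                raise ValueError(f"dependency cycle at {n!r}")
            if n not in self._bricks:
                raise KeyError(f"unknown brick {n!r}")
            visiting.add(n)
            for dep in self._bricks[n].depends_on:
                visit(dep)
            visiting.remove(n)
            done.add(n)
            order.append(self._bricks[n])

        visit(name)
        return order

    def lock(self, name: str) -> Dict[str, str]:
        """Pin every brick needed by `name` to its currently published version."""
        return {b.name: b.version for b in self.install_order(name)}


if __name__ == "__main__":
    reg = BrickRegistry()
    reg.publish(Brick("source-a", "v1"))
    reg.publish(Brick("source-b", "v2"))
    # A harmonized asset built by integrating two source bricks.
    reg.publish(Brick("harmonized", "v1", ("source-a", "source-b")))
    print([b.name for b in reg.install_order("harmonized")])
    # ['source-a', 'source-b', 'harmonized']
    print(reg.lock("harmonized"))
    # {'source-a': 'v1', 'source-b': 'v2', 'harmonized': 'v1'}
```

A depth-first topological sort is used here because it orders dependencies before dependents and detects cycles in a single pass. A real registry would presumably also need to fetch the DVC-tracked files for each pinned version, which this sketch leaves out.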
Keywords: BioBricks.ai; Bioinformatics; Cheminformatics; Data Integration; Machine Learning; Public Health Data.
Conflict of interest statement
The authors declare the following potential conflicts of interest regarding the research and publication of this paper: BioBricks is a product developed by Insilica LLC, and many of the authors are employees of Insilica LLC. As such, there may be a perceived or real financial interest in the outcomes of the research and the development of BioBricks. The authors affirm that their contributions to the research and the manuscript were conducted with scientific integrity and without bias influenced by their association with Insilica LLC.