Data-driven learning of structure augments quantitative prediction of biological responses
- PMID: 38829926
- PMCID: PMC11233023
- DOI: 10.1371/journal.pcbi.1012185
Abstract
Multi-factor screenings are commonly used in diverse applications in medicine and bioengineering, including optimizing combination drug treatments and microbiome engineering. Despite advances in high-throughput technologies, large-scale experiments typically remain prohibitively expensive. Here we introduce a machine learning platform, structure-augmented regression (SAR), that exploits the intrinsic structure of each biological system to learn a high-accuracy model with minimal data requirements. Under different environmental perturbations, each biological system exhibits a unique, structured phenotypic response. This structure can be learned from limited data and, once learned, can constrain subsequent quantitative predictions. We demonstrate that SAR requires significantly less data than existing machine learning methods to achieve high prediction accuracy, first on simulated data, then on experimental data from various systems and input dimensions. We then show how a learned structure can guide the effective design of new experiments. Our approach has implications for predictive control of biological systems and for integrating machine learning prediction with experimental design.
Copyright: © 2024 Ha et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
The authors have declared that no competing interests exist.