Machine learning: its challenges and opportunities in plant system biology
- PMID: 35575915
- DOI: 10.1007/s00253-022-11963-6
Abstract
Sequencing technologies are evolving at a rapid pace, enabling the generation of massive amounts of data in multiple dimensions (e.g., genomics, epigenomics, transcriptomics, metabolomics, proteomics, and single-cell omics) in plants. To provide comprehensive insights into the complexity of plant biological systems, it is important to integrate different omics datasets. Although recent advances in computational analytical pipelines have enabled efficient and high-quality exploration and exploitation of single omics data, the integration of multidimensional, heterogeneous, and large datasets (i.e., multi-omics) remains a challenge. In this regard, machine learning (ML) offers promising approaches to integrate large datasets and to recognize fine-grained patterns and relationships. Nevertheless, ML approaches require rigorous optimization to process multi-omics-derived datasets. In this review, we discuss the main concepts of machine learning as well as the key challenges and solutions related to the big data derived from plant system biology. We also provide in-depth insight into the principles of data integration using ML, as well as challenges and opportunities in different contexts, including multi-omics, single-cell omics, protein function, and protein-protein interaction.
KEY POINTS:
• The key challenges and solutions related to the big data derived from plant system biology are highlighted.
• Different methods of data integration are discussed.
• Challenges and opportunities in applying machine learning to plant system biology are highlighted and discussed.
Keywords: Big data; Data integration; Epigenomics; Multi-omics; Plant molecular biology; Prediction; Protein function; Transcription factor.
© 2022. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.