Machine learning: its challenges and opportunities in plant system biology
- PMID: 35575915
- DOI: 10.1007/s00253-022-11963-6
Abstract
Sequencing technologies are evolving at a rapid pace, enabling the generation of massive amounts of data in multiple dimensions (e.g., genomics, epigenomics, transcriptomics, metabolomics, proteomics, and single-cell omics) in plants. To provide comprehensive insights into the complexity of plant biological systems, it is important to integrate different omics datasets. Although recent advances in computational analytical pipelines have enabled efficient and high-quality exploration and exploitation of single-omics data, the integration of multidimensional, heterogeneous, and large datasets (i.e., multi-omics) remains a challenge. In this regard, machine learning (ML) offers promising approaches to integrate large datasets and to recognize fine-grained patterns and relationships. Nevertheless, ML approaches require rigorous optimization before they can process multi-omics-derived datasets. In this review, we discuss the main concepts of machine learning as well as the key challenges and solutions related to the big data derived from plant system biology. We also provide in-depth insight into the principles of data integration using ML, as well as challenges and opportunities in different contexts including multi-omics, single-cell omics, protein function, and protein-protein interaction.
KEY POINTS:
• The key challenges and solutions related to the big data derived from plant system biology are highlighted.
• Different methods of data integration are discussed.
• Challenges and opportunities in applying machine learning to plant system biology are highlighted and discussed.
Keywords: Big data; Data integration; Epigenomics; Multi-omics; Plant molecular biology; Prediction; Protein function; Transcription factor.
© 2022. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
