Machine learning and statistical inference in microbial population genomics
- PMID: 41015769
- PMCID: PMC12476627
- DOI: 10.1186/s13059-025-03775-4
Abstract
The availability of large genome datasets has changed the microbiology research landscape. Analyzing such data is computationally demanding, and new approaches have emerged from different data analysis philosophies. Machine learning and statistical inference have overlapping aims and approaches to knowledge discovery. However, machine learning focuses on optimizing prediction, whereas statistical inference focuses on understanding the processes that relate variables. In this review, we outline the different aspirations, precepts, and resulting methodologies of the two fields, with examples from microbial genomics. Emphasizing their complementarity, we argue that combining and synthesizing machine learning and statistics has potential for pathogen research in the big data era.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.