Review
J Chem Inf Model. 2023 Aug 14;63(15):4505-4532.
doi: 10.1021/acs.jcim.3c00643. Epub 2023 Jul 19.

Open-Source Machine Learning in Computational Chemistry

Alexander Hagg et al. J Chem Inf Model.

Abstract

The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we survey 179 open-source software projects, each with a corresponding peer-reviewed paper published within the last 5 years, to better understand which topics in the field are being investigated with machine learning approaches. For each project, we provide a short description, a link to the code, the license type, and whether the training data and resulting models are publicly available. Based on the projects deposited in GitHub repositories, we identify the most frequently used Python libraries. We hope that this survey will serve as a resource for learning about machine learning, or about specific architectures, by pointing to accessible code with accompanying papers on a topic basis. To this end, we also include open-source computational chemistry software for generating training data and fundamental Python libraries for machine learning. Based on our observations, and considering the three pillars of collaborative machine learning work (open data, open source code, and open models), we provide some suggestions to the community.


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
An illustration of how open data, open models, and open-source code are used in the different subdomains of computational chemistry described herein.
Figure 2
Histogram of the number of forks for the projects hosted on GitHub, excluding six projects with ≥175 forks.
Figure 3
Histogram of the most frequently called libraries (called ≥24 times) in the 167 GitHub Python projects reported herein. Scientific libraries are shown in blue, Python 3 standard libraries (https://docs.python.org/3/library) in red, and other libraries in green.
