J Phys Chem B. 2024 Jul 25;128(29):7043-7067.
doi: 10.1021/acs.jpcb.4c01558. Epub 2024 Jul 11.

The Open Force Field Initiative: Open Software and Open Science for Molecular Modeling

Lily Wang et al. J Phys Chem B.

Abstract

Force fields are a key component of physics-based molecular modeling, describing the energies and forces in a molecular system as a function of the positions of the atoms and molecules involved. Here, we provide a review and scientific status report on the work of the Open Force Field (OpenFF) Initiative, which focuses on the science, infrastructure and data required to build the next generation of biomolecular force fields. We introduce the OpenFF Initiative and the related OpenFF Consortium, describe its approach to force field development and software, and discuss accomplishments to date as well as future plans. OpenFF releases both software and data under open and permissive licensing agreements to enable rapid application, validation, extension, and modification of its force fields and software tools. We discuss lessons learned to date in this new approach to force field development. We also highlight ways that other force field researchers can get involved, as well as some recent successes of outside researchers taking advantage of OpenFF tools and data.

Conflict of interest statement

The authors declare the following competing financial interest(s): DLM serves on the scientific advisory boards of OpenEye Scientific Software and Anagenex, and is an Open Science Fellow with Psivant Therapeutics. MRS is an Open Science Fellow with Psivant Therapeutics and consults for Relay Therapeutics. MKG has an equity interest in and is a cofounder and scientific advisor of VeraChem LLC; and is an advisor to Denovicon Therapeutics and InCerebro Inc. YW has limited financial interest in Flagship Pioneering, Inc., and its subsidiaries.

Figures

Figure 1
Indirect chemical perception requires that a library of atom types encodes all potentially relevant chemical environments. Force field assignment via indirect chemical perception requires several stages of processing. First, in the force field development process (left) a human expert (“wizard”) considers a set of molecules which the force field should cover and decides which chemical environments will be important to treat separately, choosing a set of atom types to bin this chemistry and tabulating or encoding these atom type definitions. The expert then encodes a typing engine which can assign these atom types to arbitrary molecules, writing out a chemical graph with atoms (nodes) labeled by atom types. Once this engine is in place, the expert separately encodes a parametrization machinery which will read in labeled chemical graphs and assign force field parameters based on atom types, often from a lookup table called a parameter file. This engine will write out the result to a file containing a parametrized system suitable for simulations. The expert also develops the parameter file which will be used by the parametrization engine. Second, in the parameter assignment process (right), a specific molecule or system is input into the typing engine previously developed, which applies the atom type definitions and writes out a labeled chemical graph. This labeled graph is then processed by the parametrization engine to produce a parametrized system suitable for simulation. This process is indirect—the parametrization engine considers a labeled graph, not the molecule itself. Thus, in this final step, all of the relevant information about distinct chemical environments must be encoded by the atom types and other information in the graph (in AMBER-family force fields, just the atom types and connectivity). Figure adapted from ref (67). Available under a CC-BY 4.0 license. Copyright 2018, Mobley et al.
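The two-stage assignment described in this caption can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the atom type names, the typing rules, and the numeric parameter values are placeholders, not taken from any real force field. The point it shows is that the parametrization step sees only atom type labels and connectivity, never the underlying chemistry.

```python
# Toy sketch of indirect chemical perception (hypothetical names and values).
# A molecule is a list of (element, number_of_bonded_neighbors) atoms plus bonds.

def assign_atom_types(atoms):
    """Typing engine: bin each atom into a predefined atom type."""
    types = []
    for element, degree in atoms:
        if element == "C":
            types.append("c3" if degree == 4 else "c2")
        elif element == "H":
            types.append("hc")
        else:
            raise ValueError(f"no atom type defined for {element}")
    return types

# "Parameter file": lookup table keyed only by atom-type pairs (placeholder values).
BOND_PARAMS = {
    ("c3", "c3"): {"k": 300.0, "length": 1.53},
    ("c3", "hc"): {"k": 340.0, "length": 1.09},
}

def parametrize(atom_types, bonds):
    """Parametrization engine: reads the labeled graph, not the molecule itself."""
    assigned = []
    for i, j in bonds:
        key = tuple(sorted((atom_types[i], atom_types[j])))
        assigned.append(BOND_PARAMS[key])
    return assigned

# Ethane: two four-coordinate carbons and six hydrogens.
ethane_atoms = [("C", 4), ("C", 4)] + [("H", 1)] * 6
ethane_bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7)]
labels = assign_atom_types(ethane_atoms)
params = parametrize(labels, ethane_bonds)
```

Any chemical distinction that the typing rules fail to encode (here, anything beyond element and coordination number) is invisible to `parametrize`, which is the limitation the caption highlights.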
Figure 2
Direct chemical perception eliminates the need to encode all relevant chemical environment information in arbitrary predefined atom types. Force field assignment via direct chemical perception works on the full chemical graph of the molecules involved (including elements, connectivity, bond order, etc.), rather than first encoding information about the chemical environment into a complex set of predetermined atom types. First, in the force field development process (left) a human expert (“wizard”) and/or an automated method (a force field engine, FF engine) considers a set of molecules which the force field should cover (as well as potentially input data) and develops a force field to cover this chemistry, producing a set of parameter definitions and a parametrization engine that can apply these to molecules. Second, in the parameter assignment process (right), a specific molecule or system is input into the parametrization engine previously developed, which processes the molecule and uses the parameter definitions to apply force field parameters, producing a parametrized system suitable for simulation. The parameter assignment process is direct; the parametrization engine acts directly on the chemical graph of the molecules comprising the system, so all chemical environment information provided (or computable) is available to the engine. Unlike indirect chemical perception, there is no intermediate step of assigning atom type labels to a molecular graph; parameters are assigned directly based on the chemistry. Figure adapted from ref (67). Available under a CC-BY 4.0 license. Copyright 2018, Mobley et al.
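A toy sketch of the direct approach follows. It is a simplification under stated assumptions: plain Python predicates stand in for real substructure queries, the parameter values are placeholders, and the rule that the last matching definition wins is adopted here as an illustrative precedence convention. The function and variable names are hypothetical.

```python
# Toy sketch of direct chemical perception (hypothetical names and values).
# Patterns are plain predicates standing in for real substructure queries.

def bond_info(mol, i, j):
    """Collect chemistry for a bond straight from the full chemical graph."""
    return {
        "elements": tuple(sorted((mol["elements"][i], mol["elements"][j]))),
        "order": mol["bond_orders"][(i, j)],
    }

# Parameter definitions: (label, pattern, parameters); placeholder values.
# Ordered general to specific; the last matching definition is used.
BOND_DEFINITIONS = [
    ("generic C-C", lambda b: b["elements"] == ("C", "C"), {"length": 1.50}),
    (
        "double C=C",
        lambda b: b["elements"] == ("C", "C") and b["order"] == 2,
        {"length": 1.34},
    ),
]

def parametrize(mol):
    """Parametrization engine acting directly on the graph; no atom types."""
    assigned = {}
    for (i, j) in mol["bond_orders"]:
        info = bond_info(mol, i, j)
        for label, matches, params in BOND_DEFINITIONS:
            if matches(info):
                assigned[(i, j)] = (label, params)  # later matches override
    return assigned

# Heavy atoms of propene: C=C-C
propene = {"elements": ["C", "C", "C"], "bond_orders": {(0, 1): 2, (1, 2): 1}}
result = parametrize(propene)
```

Because the patterns inspect bond order directly, no new atom type is needed to distinguish the double bond from the single bond; adding a more specific definition is enough.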
Figure 3
Selected categories of physical property training data, before and after LJ optimization. These plots show parity between experiment and simulation for physical properties in the training set, before (Parsley 1.3.0) and after LJ training. “MSE” in the panel legends refers to the mean signed error (bias) of the data set. Panel a shows correction of systematic error in bromide density prediction, particularly through a data-based reduction in [#35:1] Rmin/2. Panel b shows correction in ΔHmix of alcohol/ester mixtures after training to mixture data. As the ester group is a hydrogen bond acceptor but not a donor, optimization of energy and density of pure esters would not recognize the need to create favorable interactions with hydrogen bond donors; only by including thermodynamic properties of liquid mixtures in fitting can we properly treat complex mixtures of molecules. Figure adapted from ref (39). Copyright 2023, American Chemical Society.
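As a reminder of what the mean signed error measures, here is a minimal sketch. The sign convention (simulation minus experiment) is an assumption, and the density values are invented for illustration only.

```python
def mean_signed_error(simulated, experimental):
    """Average of (simulated - experimental); a nonzero value indicates bias."""
    if len(simulated) != len(experimental) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(s - e for s, e in zip(simulated, experimental))
    return total / len(simulated)

# Invented densities in g/mL, for illustration only.
simulated = [1.52, 1.48, 1.61]
experimental = [1.49, 1.46, 1.58]
mse = mean_signed_error(simulated, experimental)  # positive: densities overestimated
```

Unlike an unsigned or squared error, positive and negative deviations cancel in this measure, so a value near zero indicates the absence of systematic bias rather than the absence of error.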
Figure 4
Quality of optimized geometries relative to QM reference data on our benchmark data set. Shown is a cumulative distribution function (CDF) assessing what fraction of QM optimized geometries are predicted correctly (within a given RMSD cutoff) by MM optimizations for molecules in OpenFF’s public industry benchmarking set, consisting of 9847 molecules with a total of more than 70K conformers. A higher CDF is better. The QM reference approach is B3LYP-D3BJ/DZVP. Different colors/styles compare different OpenFF versions beginning with version 1.0; GAFF 2.11 with AM1BCC charges is shown for comparison. The inset zooms in on the boxed portion of the CDF. Adapted from ref (112). Available under a CC-BY 4.0 license. Copyright Mobley, Wagner, Wang and the Open Force Field Initiative, 2023.
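The CDF in this kind of plot can be computed as the fraction of conformers whose MM-versus-QM RMSD falls at or below each cutoff. The sketch below assumes the per-conformer RMSD values are already available; the numbers are invented for illustration.

```python
def rmsd_cdf(rmsds, cutoffs):
    """Fraction of conformers with RMSD at or below each cutoff."""
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    n = len(rmsds)
    return [sum(1 for r in rmsds if r <= c) / n for c in cutoffs]

# Invented per-conformer RMSDs (in angstroms) between MM and QM geometries.
rmsds = [0.1, 0.3, 0.2, 0.8, 0.5]
cutoffs = [0.25, 0.5, 1.0]
cdf = rmsd_cdf(rmsds, cutoffs)
```

A force field whose curve lies above another's at every cutoff reproduces a larger share of the reference geometries at any chosen tolerance.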
Figure 5
General approach of BESMARTS parameter search. For each parameter, the chemical environments that matched are combined into a single pattern. The combined pattern identifies SMARTS primitives that have multiple values that are then used to derive new patterns. Each new pattern is based on the original parameter ([#6X3:1] [#6X3:2]) with one or more primitives (represented as bits) added. In this example, the bonds that matched, when combined, show that bonds in 5-membered rings and 6-membered rings matched the original parameter. This offers the r5 and r6 primitives as a means to split, and the new candidate parameters [#6X3r5:1] [#6X3:2] and [#6X3r6:1] [#6X3:2] are generated and subsequently evaluated for performance. The splits can take multiple bits simultaneously, and can additionally search the local environment for additional primitives to find more specialized splits. Image adapted from Gokey and Mobley. Available under a CC-BY 4.0 license. Copyright 2023, Gokey and Mobley.
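The splitting idea can be sketched with sets of primitives. This is a toy illustration, not the BESMARTS implementation: the function names are hypothetical, each matched environment is reduced to the set of primitives seen on the first atom, only single-primitive splits are generated, and the pattern strings are written in the same form as in the caption.

```python
def varying_primitives(matched_environments):
    """Primitives present in some, but not all, matched environments."""
    union = set().union(*matched_environments)
    common = set.intersection(*matched_environments)
    return sorted(union - common)

def candidate_splits(base_atom1, base_atom2, matched_environments):
    """Build one candidate pattern per varying primitive on atom 1."""
    return [
        f"[{base_atom1}{primitive}:1] [{base_atom2}:2]"
        for primitive in varying_primitives(matched_environments)
    ]

# Primitives observed on atom 1 of each bond matched by the original parameter.
matched = [
    {"#6", "X3", "r5"},
    {"#6", "X3", "r6"},
    {"#6", "X3", "r6"},
]
candidates = candidate_splits("#6X3", "#6X3", matched)
```

Primitives shared by every match (`#6`, `X3`) cannot separate the matches, so only `r5` and `r6` yield candidates; each candidate would then need to be scored against fitting data before being accepted.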
Figure 6
Software workflow for iterative improvement of force fields. An initial force field is implemented by the openff-toolkit, and the molecular systems needed for fitting the targeted observables are built from this force field. The force field parameters are optimized using regularized least-squares with ForceBalance, with QM data coming from stored calculations in QC Archive, and experimental condensed-phase data coming from several different data sets. Condensed-phase simulations are carried out using OpenFF Evaluator and included in the optimization, though usually we optimize terms on condensed-phase properties after valence parameters are optimized. This produces a force field that can then be validated. Adapted from ref (137). Available under the CC-BY 4.0 license. Copyright 2023, Boothroyd, Mobley, Wagner and the Open Force Field Initiative.
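For readers unfamiliar with regularized least squares, the sketch below shows its generic form: a weighted sum of squared residuals plus a penalty that restrains parameters toward their starting values. It is not ForceBalance's actual objective function; the one-parameter model, the weights, the prior widths, and all names are assumptions chosen for illustration.

```python
def regularized_objective(params, prior_params, prior_widths, residual_fn, weights):
    """Weighted squared residuals plus a harmonic penalty on parameter drift."""
    residuals = residual_fn(params)
    fit_term = sum(w * r * r for w, r in zip(weights, residuals))
    penalty = sum(
        ((p - p0) / width) ** 2
        for p, p0, width in zip(params, prior_params, prior_widths)
    )
    return fit_term + penalty

def toy_residuals(params):
    """Hypothetical model: predicted property = 2 * k, reference value 4.2."""
    (k,) = params
    return [2.0 * k - 4.2]

prior = [2.0]   # starting parameter value
widths = [0.5]  # smaller width = stronger restraint toward the prior
at_prior = regularized_objective([2.0], prior, widths, toy_residuals, [1.0])
at_best_fit = regularized_objective([2.1], prior, widths, toy_residuals, [1.0])
```

In this toy case the unrestrained best fit (k = 2.1) and the starting value (k = 2.0) score about the same, because the penalty for moving away from the prior offsets the gain in fit; the optimum of the regularized objective lies between them.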
Figure 7
How parameter differences affect binding free energy accuracy. (a) Shown are differences in accuracy (RMS error) between OpenFF 1.0 and OpenFF 2.0, for converged relative binding free energy calculations. Only statistically significant (95% CI) changes are shown, for parameters which are used in multiple ligands across multiple targets. Stars in front of parameter identifiers indicate significant parameter changes, with more stars indicating larger changes. Upward bars indicate accuracy (relative to experiment) was decreased by the force field change, and downward bars indicate accuracy was improved. (b) and (c) show specific example relative binding free energy calculations where results changed substantially across force fields. Figure adapted from ref (153), where it is described in more detail. Available under the CC-BY 4.0 license. Copyright 2023, Hahn et al.

References

    1. Ponder J. W.; Case D. A. Advances in Protein Chemistry; Protein Simulations; Academic Press, 2003; Vol. 66; pp 27–85.
    1. Case D. A.; Cheatham T. E.; Darden T.; Gohlke H.; Luo R.; Merz K. M.; Onufriev A.; Simmerling C.; Wang B.; Woods R. J. The Amber Biomolecular Simulation Programs. J. Comput. Chem. 2005, 26, 1668–1688. 10.1002/jcc.20290.
    1. Phillips J. C.; Braun R.; Wang W.; Gumbart J.; Tajkhorshid E.; Villa E.; Chipot C.; Skeel R. D.; Kalé L.; Schulten K. Scalable Molecular Dynamics with NAMD. J. Comput. Chem. 2005, 26, 1781–1802. 10.1002/jcc.20289.
    1. van der Spoel D.; Lindahl E.; Hess B.; Groenhof G.; Mark A. E.; Berendsen H. J. C. GROMACS: Fast, Flexible, and Free. J. Comput. Chem. 2005, 26, 1701–1718. 10.1002/jcc.20291.
    1. Chipot C., Pohorille A., Eds. Free Energy Calculations: Theory and Applications in Chemistry and Biology; Springer Series in Chemical Physics; Springer-Verlag: Berlin Heidelberg, 2007.