Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2021 May 6:50:267-301.
doi: 10.1146/annurev-biophys-091720-102019. Epub 2021 Feb 19.

Biomolecular Modeling and Simulation: A Prospering Multidisciplinary Field

Affiliations

Biomolecular Modeling and Simulation: A Prospering Multidisciplinary Field

Tamar Schlick et al. Annu Rev Biophys. .

Abstract

We reassess progress in the field of biomolecular modeling and simulation, following up on our perspective published in 2011. By reviewing metrics for the field's productivity and providing examples of success, we underscore the productive phase of the field, whose short-term expectations were overestimated and long-term effects underestimated. Such successes include prediction of structures and mechanisms; generation of new insights into biomolecular activity; and thriving collaborations between modeling and experimentation, including experiments driven by modeling. We also discuss the impact of field exercises and web games on the field's progress. Overall, we note tremendous success by the biomolecular modeling community in utilization of computer power; improvement in force fields; and development and application of new algorithms, notably machine learning and artificial intelligence. The combined advances are enhancing the accuracy andscope of modeling and simulation, establishing an exemplary discipline where experiment and theory or simulations are full partners.

Keywords: DNA folding; RNA folding; artificial intelligence; biomolecular dynamics; biomolecular modeling; biomolecular simulation; citizen science projects; machine learning; multiscale modeling; protein folding; structure prediction.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Expectation curve for the field of biomolecular modeling and simulation. The field started with comprehensive molecular mechanics efforts, and it took off with the increasing availability of fast workstations and, later, supercomputers. Following unrealistically high short-term expectations and disappointments concerning the limited medical impact of modeling and genomic research on human disease treatment, better collaborations between theory and experiment have ushered the field to its current productive stage. Problems realized in the decade between 2000 and 2010 and later addressed include force field imperfections, conformational sampling limitations, some pharmacogenomics hurdles, and limited medical impact of genomics-based therapeutics for human diseases. Technological innovations that have helped drive the field include distributed computations and the advent of the use of graphic processing units for biomolecular computations. The molecular-dynamics-specialized supercomputer Anton made it possible in 2009 to reach the millisecond timescale for explicit-solvent all-atom simulations. The 2013 Nobel Prize in Chemistry awarded to Levitt, Karplus, and Warshel helped validate a field that lagged behind experiment and propel its trajectory. Abbreviation: QM/MM, quantum mechanics/molecular mechanics.
Figure 2
Figure 2
Metrics of the field’s rise in popularity and the evolution of computational performance. (a) Biomolecular modeling papers per year in peer-reviewed journals, as found in Scopus using the query search: “molecular dynamics” OR “biomolecular simulation” OR “molecular modeling” OR “molecular simulation” OR “biomolecular modeling”. (b) Biomolecular modeling papers from panel a in the 20 journals with the most numbers of modeling papers, rank-ordered according to the average number of modeling papers across the years sampled. (c) Biomolecular modeling papers from panel a appearing in high-impact-factor journals, rank-ordered by the SCImago Journal Rank (SJR) h-index. (d) Biomolecular modeling papers from panel a decomposed by method using the query search: ((“molecular modeling” OR “molecular simulation” OR “biomolecular modeling” OR ‘biomolecular simulation”) AND (“method name”)), where method name is “molecular dynamics”, “Monte Carlo”, “ab initio,” “coarse graining”, or “quantum mechanics/molecular mechanics”. (e) Biomolecular modeling papers from panel a decomposed by use of seven popular molecular dynamics packages and force fields using the query search: ((“molecular modeling” OR “molecular simulation” OR “biomolecular modeling” OR ‘biomolecular simulation”) AND (“package/force field”)), where package/force field is Amber (140), CHARMM (24), GROMACS (16), NAMD (145), OPLS (81), GROMOS (57), UFF (157), COMPASS (191), MMFF (60), and Desmond (22). (f) Ranked overall and academic computational systems as reported according to the LINPACK benchmark, as assembled in the Top500 supercomputer lists (www.top500.org). The estimated total speed for the Folding@home distributed computing project is shown in x86 TFLOPS for direct comparison with LINPACK speeds. Biomolecular modeling milestones are dated assuming the computations were performed about a year prior to publication, except for the two 1998 publications, which we associate with computations started in 1996. These include the 25-base-pair DNA system using NCSA SGI machines (204); villin using the Cray T3E900 (43); the bc1 membrane complex using the Cray T3E900 (70); the B-DNA dodecamer using MareNostrum/Barcelona (141); the fip35 protein run on NCSA Abe clusters (49); influenza A H1N1 using the Jade supercomputer (158); the HIV capsid using the Titan Cray XK7 (143); the GATA4 gene using the Trinity Phase 2 (83); and three simulations on Blue Waters, the nuclear core complex (50), the CypA/CA complex (111), and influenza A H1N1 (44). For the simulations in Blue Waters, which has opted out of the Top500 benchmark since 2012, we use estimates of sustained system performance/sustained petascale performance from 2012 and 2020 (14).
Figure 2
Figure 2
Metrics of the field’s rise in popularity and the evolution of computational performance. (a) Biomolecular modeling papers per year in peer-reviewed journals, as found in Scopus using the query search: “molecular dynamics” OR “biomolecular simulation” OR “molecular modeling” OR “molecular simulation” OR “biomolecular modeling”. (b) Biomolecular modeling papers from panel a in the 20 journals with the most numbers of modeling papers, rank-ordered according to the average number of modeling papers across the years sampled. (c) Biomolecular modeling papers from panel a appearing in high-impact-factor journals, rank-ordered by the SCImago Journal Rank (SJR) h-index. (d) Biomolecular modeling papers from panel a decomposed by method using the query search: ((“molecular modeling” OR “molecular simulation” OR “biomolecular modeling” OR ‘biomolecular simulation”) AND (“method name”)), where method name is “molecular dynamics”, “Monte Carlo”, “ab initio,” “coarse graining”, or “quantum mechanics/molecular mechanics”. (e) Biomolecular modeling papers from panel a decomposed by use of seven popular molecular dynamics packages and force fields using the query search: ((“molecular modeling” OR “molecular simulation” OR “biomolecular modeling” OR ‘biomolecular simulation”) AND (“package/force field”)), where package/force field is Amber (140), CHARMM (24), GROMACS (16), NAMD (145), OPLS (81), GROMOS (57), UFF (157), COMPASS (191), MMFF (60), and Desmond (22). (f) Ranked overall and academic computational systems as reported according to the LINPACK benchmark, as assembled in the Top500 supercomputer lists (www.top500.org). The estimated total speed for the Folding@home distributed computing project is shown in x86 TFLOPS for direct comparison with LINPACK speeds. Biomolecular modeling milestones are dated assuming the computations were performed about a year prior to publication, except for the two 1998 publications, which we associate with computations started in 1996. These include the 25-base-pair DNA system using NCSA SGI machines (204); villin using the Cray T3E900 (43); the bc1 membrane complex using the Cray T3E900 (70); the B-DNA dodecamer using MareNostrum/Barcelona (141); the fip35 protein run on NCSA Abe clusters (49); influenza A H1N1 using the Jade supercomputer (158); the HIV capsid using the Titan Cray XK7 (143); the GATA4 gene using the Trinity Phase 2 (83); and three simulations on Blue Waters, the nuclear core complex (50), the CypA/CA complex (111), and influenza A H1N1 (44). For the simulations in Blue Waters, which has opted out of the Top500 benchmark since 2012, we use estimates of sustained system performance/sustained petascale performance from 2012 and 2020 (14).
Figure 3
Figure 3
Milestone simulations in biomolecular modeling showing evolution in molecular dynamics timescale and system size. Consistent with Figure 2f, we assume that computations were performed one year before publication, except for the publications in 1998, for which the calculations were performed in 1996. The simulated systems in temporal order are: 25-base-pair DNA (5 ns and ~21k atoms) (204), villin protein (1 μs and 12k atoms) (43), bc1 membrane complex (1 ns and ~91k atoms) (70), B-DNA dodecamer (1.2 μs and ~16k atoms) (141), Fip35 protein (10 μs and ~30k atoms) (49), Fip35 and BPTI proteins (100 μs for Flip35 and 1 ms for BPTI, and ~ 13k atoms) (182), nuclear pore complex (1 μs and 15.5M atoms) (50), influenza A virus (1 μs and >1M atoms) (158), NMDA receptor in membrane (60 μs and ~507k atoms) (187), tubular CypA/CA complexes (100 ns and 25.6M atoms) (111), HIV-1 fully solvated empty capsid (1 μs and 64M atoms) (143), GATA4 gene (1 ns and 1B atoms) (83), and influenza A virus H1N1 (121 ns and ~160M atoms) (44).
Figure 4
Figure 4
Examples of modeling successes. (a) Protein folding. The folding of a small protein (top right), villin, inside the GroEL protein cavity (top left) is compared to its folding in bulk. The corresponding energy profiles of folding show significant differences in the shape of the unfolded basin, indicating the role of the chaperonin protein in stabilizing unfolded states. Panel adapted with permission from Reference . (b) RNA novel motif design. Shown is a flowchart of the computational pipeline for design of RNA-like tree graph topologies (73, 75). In these tree graph representations of RNA secondary structures, edges denote stems, and vertices denote loops, bulges, and junctions (169). We use graph partitioning to segment the target RNA-like graph into subgraphs, extract the corresponding atomic fragments from our RAG-3D database, construct a new sequence or structure using fragment assembly, and screen the top-scoring sequences using RNA 2D structure prediction programs to produce successful sequences that will fold onto the target RNA-like topology (–75, 117). (c) AlphaFold performance on prediction of inter-residue distances. Inter-residue distance distributions are obtained from the experimental structure (top) and predicted structure (bottom) of the miniprotein gHEEE02 (right), showing good agreement. Distance maps were obtained from https://deepmind.com. (d) Cloud computing to accelerate molecular dynamics. Extensive nonequilibrium simulations of nicotine unbinding from the nicotinic acetylcholine receptor are shown. The Cα fluctuation levels (colored on a scale from blue to red as fluctuations increase) show the sequence of structural changes coupled to the unbinding of the nicotine from the receptor that leads to its activation through a conformational change. Panel adapted with permission from Reference .
Figure 5
Figure 5
Examples of modeling failures. (a) RNA force fields. Riboswitch aptamer simulations with different force fields indicate that structural details and stabilities are force field dependent. With the CHARMM and ff99 force fields, the aptamer pseudoknotted fold is distorted compared to the X-ray structure, whereas with the ff99bsc0χOL3 force field, the aptamer fold is maintained. Panel adapted with permission from Reference . (b) Configurational sampling. A large water box is required to stabilize the unliganded tetramer of the hemoglobin. The unliganded structure obtained with molecular dynamics simulations using a water box of 150Å aligns well with the corresponding experimental structure (right), whereas the unliganded structure obtained with a water box of 120Å is instead similar to the experimental structure of the liganded conformation (left). Panel adapted with permission from Reference . (c) Advanced sampling techniques. Different free energy profiles are obtained with umbrella sampling for the forward (Fwd) and backward (Bwd) processes of a conformational change for a riboswitch in the presence (Lig) and absence (Unlig) of a ligand, a consequence of poor convergence of the simulations. Panel adapted with permission from Reference . (d) QM/MM simulations. Simulations of solutes treated at the quantum-mechanical level embedded in a rigid water model treated by classical molecular mechanics can lead to poor sampling due to insufficient coupling between the two regions. Shown is the incorrect distribution of the methanol O-H bond length obtained from QM/MM simulations (violet curve) compared to the ideal distribution (green curve). Panel adapted with permission from Reference . Abbreviations: MD, molecular dynamics; QM/MM, quantum mechanics/molecular mechanics.
Figure 6
Figure 6
Interplay modeling experiment. (a) Molecular dynamics (MD) simulations of the bacterial protein BamA predicted a membrane insertion mechanism by lateral opening of the β barrel that involves the strands β1 and β16 (top). Crosslinking experiments created artificial disulfide bonds between loops L1 and L6 that connect both β strands and confirmed that BamA function was inhibited (bottom). Panel adapted with permission from Reference . (b) MD simulations of the AcrB multidrug transporter with the drug nitrocefin predicted a drug-binding pocket that includes residues such as I278 and F178 (top). Mutagenesis and biophysics experiments, which measured the efflux rate at different nitrocefin concentrations, confirmed the role of these residues in the binding of the drugs (bottom). Panel adapted with permission from References and . (c) MD simulations of DNA minicircles predicted the formation of bubbles and kinks under torsional stress (top). Electron cryo-tomography experiments confirmed the formation of such geometries in DNA minicircles (bottom). Panel adapted with permission from Reference . (d) Chromatin crosslinking experiments show increased long-range internucleosome contacts without loss of zigzag short-range contacts for metaphase chromatin, compared to interphase chromatin (top; interaction patterns in green). Mesoscale model of fibers typical of interphase [0.5 LH/nucleosome and nucleosome repeat length (NRL) = 191 bp] and metaphase (no LH and NRL = 209 bp) chromatin provide the folding mechanism of hierarchical looping (or stacked loops in 3D) to explain such increases in long-range contacts with maintenance of short-range contacts. This can be observed in the computed contact maps and fiber structures (bottom), and in the interaction patterns for fibers with NRL = 191 or 209 bp (top, black solid and dashed lines) without LH (left) and with 0.5 LH/nulceosome (right) (56).

Similar articles

Cited by

References

    1. Abriata LA, Tamo GE, Monastyrskyy B, Kryshtafovych A, Dal Peraro M. 2018. Assessment of hard target modeling in CASP12 reveals an emerging role of alignment-based contact prediction methods. Proteins 86(Suppl. 1):97–112 - PubMed
    1. Adamczak B, Kogut M, Czub J. 2018. Effect of osmolytes on the thermal stability of proteins: replica exchange simulations of Trp-cage in urea and betaine solutions. Phys. Chem. Chem. Phys 20(16):11174–82 - PubMed
    1. Alder BJ, Wainwright TE. 1959. Studies in molecular dynamics. I. General method. J. Chem. Phys 31(2):459–66
    1. Allinger NL. 1976. Calculation of molecular structure and energy by force-field methods. In Advances in Physical Organic Chemistry, Vol. 13, ed. Gold V, Bethell D, pp. 1–82. Cambridge, MA: Academic
    1. Allinger NL, Miller MA, Van Catledge FA, Hirsch JA. 1967. Conformational analysis. LVII. The calculation of the conformational structures of hydrocarbons by the Westheimer-Hendrickson-Wiberg method. J. Am. Chem. Soc 89(17):4345–57

Publication types