[Preprint]. 2025 Jun 18:2025.06.15.659780.
doi: 10.1101/2025.06.15.659780.

Scaling k-Means for Multi-Million Frames: A Stratified NANI Approach for Large-Scale MD Simulations

Jherome Brylle Woody Santos et al. bioRxiv. 2025.

Abstract

We present improved k-means clustering initialization strategies for molecular dynamics (MD) simulations, implemented as part of the N-ary Natural Initiation (NANI) method. Two new deterministic seeding strategies, strat_all and strat_reduced, extend the original NANI approaches and dramatically reduce clustering runtime while preserving the quality of the clustering results. These methods retain NANI's reproducible partitioning into well-separated, compact clusters while avoiding the costly iterative seed selection of previous implementations. Tests on the β-heptapeptide and HP35 systems show that the new flavors achieve Calinski-Harabasz (CH) and Davies-Bouldin (DB) scores comparable to those of the previous NANI variant, indicating that the efficiency gains come without a loss in clustering quality. We also show how this new variant can be used to greatly speed up our previously proposed Hierarchical Extended Linkage Method (HELM). These enhancements extend the reach of NANI to large-scale MD analysis, both as stand-alone k-means clustering and as a component of hybrid workflows, and remove a key barrier to routine, scalable, and reproducible exploration of complex conformational ensembles. The improved NANI implementation is available through our MDANCE package: https://github.com/mqcomplab/MDANCE.
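
The idea of deterministic seeding can be illustrated with a minimal sketch. It is not the strat_all or strat_reduced algorithm, whose details the abstract does not give. As a stand-in, the hypothetical helper `stratified_seeds` ranks frames by Euclidean distance to the global mean, splits the ranking into k equal strata, and takes the middle frame of each stratum as a seed. The synthetic data, the function names, and the use of scikit-learn's `KMeans` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def stratified_seeds(X, k):
    """Pick k seeds deterministically (illustrative stand-in, not NANI).

    Frames are ranked by distance to the global mean, the ranking is
    split into k equal strata, and the middle frame of each is returned.
    """
    dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
    order = np.argsort(dist, kind="stable")
    # Midpoint position of each of the k strata along the ranking.
    picks = ((np.arange(k) + 0.5) * len(X) / k).astype(int)
    return X[order[picks]]


def seeded_kmeans(X, k):
    # The seeds are fixed, so a single initialization (n_init=1) suffices.
    model = KMeans(n_clusters=k, init=stratified_seeds(X, k), n_init=1,
                   random_state=0)
    return model.fit(X)


# Synthetic stand-in for featurized MD frames (one row per frame).
X, _ = make_blobs(n_samples=3000, centers=6, n_features=10, random_state=0)
run_a = seeded_kmeans(X, 6)
run_b = seeded_kmeans(X, 6)
print("identical labels:", np.array_equal(run_a.labels_, run_b.labels_))
```

Because the seeds depend only on the data, repeated runs start from the same centers and no random restarts are needed, which is the property the abstract attributes to NANI's seeding.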

Conflict of interest statement

Conflict of Interest: The authors declare no competing financial interests.

Figures

Figure 1: Change in Calinski-Harabasz (A, B) and Davies-Bouldin (C, D) indices for β-heptapeptide (A, C) and HP35 (B, D) after k-means NANI screening from k = 5 to k = 30.
Figure 2: Overlaps of best representative structures from the six clusters of β-heptapeptide after performing NANI with strat_reduced.
Figure 3: Overlaps of best representative structures from the seven clusters of HP35 after performing NANI with strat_reduced. Helices 1, 2, and 3 are colored green, cyan, and yellow, respectively.
Figure 4: Change in Calinski-Harabasz (A, B) and Davies-Bouldin (C, D) indices for the HP35 simulation after trimming the initial NANI clusters (retaining clusters with MSD < 10) using the inter merge (A, C) and the intra merge (B, D).
Figure 5: Overlaps of best representative structures from the six clusters of HP35 after performing HELM from 60 NANI strat_reduced clusters using the intra merge with trimming.
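
The screening over k described in the Figure 1 caption can be sketched as follows. This is a hedged illustration only: it uses scikit-learn's `KMeans` with a fixed random seed in place of NANI seeding, synthetic blobs in place of MD frames, and a hypothetical helper named `screen_k`. It does not reproduce the MDANCE screening workflow.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score


def screen_k(X, k_values, seed=0):
    """Return one (k, CH, DB) tuple per candidate number of clusters."""
    rows = []
    for k in k_values:
        # Fixed random_state stands in for a deterministic seeding scheme.
        labels = KMeans(n_clusters=k, n_init=1, random_state=seed).fit_predict(X)
        rows.append((k,
                     calinski_harabasz_score(X, labels),
                     davies_bouldin_score(X, labels)))
    return rows


# Synthetic stand-in for featurized MD frames (one row per frame).
X, _ = make_blobs(n_samples=2000, centers=6, n_features=10, random_state=1)
scores = screen_k(X, range(5, 31))  # k = 5 to k = 30, as in Figure 1

best_ch = max(scores, key=lambda r: r[1])[0]  # higher CH is better
best_db = min(scores, key=lambda r: r[2])[0]  # lower DB is better
print("k with highest CH:", best_ch)
print("k with lowest DB:", best_db)
```

The two indices reward different things (CH favors high between-cluster to within-cluster dispersion, DB favors low average similarity between each cluster and its nearest neighbor), so they need not select the same k. Plotting both across the range, as in Figure 1, is more informative than either optimum alone.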
