This is a preprint.
Scaling k-Means for Multi-Million Frames: A Stratified NANI Approach for Large-Scale MD Simulations
- PMID: 40666979
- PMCID: PMC12262197
- DOI: 10.1101/2025.06.15.659780
Abstract
We present improved k-means clustering initialization strategies for molecular dynamics (MD) simulations, implemented as part of the N-ary Natural Initiation (NANI) method. Two new deterministic seeding strategies, strat_all and strat_reduced, extend the original NANI approaches and dramatically reduce the clustering runtime while preserving the quality of the clustering results. These methods also preserve NANI's reproducible partitioning into well-separated and compact clusters while avoiding the costly iterative seed selection procedures of previous implementations. Testing on the β-heptapeptide and HP35 systems shows that these new flavors achieve Calinski-Harabasz (CH) and Davies-Bouldin (DB) scores comparable to the previous NANI variant, indicating that the efficiency gains come with no loss of quality. We also show how this new variant can be used to greatly speed up our previously proposed Hierarchical Extended Linkage Method (HELM). These enhancements extend the reach of NANI to accelerate large-scale MD analysis, both in stand-alone k-means clustering and as a component of hybrid workflows, and remove a key barrier to routine, scalable, and reproducible exploration of complex conformational ensembles. The improved NANI implementation is accessible through our MDANCE package: https://github.com/mqcomplab/MDANCE.
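The abstract judges clustering quality by the Calinski-Harabasz (CH) and Davies-Bouldin (DB) indices. As a reading aid, the following is a minimal sketch of the standard textbook definitions of both scores in plain Python. It is not the MDANCE implementation and does not use its API. The toy 2D points, the two labelings, and all function names are assumptions made for illustration only. A higher CH and a lower DB indicate more compact, better-separated clusters.

```python
import math
from collections import defaultdict


def _mean(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def _group(X, labels):
    """Collect points by cluster label."""
    clusters = defaultdict(list)
    for x, lab in zip(X, labels):
        clusters[lab].append(x)
    return clusters


def calinski_harabasz(X, labels):
    """Ratio of between-cluster to within-cluster dispersion (higher is better)."""
    clusters = _group(X, labels)
    n, k = len(X), len(clusters)
    overall = _mean(X)
    between = within = 0.0
    for pts in clusters.values():
        c = _mean(pts)
        between += len(pts) * math.dist(c, overall) ** 2
        within += sum(math.dist(p, c) ** 2 for p in pts)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(X, labels):
    """Mean worst-case similarity between each cluster and any other (lower is better)."""
    clusters = _group(X, labels)
    cents = {lab: _mean(pts) for lab, pts in clusters.items()}
    # Scatter: average distance of a cluster's points to its own centroid.
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    total = 0.0
    for i in cents:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in cents
            if j != i
        )
    return total / len(cents)


# Hypothetical toy data: two tight squares of points far apart (not MD frames).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # one cluster per square
bad_labels = [0, 1, 0, 1, 0, 1, 0, 1]   # each cluster straddles both squares

ch_good, db_good = calinski_harabasz(X, good_labels), davies_bouldin(X, good_labels)
ch_bad, db_bad = calinski_harabasz(X, bad_labels), davies_bouldin(X, bad_labels)
print(f"good partition: CH={ch_good:.3f}, DB={db_good:.3f}")
print(f"bad partition:  CH={ch_bad:.3f}, DB={db_bad:.3f}")
```

On this toy set the sensible partition gives a CH of about 600 and a DB of about 0.1, while the scrambled one scores far worse on both. Comparing seeding strategies by these two indices, as the abstract does, works the same way: each strategy's final labels are scored against the same frames.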
Conflict of interest statement
Conflict of Interest: The authors declare no competing financial interests.