This is a preprint.
Scaling k-Means for Multi-Million Frames: A Stratified NANI Approach for Large-Scale MD Simulations
- PMID: 40666979
- PMCID: PMC12262197
- DOI: 10.1101/2025.06.15.659780
Abstract
We present improved k-means clustering initialization strategies for molecular dynamics (MD) simulations, implemented as part of the N-ary Natural Initiation (NANI) method. Two new deterministic seeding strategies, strat_all and strat_reduced, extend the original NANI approaches and dramatically reduce clustering runtime while preserving the quality of the clustering results. Like the original NANI, they yield reproducible partitions into well-separated, compact clusters, but they avoid the costly iterative seed selection of previous implementations. Tests on the β-heptapeptide and HP35 systems show that the new flavors achieve Calinski-Harabasz (CH) and Davies-Bouldin (DB) scores comparable to those of the previous NANI variant, indicating that the efficiency gains come without a loss in quality. We also show how the new variants can greatly speed up our previously proposed Hierarchical Extended Linkage Method (HELM). These enhancements extend the reach of NANI to large-scale MD analysis, both as stand-alone k-means clustering and as a component of hybrid workflows, and remove a key barrier to routine, scalable, and reproducible exploration of complex conformational ensembles. The improved NANI implementation is available through our MDANCE package: https://github.com/mqcomplab/MDANCE.
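For readers unfamiliar with the two quality metrics named above, the following is a minimal, self-contained sketch of how CH and DB scores can be computed for a k-means partition started from fixed seeds. It is not the MDANCE implementation and does not reproduce the strat_all or strat_reduced seeding: the seeds are simply hand-picked points, the toy 2D data stand in for MD frames, and all function names are hypothetical. Higher CH and lower DB indicate more compact, better-separated clusters.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, seeds, n_iter=100):
    """Lloyd's algorithm from fixed seeds; no randomness, so runs are reproducible.

    Assumption: no cluster becomes empty (true for the toy data below).
    """
    centers = [tuple(s) for s in seeds]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels, centers


def calinski_harabasz(points, labels, centers):
    """Ratio of between-cluster to within-cluster dispersion (higher is better)."""
    n, k = len(points), len(centers)
    overall = centroid(points)
    between = sum(
        labels.count(j) * dist(centers[j], overall) ** 2 for j in range(k)
    )
    within = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels, centers):
    """Mean worst-case similarity between each cluster and the others (lower is better)."""
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(dist(p, centers[j]) for p in members) / len(members))
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Toy data: two well-separated groups of four points each (stand-ins for frames).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
seeds = [points[0], points[4]]  # hand-picked, deterministic seeds (illustrative only)

labels, centers = kmeans(points, seeds)
ch = calinski_harabasz(points, labels, centers)
db = davies_bouldin(points, labels, centers)
print(f"labels={labels}  CH={ch:.2f}  DB={db:.3f}")
```

Because the seeds are fixed, repeated runs return the same labels and therefore the same CH and DB values, which is the sense in which deterministic seeding makes a clustering reproducible.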
Conflict of interest statement
Conflict of Interest: The authors declare no competing financial interests.