Front Public Health. 2026 Feb 17:14:1682879.
doi: 10.3389/fpubh.2026.1682879. eCollection 2026.

Value of an automated machine learning model with post-hoc explanation for predicting healthcare-seeking delays among residents in Tibetan regions

Zhenzhong Xi et al. Front Public Health.

Abstract

Objective: This study aimed to investigate key determinants of healthcare-seeking delays among Tibetan residents and develop predictive models using automated machine learning (AutoML) with post-hoc SHAP interpretation alongside a clinical decision support system.

Methods: Face-to-face surveys using structured questionnaires were administered to 1,879 Tibetan residents. Data were processed within an AutoML framework: the dataset was partitioned into training (n = 1,503) and testing (n = 376) subsets at an 8:2 ratio. Standardized preprocessing, including outlier rectification, one-hot encoding (OHE), and random forest-based multiple imputation (MI), was applied. Model validation combined 5-fold cross-validation with SHapley Additive exPlanations (SHAP) analysis.
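The partitioning and validation steps described in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' AutoML pipeline: the function names, the fixed seed, and the toy category values are hypothetical, and outlier rectification and random forest-based imputation are not covered here.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into training and testing lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, folds[i]


def one_hot(values):
    """One-hot encode a list of categorical values (categories sorted by name)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# 1,879 respondents split 8:2, then 5-fold cross-validation on the training part.
train_idx, test_idx = train_test_split_indices(1879)
print(len(train_idx), len(test_idx))
for fold_train, fold_val in k_fold_indices(train_idx, k=5):
    print(len(fold_train), len(fold_val))

# Hypothetical answers to a yes/no survey item.
print(one_hot(["yes", "no", "yes"]))
```

With 1,879 rows and a 20% test fraction, the split reproduces the reported subset sizes of 1,503 and 376.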

Results: Among 1,879 participants, the healthcare-seeking delay incidence was 41.99%. The LightGBM model significantly outperformed conventional approaches (AUC > 0.86). SHAP feature importance analysis revealed the predictor hierarchy: Age > County hospital quality score > Distance to county hospital > Township health center quality score > Able to communicate in Chinese.
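For readers unfamiliar with the AUC metric quoted in the Results, the sketch below computes it from its pairwise definition: the probability that a randomly chosen delayed participant receives a higher predicted score than a randomly chosen non-delayed one. The labels and scores are invented for illustration and are not study data; `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = delayed care-seeking, 0 = no delay.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(roc_auc(labels, scores))

# Arithmetic check on the reported incidence: 41.99% of 1,879 participants.
delayed = round(0.4199 * 1879)
print(delayed, round(delayed / 1879 * 100, 2))
```

The quadratic pairwise loop is chosen for clarity; a rank-based formulation gives the same value more efficiently on large samples.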

Conclusion: A high-performance model with post-hoc SHAP interpretation identifies the geographical, cultural, and healthcare resource variables associated with delay and uses them to accurately flag high-risk populations. The developed clinical decision support system enables risk computation through modular interfaces, providing an evidence-based tool for optimizing hierarchical diagnosis and resource allocation in Tibetan healthcare.

Keywords: Tibetan healthcare; automated machine learning; clinical decision support system; healthcare-seeking delay; interpretability analysis.


Conflict of interest statement

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Full workflow.
Figure 2
Comparative evaluation of optimization efficacy in swarm intelligence algorithms. The horizontal axis denotes different CEC2022 benchmark functions (or function indices), while the vertical axis represents the best fitness values (objective function values) obtained over 30 independent algorithm executions. Lower values indicate superior optimization efficacy. Each box plot illustrates the statistical distribution of 30 runs for a given algorithm on a specific test function. Narrower interquartile ranges (IQR) and whisker spans (extending to ±1.5 × IQR) reflect enhanced algorithmic stability. As visually evident, IETO exhibits markedly more compact distributions (lower median positions and reduced IQRs) than ETO, WOA, and PSO across most functions, validating its superior solution quality and robust convergence behavior.
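The box-plot quantities named in the Figure 2 caption (median, IQR, whiskers limited to 1.5 × IQR beyond the quartiles) can be computed as in the following sketch. The fitness values are invented, the quartile method ("inclusive") is an assumption since the caption does not specify one, and `box_plot_summary` is a hypothetical helper name.

```python
import statistics


def box_plot_summary(values):
    """Return the median, quartiles, IQR, whisker ends, and outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical best-fitness values from repeated runs of one algorithm.
best_fitness = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
print(box_plot_summary(best_fitness))
```

A smaller `iqr` and a shorter whisker span for one algorithm than another correspond to the "more compact distributions" the caption describes.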
Figure 3
Comparative analysis of convergence dynamics in swarm intelligence algorithms. The horizontal axis indicates iteration counts (1–500), and the vertical axis quantifies either the contemporary population mean fitness or best fitness values. Lower values denote higher solution quality. Each trajectory depicts evolutionary fitness progression for one algorithm. Accelerated initial descent rates characterize rapid convergence, while sustained declines toward lower asymptotic plateaus signify enhanced capability for escaping local optima. The IETO trajectory (distinguished by [specified line style/color]) demonstrates the steepest initial convergence gradient and achieves the lowest terminal values across functions, confirming its dual proficiency in swift convergence and global exploration efficacy.
Figure 4
Cross-validation performance of the training set. (A) ROC curve of training set; (B) PR curve of training set.
Figure 5
Cross-validation performance of the testing set. (A) ROC curve of testing set; (B) PR curve of testing set.
Figure 6
Decision curve analysis of the prediction model: (A) training set; (B) testing set. Note: The Y-axis shows the net benefit; the solid line represents the prediction model, the red dashed line represents the assumption that all patients develop delays, and the black dashed line represents the assumption that no patients develop delays.
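The net benefit plotted in a decision curve follows the standard formula NB = TP/n − (FP/n) × p_t/(1 − p_t), where p_t is the threshold probability. The sketch below evaluates it for a model, for the "all patients develop delays" reference, and for the "no patients develop delays" reference (which is always zero). The labels and scores are invented and the function names are hypothetical; this is not the authors' code.

```python
def net_benefit(labels, scores, threshold):
    """Net benefit of flagging everyone whose score is at or above the threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit when every patient is assumed to develop a delay."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up example: 1 = delayed care-seeking, 0 = no delay.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
for pt in (0.25, 0.5, 0.75):
    model = net_benefit(labels, scores, pt)
    treat_all = net_benefit_treat_all(labels, pt)
    print(pt, round(model, 4), round(treat_all, 4), 0.0)  # 0.0 = treat none
```

A model is useful at a given threshold when its net benefit exceeds both reference lines.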
Figure 7
Machine learning interpretability analysis. (A) The Shapley summary plot comprehensively presents the overall impact patterns of various features on model predictions across all samples. Each point in the plot represents a feature and its SHAP value (i.e., the feature’s contribution to prediction) for a specific sample. The color of the points indicates the actual value magnitude of the feature (yellow for high values, blue for low values), while the distribution along the horizontal axis (SHAP values) reflects how feature values influence predictions (positive values increase predictions, negative values decrease them). This visualization allows intuitive identification of which features generally correlate with increases or decreases in predicted values, as well as trend relationships between feature influence and feature magnitudes; (B) The Shapley feature importance plot displays the overall importance ranking of each feature’s impact on model predictions in bar chart form. Feature importance is determined by calculating the mean absolute SHAP value for each feature across all samples, thereby measuring its average contribution to model output variations. Longer bars indicate greater influence of the feature in the model’s overall decision-making process, providing researchers with clear insight into the most critical factors driving predictions; (C–E) Waterfall plots illustrate the cumulative contribution process of each feature to individual patient predictions. The baseline value represents the model’s average prediction for all patients, while feature contributions show how each feature affects the final prediction (red indicating increased risk, blue indicating decreased risk). The sum of all feature contributions yields the final predicted value; (F) The decision path plot compares decision pathways across multiple patients, demonstrating how different feature combinations lead to varying prediction outcomes. The horizontal axis shows predicted probabilities, the vertical axis lists features, and the curved pathways trace decision routes from baseline values to final predictions; (G–I) Force plots visually demonstrate how each feature “pushes” predictions toward higher or lower risk directions. Red arrows indicate features pushing predictions toward higher risk, blue arrows indicate features pushing toward lower risk, with arrow length representing the magnitude of influence; HosQuality: County hospital quality score; Distance: Distance to county hospital; TownQuality: Township health center quality score; Chinese: Able to communicate in Chinese.
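Two properties described in this caption can be checked on a toy example: the waterfall identity (baseline plus the sum of feature contributions equals the prediction) and the mean absolute SHAP ranking. The sketch below computes exact Shapley values by enumerating feature subsets. It rests on several assumptions: `toy_risk` is an invented scoring function, not the paper's LightGBM model; a single reference patient stands in for the baseline, whereas the caption's baseline is the average prediction over all patients; and brute-force enumeration is used instead of whatever explainer the authors ran. All numbers are hypothetical.

```python
from itertools import combinations
from math import factorial

FEATURES = ["Age", "Distance", "HosQuality"]


def toy_risk(v):
    """Hypothetical risk score with one interaction term (illustration only)."""
    age, distance, quality = v
    return 0.10 + 0.004 * age + 0.002 * distance - 0.03 * quality + 0.00005 * age * distance


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a single baseline point."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                # Features outside the coalition keep their baseline value.
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def mean_abs_importance(all_phi):
    """Average absolute contribution per feature across samples."""
    n_features = len(all_phi[0])
    return [sum(abs(row[k]) for row in all_phi) / len(all_phi) for k in range(n_features)]


baseline = [45.0, 30.0, 6.0]  # hypothetical reference patient
samples = [[70.0, 80.0, 4.0], [25.0, 10.0, 8.0], [55.0, 40.0, 5.0]]
all_phi = [shapley_values(toy_risk, x, baseline) for x in samples]

for x, phi in zip(samples, all_phi):
    # Waterfall identity: baseline prediction + contributions = prediction.
    print(round(toy_risk(baseline) + sum(phi), 6), round(toy_risk(x), 6))

importance = mean_abs_importance(all_phi)
ranking = sorted(zip(FEATURES, importance), key=lambda kv: kv[1], reverse=True)
print(ranking)
```

Enumeration costs grow exponentially with the number of features, so it is only practical for a handful of inputs; it is used here because it makes the additivity property easy to verify.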
Figure 8
SHAP interaction analysis between key indicators.
Figure 9
Demonstration of clinical decision support system.

