PeerJ Comput Sci. 2025 Apr 22:11:e2762.
doi: 10.7717/peerj-cs.2762. eCollection 2025.

Predicting no-shows at outpatient appointments in internal medicine using machine learning models

Felipe Ocampo Osorio et al. PeerJ Comput Sci. 2025.

Abstract

Patient absenteeism at medical appointments is highly prevalent and poses significant challenges for healthcare providers and patients, delaying service delivery and increasing operational inefficiency. The problem is especially pressing in internal medicine, a department central to comprehensive adult healthcare that manages a wide range of chronic and complex conditions. To mitigate absenteeism, we present an application of machine learning models designed to predict the risk of patient absenteeism in the internal medicine department of Fundación Valle del Lili, a high-complexity hospital in Colombia. Using an institutional database, we conducted a statistical analysis to identify the variables that influence absenteeism risk, including clinical and sociodemographic factors and characteristics of previously attended appointments. We evaluated seven machine learning models, explored several data processing techniques, and addressed class imbalance through oversampling and undersampling strategies. Hyperparameter optimization was performed for each model configuration. The model ultimately selected was Bagging RandomForest, which performed strongly when trained on standardized data balanced with the Synthetic Minority Oversampling Technique (SMOTE). Shapley values (SHAP) were then applied to improve the interpretability of the model, identifying the most influential variables in predicting absenteeism, such as the number of previous absences, the day and month of the appointment, and diagnosed diseases. In cross-validation experiments, the selected model achieved an accuracy of 84.80 ± 0.81%, an AUC of 0.89, an F1-score of 84.75%, and a recall of 83.02%.
These results highlight the potential of our experimental approach to identify the most suitable model for proactively flagging patients at high risk of absenteeism, optimizing resource allocation, and improving the quality of care in internal medicine. Our methodology provides a foundation for reducing operational inefficiencies and strengthening intervention strategies, benefiting healthcare providers and patients through more timely and effective care and ultimately improving patient outcomes and institutional efficiency.
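
As an illustration of the oversampling step named above, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the segment between an existing minority sample and one of its k nearest minority neighbours. It is not the authors' implementation, which the abstract does not describe. The function name smote, its parameters, and the toy feature values are assumptions made for this example only. Because SMOTE relies on distances, it would normally be applied to standardized features, as the abstract indicates.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (illustrative sketch only)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: [previous_absences, appointment_month] for no-shows.
no_shows = [[3.0, 1.0], [4.0, 2.0], [5.0, 1.0], [4.0, 3.0]]
attended_count = 10  # assumed size of the majority class

# Add enough synthetic no-shows to match the majority class size.
new_points = smote(no_shows, attended_count - len(no_shows), k=2)
balanced_no_shows = no_shows + new_points
print(len(balanced_no_shows))  # 10
```

In a real pipeline the oversampling would be applied to the training folds only, so that synthetic samples never leak into the data used for evaluation.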

Keywords: Internal medicine; Machine learning; Medical appointments; No-shows; Non-attendance.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. Data distribution: (A) distribution of the data in the database, and (B) age distribution of the dataset by class.
Figure 2. Data balanced by methods.
Figure 3. Correlation matrix.
Figure 4. F1-score comparison.
Figure 5. SHAP values for Bagging RandomForest with standardized data: (A) SMOTE balancing, (B) ADASYN balancing, and (C) unbalanced data.
Figure 6. Bagging RandomForest confusion matrix and ROC curve.
Figure 7. Feature importance plot obtained from the Bagging RandomForest model with SMOTE.
