PeerJ Comput Sci. 2025 Apr 22:11:e2762.

doi: 10.7717/peerj-cs.2762. eCollection 2025.

Predicting no-shows at outpatient appointments in internal medicine using machine learning models

Felipe Ocampo Osorio^{1,2,3,4}, Santiago Pedroza Gomez^{1,2}, David Esteban Rebellón Sanchez^{1,2}, Richard Ramirez Fernandez^{1,2}, Reinel Tabares-Soto^{3,5}, Mario Alejandro Bravo-Ortíz^{3,5,6}, Gustavo Adolfo Cruz Suarez^{1,2,4}

Affiliations

¹ Unidad de Inteligencia Artificial, Fundación Valle del Lili, Cali, Valle del Cauca, Colombia.
² Centro de Investigaciones Clínicas, Fundación Valle del Lili, Cali, Valle del Cauca, Colombia.
³ Departamento de Electrónica y Automatización, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia.
⁴ Departamento de Salud Pública y Medicina Comunitaria, Universidad ICESI, Cali, Valle del Cauca, Colombia.
⁵ Departamento de Sistemas e Informática, Universidad de Caldas, Manizales, Caldas, Colombia.
⁶ Centro de Bioinformática y Biología Computacional (BIOS), Manizales, Caldas, Colombia.

PMID: 40567710
PMCID: PMC12190658
DOI: 10.7717/peerj-cs.2762

Felipe Ocampo Osorio et al. PeerJ Comput Sci. 2025.

Abstract

The high prevalence of patient absenteeism in medical appointments poses significant challenges for healthcare providers and patients, causing delays in service delivery and increasing operational inefficiencies. Addressing this issue is crucial in the internal medicine department, a fundamental pillar of comprehensive adult healthcare that manages various chronic and complex conditions. To mitigate absenteeism, we present an application of machine learning models designed to predict the risk of patient absenteeism in the internal medicine department of Fundación Valle del Lili, a high-complexity hospital in Colombia. Using an institutional database, we conducted a statistical analysis to identify critical variables influencing absenteeism risk, including clinical and sociodemographic factors and characteristics of previously attended appointments. Our study evaluated seven machine learning models, explored various data processing techniques, and addressed class imbalance through oversampling and undersampling strategies. Hyperparameter optimization was conducted for each model configuration, culminating in the selection of the Bagging RandomForest model, which performed best when combined with standardized data balanced using the Synthetic Minority Oversampling Technique (SMOTE). Additionally, SHapley Additive exPlanations (SHAP) values were applied to enhance the interpretability of the model, enabling the identification of the most influential variables in predicting medical absenteeism, such as the number of previous absences, the day and month of the appointment, and diagnosed diseases. The selected model achieved a predictive accuracy of 84.80 ± 0.81%, an AUC value of 0.89, an F1-score of 84.75%, and a recall of 83.02% in cross-validation experiments.
These results highlight the potential of our experimental approach to identify the most suitable model for proactively predicting patients at high risk of absenteeism, optimizing resource allocation, and improving the quality of medical care in internal medicine in the future. Our methodology provides a foundation for reducing operational inefficiencies and strengthening intervention strategies. This benefits healthcare providers and patients through more timely and effective care. Ultimately, this approach contributes to improving patient outcomes and institutional efficiency.
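The abstract names two preprocessing steps, standardization and SMOTE balancing, without showing what they do to the data. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation (the record does not say which libraries or settings they used). The helper names `standardize` and `smote`, the neighbour count `k`, and the toy rows are all assumptions made for illustration.

```python
import math
import random

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the column std."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between a random
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((r for r in minority if r is not base),
                            key=lambda r: math.dist(base, r))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Toy data, invented for illustration: [previous absences, age]
X = [[0, 34], [1, 51], [0, 62], [0, 45], [1, 29], [0, 70],
     [4, 38], [3, 55], [5, 41]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # 1 = no-show (minority class)

X_std = standardize(X)
minority = [row for row, label in zip(X_std, y) if label == 1]
n_majority = y.count(0)
new_rows = smote(minority, n_new=n_majority - len(minority), k=2)

# Balanced training set: original rows plus synthetic no-show rows.
X_bal = X_std + new_rows
y_bal = y + [1] * len(new_rows)
print(y_bal.count(0), y_bal.count(1))
```

In a cross-validation setting, scaling statistics and synthetic samples are normally derived from the training folds only, so that the held-out fold contains real, unseen appointments. The sketch skips that split for brevity.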

Keywords: Internal medicine; Machine learning; Medical appointments; No-shows; Non-attendance.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Figure 1. Data distribution: (A) Distribution of the data referring to the database, and (B) age distribution of the dataset by class.**

**Figure 5. SHAP values for Bagging RandomForest with standardized data: (A) SMOTE balancing, (B) ADASYN balancing, and (C) unbalanced data.**

**Figure 6. Bagging RandomForest confusion matrix and ROC curve.**

**Figure 7. Feature importance plot obtained from the Bagging RandomForest model with SMOTE.**
