Sensors (Basel). 2022 Nov 8;22(22):8615.
doi: 10.3390/s22228615.

A Catalogue of Machine Learning Algorithms for Healthcare Risk Predictions

Argyro Mavrogiorgou et al. Sensors (Basel).

Abstract

Extracting useful knowledge from data through proper analysis is a challenging task, yet it is essential for efficient and timely decision-making. A plethora of machine learning (ML) algorithms exists for this purpose, and in healthcare the complexity increases further because of the domain's requirements for analytics-based risk predictions. This manuscript proposes a data analysis mechanism, evaluated in diverse healthcare scenarios, for constructing a catalogue of the most efficient ML algorithms to use, depending on each healthcare scenario's requirements and datasets, for efficiently predicting the onset of a disease. In this context, seven ML algorithms (Naïve Bayes, K-Nearest Neighbors, Decision Tree, Logistic Regression, Random Forest, Neural Networks, Stochastic Gradient Descent) were executed across diverse healthcare scenarios (stroke, COVID-19, diabetes, breast cancer, kidney disease, heart failure). Based on a variety of performance metrics (accuracy, recall, precision, F1-score, specificity, confusion matrix), a subset of ML algorithms was found to be more efficient for timely predictions under specific healthcare scenarios, which is why the envisioned ML catalogue prioritizes the algorithms to be used according to each scenario's nature and the metrics needed. Further evaluation is needed on additional scenarios, involving state-of-the-art techniques (e.g., cloud deployment, federated ML) to improve the mechanism's efficiency.

Keywords: catalogue; data analysis; healthcare; machine learning; prediction; supervised learning.
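
The performance metrics named in the abstract (accuracy, recall, precision, F1-score, specificity) can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' implementation. The names `ConfusionMatrix`, `confusion_matrix`, and `metrics` are hypothetical, and the toy labels are illustrative only; they are not data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper type)."""
    tp: int  # true positives: disease present, predicted present
    fp: int  # false positives: disease absent, predicted present
    tn: int  # true negatives: disease absent, predicted absent
    fn: int  # false negatives: disease present, predicted absent


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_matrix(y_true, y_pred, positive=1) -> ConfusionMatrix:
    """Tally the four outcome counts from paired true and predicted labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionMatrix(tp, fp, tn, fn)


def metrics(cm: ConfusionMatrix) -> dict:
    """Compute the five scalar metrics from the confusion-matrix counts."""
    precision = _ratio(cm.tp, cm.tp + cm.fp)
    recall = _ratio(cm.tp, cm.tp + cm.fn)  # also called sensitivity
    return {
        "accuracy": _ratio(cm.tp + cm.tn, cm.tp + cm.fp + cm.tn + cm.fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


# Toy labels for illustration only (1 = disease onset, 0 = no onset).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
scores = metrics(cm)
```

In a risk-prediction setting, recall and specificity often pull in opposite directions, which is one reason a catalogue might rank algorithms differently depending on which metric a scenario needs most.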

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Indicative example of BNB steps.
Figure 2. Indicative example of KNN steps.
Figure 3. Indicative example of DT steps.
Figure 4. Indicative example of RF steps.
Figure 5. Indicative example of LR steps.
Figure 6. Indicative example of MLP steps.
Figure 7. Indicative example of SGD steps.
Figure 8. Overall mechanism architecture.
Figure 9. Confusion matrix.
Figure 10. Example of stroke probability form.
Figure 11. Precision results of ML models for each use case.
Figure 12. Recall results of ML models for each use case.
Figure 13. F1-score results of ML models for each use case.
Figure 14. Specificity results of ML models for each use case.
Figure 15. Train–validation–test score for diabetes use case.
Figure 16. Confusion matrix of prediction results for diabetes use case.
Figure 17. Performance comparison in the diabetes use case.
Figure 18. Train–validation–test score for stroke use case.
Figure 19. Confusion matrix of prediction results for stroke use case.
Figure 20. Performance comparison in the stroke use case.
Figure 21. Train–validation–test score for heart failure use case.
Figure 22. Confusion matrix of prediction results for heart failure use case.
Figure 23. Performance comparison in the heart failure use case.
Figure 24. Train–validation–test score for COVID-19 use case.
Figure 25. Confusion matrix of prediction results for COVID-19 use case.
Figure 26. Performance comparison in the COVID-19 use case.
Figure 27. Train–validation–test score for breast cancer use case.
Figure 28. Confusion matrix of prediction results for breast cancer use case.
Figure 29. Performance comparison in the breast cancer use case.
Figure 30. Train–validation–test score for kidney disease use case.
Figure 31. Confusion matrix of prediction results for kidney disease use case.
Figure 32. Performance comparison in the kidney disease use case.
Figure 33. Training performance comparison for each algorithm per dataset.
