BMC Bioinformatics. 2018 Mar 27;19(1):109.

doi: 10.1186/s12859-018-2090-9.

Diabetes classification model based on boosting algorithms

Peihua Chen¹, Chuandi Pan²

Affiliations

¹ Institute of Biopharmaceutical Informatics and Technologies, Wenzhou Medical University, Wenzhou, China. chenphwmu666@163.com.
² Department of Computer Technology and Information Management, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou City, China.

PMID: 29587624
PMCID: PMC5872396
DOI: 10.1186/s12859-018-2090-9

Peihua Chen et al. BMC Bioinformatics. 2018.

Abstract

Background: Diabetes mellitus is a common, complicated, lifelong chronic disease. It is therefore clinically important to identify the most relevant clinical indexes and to perform efficient computer-aided pre-diagnosis and diagnosis.

Results: Non-parametric statistical tests were used to compare hundreds of medical measurement indexes between diabetic and non-diabetic populations. Two common boosting algorithms, Adaboost.M1 and LogitBoost, were selected to build machine learning models for diabetes diagnosis from these clinical test data, which cover a total of 35,669 individuals. Both models show very good classification ability, with the LogitBoost model slightly outperforming the Adaboost.M1 model. Under 10-fold cross-validation, the overall accuracy of the LogitBoost model reached 95.30%. The true positive, true negative, false positive, and false negative rates of the binary classification model were 0.921, 0.969, 0.031, and 0.079, respectively, and the area under the receiver operating characteristic curve reached 0.99.
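The abstract does not say how the boosting models were implemented or configured. As a rough illustration of what Adaboost.M1 does, here is a minimal Python sketch, assuming binary labels (1 = diabetic, 0 = non-diabetic), decision stumps as the weak learner, and `None` for a missing test result. The function names (`fit_stump`, `adaboost_m1`, `predict`, `cross_val_accuracy`) and the round-robin fold assignment are hypothetical choices, not the authors' settings. LogitBoost, the other algorithm named, instead fits an additive logistic regression model with Newton-step updates and is not shown here.

```python
"""Minimal AdaBoost.M1 sketch with decision stumps (labels 0/1).

Assumptions (not from the paper): stumps as weak learners, None marks a
missing test result, and folds are assigned round-robin without shuffling.
"""
import math
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]
# (feature index, threshold, label if value > threshold, label if value missing)
Stump = Tuple[int, float, int, int]


def stump_predict(s: Stump, x: Row) -> int:
    j, thr, above, missing = s
    if x[j] is None:
        return missing
    return above if x[j] > thr else 1 - above


def fit_stump(X: List[Row], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Exhaustively choose the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        # Missing values get their own branch, labelled by weighted majority.
        miss1 = sum(wi for xi, yi, wi in zip(X, y, w) if xi[j] is None and yi == 1)
        miss0 = sum(wi for xi, yi, wi in zip(X, y, w) if xi[j] is None and yi == 0)
        missing_label = 1 if miss1 >= miss0 else 0
        miss_err = miss0 if missing_label == 1 else miss1
        values = sorted({xi[j] for xi in X if xi[j] is not None})
        # Midpoints between observed values, plus the maximum (a constant split).
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] + values[-1:]
        for thr in thresholds:
            for above in (0, 1):
                err = miss_err + sum(
                    wi for xi, yi, wi in zip(X, y, w)
                    if xi[j] is not None and (above if xi[j] > thr else 1 - above) != yi
                )
                if err < best_err:
                    best, best_err = (j, thr, above, missing_label), err
    return best, best_err


def adaboost_m1(X: List[Row], y: List[int], rounds: int = 20) -> List[Tuple[Stump, float]]:
    w = [1.0 / len(X)] * len(X)
    ensemble: List[Tuple[Stump, float]] = []  # (stump, vote weight log(1/beta))
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5 or err == 0:
            # M1 stops here; keep the stump only if it is the first one.
            if not ensemble:
                ensemble.append((stump, 1.0))
            break
        beta = err / (1.0 - err)
        ensemble.append((stump, math.log(1.0 / beta)))
        # Shrink weights of correctly classified rows, then renormalise.
        w = [wi * beta if stump_predict(stump, xi) == yi else wi
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[Stump, float]], x: Row) -> int:
    votes = [0.0, 0.0]
    for stump, alpha in ensemble:
        votes[stump_predict(stump, x)] += alpha
    return 1 if votes[1] > votes[0] else 0


def cross_val_accuracy(X: List[Row], y: List[int], k: int = 10, rounds: int = 20) -> float:
    """k-fold CV accuracy; row i goes to fold i % k (no shuffling or stratification)."""
    correct = 0
    for fold in range(k):
        test = [i for i in range(len(X)) if i % k == fold]
        train = [i for i in range(len(X)) if i % k != fold]
        model = adaboost_m1([X[i] for i in train], [y[i] for i in train], rounds)
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)
```

On a toy problem where the label is 1 only for 3 ≤ x ≤ 6, no single stump is perfect, but three boosting rounds combine stumps into a classifier that fits the training data exactly; this is the reweighting effect that boosting relies on.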

Conclusions: The boosting algorithms show excellent performance in diabetes classification models built from clinical medical data. The original data matrix is sparse because some test results were missing, including some directly related to disease diagnosis. Since the model still performed well despite these missing values, it is robust and has a degree of pre-diagnosis capability. In the process of selecting the preferred test items, the most statistically significant discriminating factors between the diabetic and general populations were identified; these can serve as reference risk factors for diabetes mellitus.
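The abstract names only "non-parametric statistical testing" for finding the discriminating indexes. One common non-parametric choice for two independent groups is the Mann–Whitney U test; the sketch below assumes that test, uses the normal approximation with tie correction (no continuity correction), skips missing (`None`) results index by index, and ranks indexes by two-sided p-value. The function names and the index names in any example (`glucose`, `height`) are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample a, two-sided p) via the tie-corrected normal approximation."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a, tie_term, i = 0.0, 0.0, 0
    while i < n:
        # Find the run of tied values pooled[i..j]; each gets their average rank.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        count_a = sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        rank_sum_a += avg_rank * count_a
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values identical: no evidence of a difference
        return u_a, 1.0
    z = (u_a - mean_u) / math.sqrt(var_u)
    return u_a, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def rank_indexes(records: List[Dict[str, Optional[float]]],
                 labels: List[int]) -> List[Tuple[str, float]]:
    """Rank indexes by p-value (diabetic = 1 vs non-diabetic = 0), skipping missing results."""
    names = sorted({name for r in records for name in r})
    ranked = []
    for name in names:
        cases = [r.get(name) for r, y in zip(records, labels) if y == 1]
        controls = [r.get(name) for r, y in zip(records, labels) if y == 0]
        cases = [v for v in cases if v is not None]
        controls = [v for v in controls if v is not None]
        if len(cases) < 2 or len(controls) < 2:
            continue  # too few non-missing results to test this index
        _, p = mann_whitney_u(cases, controls)
        ranked.append((name, p))
    return sorted(ranked, key=lambda item: item[1])
```

Dropping missing values per index, rather than dropping whole individuals, keeps every available measurement in each comparison, which matters when the data matrix is as sparse as the one described.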

Keywords: Boosting algorithms; Computer-aided diagnoses; Diabetes.

Conflict of interest statement

Ethics approval and consent to participate

All of the procedures performed in this study were approved by the ethics committee of the First Affiliated Hospital of Wenzhou Medical University (Institutional review board approval No. 2017–126). All procedures were in accordance with the Declaration of Helsinki of 1964 and its later amendments. Due to the retrospective nature of the study, informed consent was waived by the IRB of the First Affiliated Hospital of Wenzhou Medical University. Rong Jin granted permission in the Ethics approval and consent to participate section.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

**Fig. 1**
Data processing method and overall process

**Fig. 2**
Examination results of important detection indexes of non-diabetic patients (a) and diabetic patients (b)

**Fig. 3**
(a) Prediction accuracy and receiver operating characteristic (ROC) area of the models built by each boosting algorithm. (b) TP, FP, TN, and FN rates of the model built by each boosting algorithm
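Figure 3 and the abstract report accuracy, ROC area, and TP/FP/TN/FN rates. The following minimal Python sketch shows how such figures are conventionally computed from true labels, predicted classes, and predicted scores. The function names are hypothetical, the AUC uses the pairwise (Mann–Whitney) formulation rather than integrating a plotted curve, and this is not the authors' evaluation code.

```python
from typing import Dict, Sequence


def confusion_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy and TP/TN/FP/FN rates for binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tp_rate": tp / (tp + fn),  # sensitivity
        "tn_rate": tn / (tn + fp),  # specificity
        "fp_rate": fp / (tn + fp),  # equals 1 - tn_rate
        "fn_rate": fn / (tp + fn),  # equals 1 - tp_rate
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because the false negative rate is 1 minus the true positive rate and the false positive rate is 1 minus the true negative rate, the reported figures are internally consistent: 0.921 + 0.079 = 1 and 0.969 + 0.031 = 1.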
