BMC Bioinformatics. 2018 Mar 27;19(1):109.
doi: 10.1186/s12859-018-2090-9.

Diabetes classification model based on boosting algorithms

Peihua Chen et al. BMC Bioinformatics.

Abstract

Background: Diabetes mellitus is a common and complicated chronic lifelong disease. Hence, it is of high clinical significance to find the most relevant clinical indexes and to perform efficient computer-aided pre-diagnoses and diagnoses.

Results: Non-parametric statistical tests were performed on hundreds of medical measurement indexes to compare diabetic and non-diabetic populations. Two common boosting algorithms, Adaboost.M1 and LogitBoost, were selected to build machine learning models for diabetes diagnosis from these clinical test data, which covered a total of 35,669 individuals. The models built by both algorithms classified well, with the LogitBoost model slightly better than the Adaboost.M1 model. The overall accuracy of the LogitBoost classification model reached 95.30% under 10-fold cross validation. The true positive, true negative, false positive, and false negative rates of the binary classification model were 0.921, 0.969, 0.031, and 0.079, respectively, and the area under the receiver operating characteristic curve reached 0.99.
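The four rates reported above are tied together by the usual confusion-matrix definitions: the true positive and false negative rates sum to 1, as do the true negative and false positive rates, while overall accuracy also depends on how many individuals fall in each class. The following is a minimal sketch of those definitions, not the authors' code. The function names and the counts (1,000 individuals per class) are hypothetical, chosen only so that the rates reproduce the reported values; the abstract does not give the actual class sizes.

```python
def binary_rates(tp, fn, tn, fp):
    """Accuracy and per-class rates from confusion-matrix counts."""
    positives = tp + fn  # individuals who truly have the condition
    negatives = tn + fp  # individuals who truly do not
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "tpr": tp / positives,  # true positive rate (sensitivity)
        "fnr": fn / positives,  # false negative rate = 1 - tpr
        "tnr": tn / negatives,  # true negative rate (specificity)
        "fpr": fp / negatives,  # false positive rate = 1 - tnr
    }


def implied_positive_fraction(accuracy, tpr, tnr):
    """Solve accuracy = p * tpr + (1 - p) * tnr for the positive fraction p."""
    return (tnr - accuracy) / (tnr - tpr)


# Hypothetical counts (assumption): equal class sizes of 1,000 each.
rates = binary_rates(tp=921, fn=79, tn=969, fp=31)

# Positive fraction that would reconcile the reported accuracy with the
# reported rates, assuming all figures describe the same evaluation.
p = implied_positive_fraction(accuracy=0.953, tpr=0.921, tnr=0.969)
```

With equal class sizes the accuracy would be the plain average of the true positive and true negative rates (0.945), which is lower than the reported 95.30%. If all the reported figures come from the same evaluation, they are consistent with a diabetic fraction of roughly one third, subject to rounding in the published values.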

Conclusions: The boosting algorithms show excellent performance for diabetes classification models based on clinical medical data. The coefficient matrix of the original data is sparse because some of the test results were missing, including some that were directly related to disease diagnosis. That the model still classifies well under these conditions indicates that it is robust and has a degree of pre-diagnosis function. In the process of selecting the preferred test items, the most statistically significant discriminating factors between the diabetic and general populations were obtained and can be used as reference risk factors for diabetes mellitus.

Keywords: Boosting algorithms; Computer-aided diagnoses; Diabetes.

Conflict of interest statement

Ethics approval and consent to participate

All of the procedures performed in this study were approved by the ethics committee of the First Affiliated Hospital of WenZhou Medical University (Institutional review board approval No. 2017–126). All of the procedures followed were in accordance with the Declaration of Helsinki of 1964 and its later versions. Due to the retrospective nature of the study, informed consent was waived by the IRB of the First Affiliated Hospital of WenZhou Medical University. Rong Jin granted permission in the Ethics approval and consent to participate section.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1. Data processing method and overall process
Fig. 2. Examination results of important detection indexes of non-diabetic patients (a) and diabetic patients (b)
Fig. 3. (a) Prediction accuracy and receiver operating characteristic (ROC) area of the models built by each boosting algorithm. (b) TP, FP, TN, and FN coefficients of the model built by each boosting algorithm
