Diabetes classification model based on boosting algorithms
- PMID: 29587624
- PMCID: PMC5872396
- DOI: 10.1186/s12859-018-2090-9
Abstract
Background: Diabetes mellitus is a common, complex, lifelong chronic disease. It is therefore of high clinical significance to identify the most relevant clinical indexes and to perform efficient computer-aided pre-diagnosis and diagnosis.
Results: Non-parametric statistical tests were performed on the results of hundreds of medical measurement indexes to compare the diabetic and non-diabetic populations. Two common boosting algorithms, Adaboost.M1 and LogitBoost, were selected to build a machine learning model for diabetes diagnosis from these clinical test data, which cover a total of 35,669 individuals. The classification models built with both algorithms have very good classification ability, with the LogitBoost model slightly better than the Adaboost.M1 model. The overall accuracy of the LogitBoost classification model reached 95.30% under 10-fold cross validation. The true positive, true negative, false positive, and false negative rates of the binary classification model were 0.921, 0.969, 0.031, and 0.079, respectively, and the area under the receiver operating characteristic curve reached 0.99.
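To make the relationship between the reported rates and the overall accuracy concrete, the following minimal Python sketch derives all five figures from a binary confusion matrix. The names `ConfusionCounts`, `rates`, and `accuracy_from_rates` are hypothetical, and the counts are illustrative only: they are not the study's confusion matrix, which is not given in the abstract. They were chosen so that the arithmetic reproduces the rounded rates quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Raw counts for a binary classifier (positive class = diabetic)."""
    tp: int  # diabetic, predicted diabetic
    fn: int  # diabetic, predicted non-diabetic
    tn: int  # non-diabetic, predicted non-diabetic
    fp: int  # non-diabetic, predicted diabetic


def rates(c: ConfusionCounts) -> dict:
    """Return the four rates and the overall accuracy."""
    positives = c.tp + c.fn
    negatives = c.tn + c.fp
    return {
        "tpr": c.tp / positives,  # true positive rate (sensitivity)
        "fnr": c.fn / positives,  # false negative rate = 1 - tpr
        "tnr": c.tn / negatives,  # true negative rate (specificity)
        "fpr": c.fp / negatives,  # false positive rate = 1 - tnr
        "accuracy": (c.tp + c.tn) / (positives + negatives),
    }


def accuracy_from_rates(tpr: float, tnr: float, positive_fraction: float) -> float:
    """Accuracy is the class-weighted average of sensitivity and specificity."""
    return positive_fraction * tpr + (1.0 - positive_fraction) * tnr


# Illustrative counts only (assumption): 1,000 positives and 2,000 negatives.
example = ConfusionCounts(tp=921, fn=79, tn=1938, fp=62)
result = rates(example)

for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The sketch also shows why the two pairs of rates are redundant: the true positive and false negative rates sum to 1, as do the true negative and false positive rates, while the overall accuracy additionally depends on the share of positive cases in the data.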
Conclusions: The boosting algorithms show excellent performance for diabetes classification models based on clinical medical data. The coefficient matrix of the original data is sparse because some of the test results were missing, including some that were directly related to disease diagnosis. That the model still classifies well under these conditions indicates that it is robust and has a degree of pre-diagnostic capability. In the process of selecting the preferred test items, the most statistically significant discriminating factors between the diabetic and general populations were obtained; these can be used as reference risk factors for diabetes mellitus.
Keywords: Boosting algorithms; Computer-aided diagnoses; Diabetes.
Conflict of interest statement
Ethics approval and consent to participate
All of the procedures performed in this study were approved by the ethics committee of the First Affiliated Hospital of WenZhou Medical University (Institutional review board approval No. 2017–126). All of the procedures followed were in accordance with the Declaration of Helsinki of 1964 and its later versions. Due to the retrospective nature of the study, informed consent was waived by the IRB of the First Affiliated Hospital of WenZhou Medical University. Rong Jin granted permission in the Ethics approval and consent to participate section.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.