Eur J Med Res. 2025 Mar 18;30(1):183.
doi: 10.1186/s40001-025-02442-5.

Predicting diabetic retinopathy based on routine laboratory tests by machine learning algorithms

Xiaohua Wan et al. Eur J Med Res. 2025.

Abstract

Objectives: This study aimed to identify risk factors for diabetic retinopathy (DR) and develop machine learning (ML)-based predictive models using routine laboratory data in patients with type 2 diabetes mellitus (T2DM).

Methods: Clinical data from 4259 T2DM inpatients at Beijing Tongren Hospital were analyzed, divided into a model construction data set (N = 3936) and an external validation data set (N = 323). Using 39 optimal variables, a prediction model was constructed using the eXtreme Gradient Boosting (XGBoost) algorithm and compared with four other algorithms: support vector machine (SVM), gradient boosting decision tree (GBDT), neural network (NN), and logistic regression (LR). The Shapley Additive exPlanation (SHAP) method was employed to interpret the XGBoost model. External validation was performed to assess model performance.

Results: DR was present in 47.69% (N = 1877) of T2DM patients in the model construction data set. Among the five models tested, the XGBoost model performed best, with an AUC of 0.831, accuracy of 0.757, sensitivity of 0.754, specificity of 0.759, and F1-score of 0.752. SHAP analysis quantified feature importance for the XGBoost model and identified key risk factors for DR. In external validation, the XGBoost model achieved an accuracy of 0.650.
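The metrics reported above follow standard definitions. As a minimal sketch, assuming binary labels (1 = DR, 0 = non-DR), the Python below computes accuracy, sensitivity, specificity, and F1-score from confusion-matrix counts, and AUC as the fraction of (DR, non-DR) pairs that a score ranks correctly. The names `ConfusionCounts`, `classification_metrics`, and `roc_auc` are hypothetical, and the counts and scores are invented toy values, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Confusion-matrix counts for a binary classifier (1 = DR, 0 = non-DR)."""
    tp: int  # DR patients predicted as DR
    fp: int  # non-DR patients predicted as DR
    tn: int  # non-DR patients predicted as non-DR
    fn: int  # DR patients predicted as non-DR


def classification_metrics(c):
    """Return accuracy, sensitivity, specificity, and F1-score as a dict."""
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = c.tp / (c.tp + c.fn)   # recall on the DR class
    specificity = c.tn / (c.tn + c.fp)   # recall on the non-DR class
    precision = c.tp / (c.tp + c.fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(labels, scores):
    """AUC via the rank (Mann-Whitney) formulation; tied scores count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy inputs (assumed for illustration only, not taken from the paper).
toy_counts = ConfusionCounts(tp=40, fp=10, tn=35, fn=15)
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]

print(classification_metrics(toy_counts))
print(roc_auc(toy_labels, toy_scores))
```

The pairwise AUC loop is quadratic in the number of patients, which is fine for illustration; a sort-based rank computation would be the usual choice for larger data sets.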

Conclusions: The XGBoost-based prediction model effectively assesses DR risk in T2DM patients using routine laboratory data, aiding clinicians in identifying high-risk individuals and guiding personalized management strategies, especially in medically underserved areas.

Keywords: Diabetic retinopathy; Machine learning; Predictive model; Routine laboratory tests; Type 2 diabetes mellitus; XGBoost.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: This study was approved by the ethics committee of Beijing Tongren Hospital, Capital Medical University (No. TREC2024-KY040). Human ethics and consent to participate: Every human participant agreed to participate in the study and signed an informed consent form. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Flowchart of study design. The non-DR group was defined as patients without DR, and the DR group as patients with DR. T2DM, type 2 diabetes mellitus; DR, diabetic retinopathy; ML, machine learning; XGBoost, eXtreme Gradient Boosting; SVM, support vector machine; GBDT, gradient boosting decision tree; NN, neural network; LR, logistic regression; SHAP, Shapley Additive exPlanation
Fig. 2
Data distribution characteristics and correlation of variables in T2DM patients of the non-DR and DR groups from the model construction data set. A Violin plots show the data distribution of the non-DR (blue area) and DR (yellow area) groups. The width of each plot indicates data density, with broader sections signifying higher density. B A butterfly chart displays the normalized median values of different indicators for the non-DR (blue area) and DR (yellow area) groups. C A heatmap shows the correlations between variables. Abbreviations as in Tables 1–4
Fig. 3
Feature selection accuracy curve. Accuracy peaked when the number of variables was 39 (marked by a solid red point)
Fig. 4
Feature importance ranking for diabetic retinopathy risk based on the Random Forest algorithm in T2DM patients in the model construction data set. The longer the bar, the greater the impact of the variable on the prediction results, and the more valuable it is as a reference for decision-making. Abbreviations as in Tables 1–4
Fig. 5
Receiver operating characteristic (ROC) curves of five algorithms for detecting diabetic retinopathy based on 39 important variables in T2DM patients from the model construction data set. Abbreviations as in Tables 1, 5, and 6
Fig. 6
SHAP explanation of global feature importance for the XGBoost model. A Bar chart of the mean absolute SHAP value for each predictor. The inset PieDonut chart shows categorized features (outer ring) and single-variable contributions (inner ring). B SHAP summary plot. The color of each dot represents the magnitude of the feature value, with red denoting higher values and blue indicating lower values. Its horizontal position corresponds to the SHAP value, reflecting the direction and strength of the feature’s influence on the model's output. SHAP, Shapley Additive exPlanation; Abbreviations as in Tables 1–5
Fig. 7
Two examples of local explanations of predictions using Shapley Additive exPlanation (SHAP) values. A T2DM patient predicted not to have diabetic retinopathy. B T2DM patient predicted to have diabetic retinopathy. Factors that push the predicted score above the base value (mean prediction) are colored red, and those that push the prediction lower are shown in blue. SHAP, Shapley Additive exPlanation; Abbreviations as in Tables 1–5
Fig. 8
SHAP dependence plots of the top 10 important features in the XGBoost model. A SBP; B BUN; C HbA1c; D ALB; E K; F ALT; G MCV; H FIB; I FBG; J LDH. Each blue dot represents the feature value and the corresponding SHAP value for one observation. The red line marks a SHAP value of zero. SHAP, Shapley Additive exPlanation; Abbreviations as in Tables 1–5
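The SHAP figures above rest on two properties of Shapley values: a local explanation splits the gap between one prediction and the base value into per-feature contributions, and a global ranking averages the absolute contributions across patients. The following is a minimal sketch of both, assuming a toy linear risk score over three features; the weights, baseline values, and patient values are invented for illustration and are not from the study, and `exact_shapley` and `mean_abs_shap` are hypothetical helper names. It enumerates all feature coalitions by brute force, which is only feasible for a handful of features and is not the algorithm the authors used; explainers designed for tree models compute these values far more efficiently.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model: a linear risk score over three features.
# All numbers below are assumptions for illustration, not study results.
FEATURES = ["SBP", "BUN", "HbA1c"]
WEIGHTS = [0.02, 0.05, 0.3]
BASELINE = [130.0, 5.0, 7.0]   # reference values used for "absent" features


def score(x):
    """Toy linear score; stands in for a fitted model's output."""
    return sum(w * v for w, v in zip(WEIGHTS, x))


def exact_shapley(x, baseline, model):
    """Exact Shapley values by enumerating every coalition of features."""
    n = len(x)

    def coalition_value(present):
        # Features outside the coalition are replaced by baseline values.
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (coalition_value(s | {i}) - coalition_value(s))
    return phi


def mean_abs_shap(shap_rows):
    """Global importance: mean absolute Shapley value per feature."""
    n_rows = len(shap_rows)
    n_features = len(shap_rows[0])
    return [sum(abs(row[i]) for row in shap_rows) / n_rows for i in range(n_features)]


# Local explanation for one hypothetical patient.
patient = [150.0, 8.0, 9.0]
phi = exact_shapley(patient, BASELINE, score)
base_value = score(BASELINE)

# Additivity: base value plus all contributions recovers the prediction.
assert abs(base_value + sum(phi) - score(patient)) < 1e-9

# Global ranking over two hypothetical patients.
rows = [phi, exact_shapley([120.0, 6.0, 6.5], BASELINE, score)]
ranking = sorted(zip(FEATURES, mean_abs_shap(rows)), key=lambda t: -t[1])
print(ranking)
```

For a linear score like this one, each contribution reduces to the weight times the feature's deviation from its baseline, which gives a quick sanity check on the enumeration.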
