Int J Surg. 2026 Jan 1;112(1):1153-1163.
doi: 10.1097/JS9.0000000000003494. Epub 2025 Sep 17.

Multimodal deep learning integration for predicting renal function outcomes in living donor kidney transplantation: a retrospective cohort study

Jin-Myung Kim et al. Int J Surg.

Abstract

Background: Accurately predicting post-transplant renal function is essential for optimizing donor-recipient matching and improving long-term outcomes in kidney transplantation (KT). Traditional models using only structured clinical data often fail to account for complex biological and anatomical factors. This study aimed to develop and validate a multimodal deep learning model that integrates computed tomography (CT) imaging, radiology report text, and structured clinical variables to predict 1-year estimated glomerular filtration rate (eGFR) in living donor kidney transplantation (LDKT) recipients.

Materials and methods: A retrospective cohort of 1,937 LDKT recipients was selected from 3,772 KT cases. Exclusions included deceased donor KT, immunologic high-risk recipients (n = 304), missing CT imaging, early graft complications, and anatomical abnormalities. eGFR at 1 year post-transplant was classified into four categories: >90, 75-90, 60-75, and 45-60 mL/min/1.73 m². Radiology reports were embedded using BioBERT, while CT videos were encoded using a CLIP-based visual extractor. These were fused with structured clinical features and input into ensemble classifiers including XGBoost. Model performance was evaluated using cross-validation and SHapley Additive exPlanations (SHAP) analysis.
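As an illustration of the four-category outcome definition above, the following minimal sketch maps a 1-year eGFR value to its category label. The function name `egfr_category` and the handling of boundary values (for example, whether exactly 75 belongs to 75-90 or 60-75) are assumptions; the abstract does not state how boundaries were treated.

```python
from typing import Optional


def egfr_category(egfr: float) -> Optional[str]:
    """Map a 1-year eGFR (mL/min/1.73 m^2) to one of the four study categories.

    Assumption: each bin includes its lower bound, and a value of exactly 90
    falls in "75-90". The abstract does not specify boundary handling.
    """
    if egfr > 90:
        return ">90"
    if egfr >= 75:
        return "75-90"
    if egfr >= 60:
        return "60-75"
    if egfr >= 45:
        return "45-60"
    # Values below 45 fall outside the four categories listed in the abstract.
    return None


if __name__ == "__main__":
    for value in (102.3, 81.0, 66.5, 47.2):
        print(value, "->", egfr_category(value))
```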

Results: The full multimodal model achieved a macro F1 score of 0.675, micro F1 score of 0.704, and weighted F1 score of 0.698, substantially outperforming the clinical-only model (macro F1 = 0.292). CT imaging contributed more than text data (clinical + CT macro F1 = 0.651; clinical + text macro F1 = 0.486). The model achieved its highest F1 scores in the >90 (F1 = 0.7773) and 60-75 (F1 = 0.7303) categories. SHAP analysis identified donor age, BMI, and donor sex as key predictors. Dimensionality reduction confirmed internal feature validity.
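The three F1 averages reported above weight the per-class scores differently. The sketch below computes them from a confusion matrix in plain Python. It assumes the standard definitions for single-label multiclass classification; the function name `f1_summary` and the toy matrix are illustrative and are not the study's data.

```python
from typing import Dict, List


def f1_summary(cm: List[List[int]]) -> Dict[str, object]:
    """Compute per-class, macro, micro, and weighted F1.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    per_class: List[float] = []
    supports: List[int] = []
    correct = 0
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class otherwise
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
        supports.append(sum(cm[k]))
        correct += tp
    return {
        "per_class": per_class,
        # Macro: unweighted mean, so small classes count as much as large ones.
        "macro": sum(per_class) / n,
        # Weighted: mean weighted by the number of true samples per class.
        "weighted": sum(f * s for f, s in zip(per_class, supports)) / total,
        # Micro: pooled counts; equals accuracy for single-label multiclass.
        "micro": correct / total,
    }


if __name__ == "__main__":
    toy = [[8, 2], [1, 9]]  # hypothetical two-class example
    print(f1_summary(toy))
```

Because macro F1 gives each class equal weight, a gap between macro and weighted F1 (here 0.675 versus 0.698) is what one would expect when a smaller class is predicted less well than the larger ones.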

Conclusion: Multimodal deep learning integrating clinical, imaging, and textual data enhances prediction of post-transplant renal function. This framework offers a robust and interpretable approach for individualized risk stratification in LDKT, supporting precision medicine in transplantation.

Keywords: kidney transplant; machine learning; multimodal deep learning; post-transplant outcome prediction; prognosis.

Conflict of interest statement

None declared.

Figures

Figure 1. Study cohort flow diagram for model development and analysis.
Figure 2. Overview of the proposed multimodal classification framework. The model integrates structured clinical variables, radiology report text, and CT video data. Clinical information is processed as tabular features; radiology text is embedded using a pretrained BERT model; and CT video frames are encoded via a CLIP-based image encoder. Each modality undergoes feature extraction and mean pooling, followed by feature concatenation and min–max normalization. The final fused representation is input into an ML classifier (XGBoost/RF/SVM/LR) to predict the post-transplantation eGFR category.
Figure 3. Visualization of multimodal feature embeddings using two-dimensional projection techniques. The fused embeddings obtained from structured, textual, and visual features were reduced to two dimensions using t-distributed stochastic neighbor embedding (t-SNE) (A) and uniform manifold approximation and projection (UMAP) (B). Each point represents a patient sample, color-coded by the corresponding post-transplantation eGFR class (>90, 75–90, 60–75, or 45–60). While both methods reveal local grouping tendencies, substantial overlap among classes is observed. t-SNE emphasizes local clusters within classes, whereas UMAP preserves more of the global structure, enabling visualization of broader inter-class relationships.
Figure 4. Multiclass classification performance of final model using XGBoost. Confusion matrix summarizing the performance of the final multimodal model trained with XGBoost across four post-transplant eGFR categories: >90, 75–90, 60–75, and 45–60 mL/min/1.73 m². Precision, recall, and F1-scores are reported for each class. The model achieved the highest precision and recall for the >90 and 60–75 categories, with lower performance for the 45–60 group due to its smaller sample size. Overall macro and weighted F1-scores were 0.675 and 0.698, respectively, indicating strong class-balanced performance. Most misclassifications occurred between adjacent eGFR strata, reflecting clinical overlaps in renal function.
Figure 5. Mean absolute SHAP values for all features, color-coded by class. Donor age (Dage), BMI, and donor sex (Dsex) were consistently influential across all classes. This multi-class SHAP summary reveals both shared and class-specific key predictors.
Figure 6. Class-specific SHAP summary plots for predicting each eGFR category: (A) >90, (B) 75–90, (C) 60–75, and (D) 45–60. Each plot displays the top contributing features for the corresponding class, with SHAP values representing the magnitude and direction of each feature’s influence on the model output. While certain features such as Dage, BMI, and D_cr consistently appear across classes, other features demonstrate class-specific importance. For example, Dsex and AGE were more predictive of Class (A), whereas DM and DBMI were prominent in Class (B), and Induction appeared more influential in Class (D). These findings highlight both shared and distinct patterns in multimodal predictors of kidney function.
