EClinicalMedicine. 2025 Sep 26:89:103498.
doi: 10.1016/j.eclinm.2025.103498. eCollection 2025 Nov.

End-to-end deep learning model for the diagnosis and segmentation of primary retroperitoneal neoplasm: a multicenter cohort study

Xiang Feng et al. EClinicalMedicine.

Abstract

Background: Primary retroperitoneal neoplasms (PRNs) are a diverse group of tumors that pose significant diagnostic challenges. Currently, no multicenter-validated diagnostic model exists for multiple PRN types based on computed tomography (CT) images. This study aimed to develop and validate an end-to-end deep learning model, REMIND (REtroperitoneal neoplasMs artificial-INtelligence Diagnosis), for the accurate diagnosis and segmentation of PRNs using enhanced CT images.

Methods: Patients seen at 12 Chinese centers between January 2012 and June 2024 were included. All had histologically confirmed PRNs of one of seven types: dedifferentiated liposarcoma, well-differentiated liposarcoma, leiomyosarcoma, ganglioneuroma, lymphoma, schwannoma, and paraganglioma. The REMIND model was trained on retrospective data from a single hospital in China (n = 606; five-fold cross-validation), externally validated on retrospectively collected data from 11 other hospitals (n = 736), and prospectively validated on data collected at the training hospital from patients enrolled between January 2024 and June 2024 (n = 188). Additionally, a reader study involving 30 radiologists from 11 hospitals in China was conducted to assess REMIND's clinical utility.
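As an illustration of the five-fold cross-validation mentioned above, the following is a minimal sketch of how 606 cases could be partitioned into five folds. The abstract does not state how the authors built their folds (for example, whether they stratified by tumor type), so the function names, the shuffling, and the seed here are assumptions for illustration only.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # hypothetical seed, for reproducibility
    # Fold i takes every k-th index starting at i, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def train_val_splits(n_samples, k=5, seed=0):
    """Yield (train, val) index lists; each fold is the validation set exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


# 606 is the training-cohort size reported in the abstract.
splits = list(train_val_splits(606, k=5))
fold_sizes = [len(val) for _, val in splits]
```

Every case appears in exactly one validation fold, so each of the 606 cases receives one out-of-fold prediction.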

Findings: REMIND demonstrated high predictive accuracy across different cohorts. For classifying neoplasm types, ROC curves showed AUCs over 0.80 for most types. For the 7-way classification task, REMIND achieved top-1 accuracies of 0.66 (95%CI 0.61-0.69), 0.61 (95%CI 0.46-0.73), and 0.63 (95%CI 0.54-0.69), and top-2 accuracies of 0.82 (95%CI 0.79-0.85), 0.79 (95%CI 0.77-0.83), and 0.77 (95%CI 0.71-0.83) in the training, external validation, and prospective validation cohorts, respectively. For segmentation tasks, REMIND achieved average Dice scores of 0.75 (95%CI 0.73-0.76), 0.72 (95%CI 0.70-0.74), and 0.73 (95%CI 0.70-0.77) in the training, external validation, and prospective validation cohorts, respectively. The reader study indicated that the top-1 classification accuracy of REMIND was higher than that of junior radiologists (64.0% vs. 42.6%, p = 0.006) and attending radiologists (64.0% vs. 57.4%, p = 0.009), and equivalent to that of senior radiologists (64.0% vs. 64.3%, p = 0.905). When assisted by REMIND, the diagnostic accuracy of junior and attending radiologists significantly improved. Meanwhile, REMIND reduced interpretation time and increased diagnostic certainty in all groups of radiologists.
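The top-1 and top-2 accuracies reported above count a case as correct when the true tumor type is among the model's one or two highest-scoring classes. The sketch below shows that calculation on a small toy example; the function name, the three-class scores, and the labels are invented for illustration and are not the study's data or code.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of cases whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy per-class scores for four cases and three classes (illustrative only).
scores = [
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.40, 0.35, 0.25],
    [0.10, 0.20, 0.70],
]
labels = [0, 2, 1, 0]

top1 = top_k_accuracy(scores, labels, k=1)  # 0.25
top2 = top_k_accuracy(scores, labels, k=2)  # 0.75
```

Top-2 accuracy can never be lower than top-1 accuracy, because every top-1 hit is also a top-2 hit.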

Interpretation: REMIND represents a first-in-class model for the diagnosis and segmentation of PRNs. Its integration into clinical practice has the potential to enhance diagnostic accuracy, increase predictive certainty, and reduce interpretation time. This study highlights the clinical applicability of AI in improving the diagnostic accuracy and reducing the workload for radiologists handling these rare and complex tumors.

Funding: This study was supported by the National Natural Science Foundation of China (82272905 and 82473385).

Keywords: Computed tomography; Deep-learning; Diagnosis; Liposarcoma; Primary retroperitoneal neoplasms; Segmentation.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
The design of this study. The flowchart shows the data collection, model establishment, and clinical evaluation of REMIND. REMIND = REtroperitoneal neoplasMs artificial-INtelligence Diagnosis.
Fig. 2
The inclusion and exclusion of patients in the training cohort, external validation cohort, and prospective validation cohort. (A) The training cohort. (B) The external validation cohort. (C) The prospective validation cohort. PRN = Primary retroperitoneal neoplasm. REMIND = REtroperitoneal neoplasMs artificial-INtelligence Diagnosis.
Fig. 3
Classification performance of REMIND. (A–C) The confusion matrices for the training cohort, the external validation cohort, and the prospective validation cohort. (D–F) The ROC curves for 7-way classification within the training cohort, the external validation cohort, and the prospective validation cohort. (G–I) The ROC curves for benign versus malignant binary classification within the training cohort, the external validation cohort, and the prospective validation cohort. (J) Slice attention map of a patient with a primary retroperitoneal neoplasm, shown as an example. REMIND classified the images as DDLPS in the majority of slices and as WDLPS in some slices in the middle. The final output of the model was DDLPS, which was concordant with the pathological findings: DDLPS with some WDLPS content. ROC = receiver operating characteristic curve. PGL = Paraganglioma. GN = Ganglioneuroma. LYM = Lymphoma. LMS = Leiomyosarcoma. SWN = Schwannoma. WDLPS = Well-differentiated Liposarcoma. DDLPS = Dedifferentiated Liposarcoma.
Fig. 4
REMIND's segmentation performance. Dice scores reflecting the model's performance on neoplasm segmentation. (A) Dice scores across different pathological types in various cohorts. (B) Dice scores across different tumor sizes in various cohorts. (C) Examples of radiologists and REMIND segmentation results by different pathological types. PGL = Paraganglioma. GN = Ganglioneuroma. LYM = Lymphoma. LMS = Leiomyosarcoma. SWN = Schwannoma. WDLPS = Well-differentiated Liposarcoma. DDLPS = Dedifferentiated Liposarcoma. The statistical method used in (B) is ANOVA.
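For readers unfamiliar with the Dice score used in this figure, it measures the overlap between a predicted mask P and a reference mask T as 2|P ∩ T| / (|P| + |T|). The sketch below computes it for two small binary masks flattened to lists; the function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the study.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels marked as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * intersection / total


# Toy masks: 1 = tumor voxel, 0 = background (illustrative only).
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 0, 1, 1, 1, 0]

score = dice_score(pred, truth)  # 2 * 2 / (3 + 3)
```

A score of 1 means the two masks coincide exactly, and a score of 0 means they share no voxels.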
Fig. 5
Reader study in 100 primary retroperitoneal neoplasm cases. (A) Design of the reader study. The top-1 (B) and top-2 (C) accuracy of radiologists with different qualifications, with or without REMIND’s assistance, for each pathological type. (D) The top-1 and top-2 accuracy of every radiologist with or without REMIND’s assistance. Junior radiologists: 1–10; attending radiologists: 11–20; senior radiologists: 21–30. REMIND increased the top-2 accuracy of all junior radiologists, 8 out of 10 attending radiologists, and 8 out of 10 senior radiologists. (E) Comparison of the certainty of diagnostic determination by radiologists with or without REMIND’s assistance. (F) The interpretation time to complete the diagnosis of 100 cases by radiologists with or without REMIND’s assistance. Panel (A) was created with BioRender. REMIND = REtroperitoneal neoplasMs artificial-INtelligence Diagnosis. PGL = Paraganglioma. GN = Ganglioneuroma. LYM = Lymphoma. LMS = Leiomyosarcoma. SWN = Schwannoma. WDLPS = Well-differentiated Liposarcoma. DDLPS = Dedifferentiated Liposarcoma. The statistical method used in this figure is the one-tailed Wilcoxon signed-rank test with continuity correction.
Fig. 6
The visual analytics of REMIND and the comparison of diagnoses by the radiologists with and without REMIND. These cases are examples from the reader study. The first column shows the original contrast-enhanced CT images. The second column shows pie charts of the diagnoses made by the 30 radiologists without the aid of REMIND. Different colors in a pie chart represent different tumor types, and each percentage is the number of radiologists who diagnosed that tumor type divided by the total number of radiologists (30). The third column presents REMIND’s top-1 and top-2 predictions together with the corresponding confidence scores. The fourth column shows the composition of the diagnoses made by the 30 radiologists with the aid of REMIND. The correct diagnosis of each case was (A) Paraganglioma (PGL), (B) Dedifferentiated Liposarcoma (DDLPS), (C) Leiomyosarcoma (LMS), (D) Lymphoma (LYM), (E) Schwannoma (SWN), (F) Ganglioneuroma (GN), (G) Well-differentiated Liposarcoma (WDLPS). REMIND = REtroperitoneal neoplasMs artificial-INtelligence Diagnosis.
