NPJ Precis Oncol. 2024 Sep 5;8(1):188.
doi: 10.1038/s41698-024-00695-7.

A multimodal neural network with gradient blending improves predictions of survival and metastasis in sarcoma

Anthony Bozzo et al. NPJ Precis Oncol.

Abstract

The objective of this study was to develop a multimodal neural network (MMNN) model that analyzes the clinical variables and MRI images of a patient with soft tissue sarcoma (STS) to predict overall survival and the risk of distant metastases. We compare the performance of this MMNN to models based on clinical variables alone, radiomics models, and a unimodal neural network. We included patients aged 18 or older with biopsy-proven STS who underwent primary resection between January 1, 2005, and December 31, 2020, and who had complete outcome data and a pre-treatment MRI with both a T1 post-contrast sequence and a T2 fat-sat sequence available. A total of 9380 MRI slices containing sarcoma from 287 patients were available. Our MMNN accepts the entire 3D sarcoma volume from the T1 and T2 MRIs together with the clinical variables. Gradient blending reweights the losses of the clinical and image sub-networks during training so that both sub-networks converge without overfitting. Heat maps were generated to visualize the salient image features. Our MMNN outperformed all other models in predicting overall survival and the risk of distant metastases, with a C-index of 0.77 for overall survival and 0.70 for the risk of distant metastases. The heat maps highlight the regions of each sarcoma that the model deemed most salient for its predictions. Our multimodal neural network with gradient blending improves predictions of overall survival and the risk of distant metastases in patients with soft tissue sarcoma. Future work enabling accurate subtype-specific predictions will likely use a similar end-to-end multimodal neural network architecture and will require prospective curation of high-quality data, the inclusion of genomic data, and the involvement of multiple centers through federated learning.
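To make the reported C-index values (0.77 and 0.70) concrete, the sketch below computes Harrell's concordance index for right-censored outcomes. This is an illustrative assumption: the abstract does not state which C-index variant was used, and the function name `harrell_c_index` is hypothetical. A value of 0.5 corresponds to random ranking of patients by risk and 1.0 to perfect ranking.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[bool],
                    risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    (e.g. death or metastasis) strictly before patient j's follow-up time.
    The pair is concordant when the model assigns i the higher risk;
    tied risk scores count as half a concordant pair.
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("times, events and risks must have equal length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy cohort: follow-up in years, event flags, predicted risks.
print(harrell_c_index([2, 5, 3, 8], [True, False, True, False],
                      [0.9, 0.2, 0.6, 0.1]))
```

Pairs are anchored only on observed events because a censored patient's true event time is unknown, so its ordering relative to later patients cannot be checked.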

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Gradient Blending.
This diagram shows how the relative weights of the modality losses that contribute to the overall loss change over the course of training. The weights are initially uniform and are adjusted every 5 epochs according to each modality's overfitting-to-generalization ratio.
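The caption states only that the weights start uniform and are updated every 5 epochs from overfitting-to-generalization ratios. A minimal sketch, assuming the weighting rule of the original gradient-blending formulation by Wang, Tran and Feiszli (2020), where each head's weight is proportional to its generalization gain divided by the square of its overfitting growth, could look like this. The head names, the `HeadLoss` type and the zero-weight rule for non-improving heads are hypothetical choices for illustration, not the authors' code.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class HeadLoss:
    train: float  # training loss of one output head (e.g. clinical, image, fused)
    val: float    # validation loss of the same head


def blending_weights(before: Dict[str, HeadLoss],
                     after: Dict[str, HeadLoss],
                     eps: float = 1e-8) -> Dict[str, float]:
    """Per-head loss weights from losses measured n epochs apart.

    delta_g: generalization gain, i.e. the drop in validation loss.
    delta_o: overfitting growth, i.e. the increase in the val/train gap.
    Weight is proportional to delta_g / delta_o**2, normalized to sum to 1.
    """
    raw = {}
    for name, b in before.items():
        a = after[name]
        delta_g = b.val - a.val
        delta_o = (a.val - a.train) - (b.val - b.train)
        # Heads whose validation loss did not improve get zero weight
        # (a design choice of this sketch).
        raw[name] = delta_g / max(delta_o ** 2, eps) if delta_g > 0 else 0.0
    total = sum(raw.values())
    if total == 0.0:
        # No head generalized: fall back to uniform weights.
        return {name: 1.0 / len(raw) for name in raw}
    return {name: w / total for name, w in raw.items()}


# Hypothetical losses for two heads measured 5 epochs apart.
before = {"clinical": HeadLoss(0.70, 0.72), "image": HeadLoss(0.70, 0.71)}
after = {"clinical": HeadLoss(0.50, 0.60), "image": HeadLoss(0.30, 0.65)}
print(blending_weights(before, after))
```

In this toy example the image head's training loss falls much faster than its validation loss, so it receives a far smaller weight than the clinical head. The blended training loss would then be the weighted sum of the per-head losses.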
Fig. 2
Smoothed ROC curves and calibration plots for our MMNN in predicting overall survival and the risk of distant metastases.
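A calibration plot compares predicted risk with the observed event rate. The sketch below computes the points of such a plot, assuming equal-width probability bins and a binary outcome at a fixed time horizon. The caption does not describe the binning or smoothing actually used, and `calibration_bins` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def calibration_bins(probs: Sequence[float],
                     outcomes: Sequence[int],
                     n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Return (mean predicted risk, observed event rate, count) per non-empty bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins on [0, 1]; p == 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    points = []
    for members in bins:
        if not members:
            continue  # skip empty bins instead of reporting undefined rates
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        points.append((mean_pred, observed, len(members)))
    return points


print(calibration_bins([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2))
```

A well-calibrated model produces points close to the diagonal, where mean predicted risk equals the observed event rate.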
Fig. 3. Heat maps generated with the Grad-CAM method on our test set.
Representative axial T2 fat-sat slices from four test-set patients (patients never seen during model training) are shown. For each patient, the corresponding heat map was generated from the image subnetwork of the MMNN, and the merged image is provided. In all cases, the model deemed pixels within the tumor volume most relevant. a Patient predicted to have a very low risk of death and metastases, who survived without developing metastases over a 10-year follow-up period (low predicted risk; model correct). b Patient predicted to have a high risk of death and metastases, who died shortly after developing metastases 1.2 years after surgery (high predicted risk; model correct). c Patient predicted to have a high risk of metastases, who did not develop metastases during 3.8 years of follow-up (high predicted risk; model incorrect). d Among the test-set patients who developed metastases, this patient had the lowest predicted risk. The model was correct in all other predictions of a lower risk of distant metastases (low-intermediate predicted risk; model incorrect, since the patient developed metastases two years after surgery).
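Grad-CAM weights each feature map of a convolutional layer by the spatial average of the gradient of the target output with respect to that map, sums the weighted maps, and keeps only positive evidence with a ReLU. The sketch below shows only this combination step on flattened voxel lists. It assumes the activations and gradients have already been obtained by backpropagation in a deep-learning framework; the choice of layer, the upsampling to MRI resolution, and the final [0, 1] normalization are assumptions not specified in the caption.

```python
from typing import List, Sequence


def grad_cam(activations: Sequence[Sequence[float]],
             gradients: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-channel activations and gradients into a Grad-CAM map.

    activations[k][v]: value of feature map k at (flattened) voxel v.
    gradients[k][v]:   d(target output) / d(activations[k][v]).
    """
    if len(activations) != len(gradients):
        raise ValueError("activations and gradients need the same channels")
    n_vox = len(activations[0])
    # alpha_k: global-average-pooled gradient, i.e. importance of channel k.
    alphas = [sum(g) / len(g) for g in gradients]
    cam = [0.0] * n_vox
    for alpha, act in zip(alphas, activations):
        for v, a in enumerate(act):
            cam[v] += alpha * a
    cam = [max(0.0, c) for c in cam]  # ReLU: keep voxels that raise the output
    peak = max(cam)
    if peak > 0:
        cam = [c / peak for c in cam]  # scale to [0, 1] for display
    return cam


# Two hypothetical channels over three voxels.
print(grad_cam([[1, 0, 2], [0, 1, 0]], [[1, 1, 1], [-1, -1, -1]]))
```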
Fig. 4. Architecture of our multimodal neural network model.
A deep neural network (A) interprets the 11 clinical variables, and a 2-channel convolutional neural network (DenseNet-121) analyzes the MRI input (B). The convolutional neural network extracts image features from the T1 and T2 MRI sequences, and these are concatenated with the features extracted from the clinical variables. The combined feature set is then analyzed to predict the risk of distant metastases and overall survival. Gradient blending is used to moderate the weight updates between modalities. Dashed lines indicate connections that are present only during training to facilitate gradient blending.
1A: Clinical Subnetwork Model. A deep neural network extracts features from a vector of the patient's clinical variables. The numbers under the linear layers give the number of output features of each layer. The clinical model extracts 12 features for use in the multimodal prediction.
1B: Image Subnetwork Model. The T1 post-contrast and T2 fat-sat MRI sequences are concatenated along the channel dimension before being fed through a 2-channel DenseNet-121 model, which extracts twelve features for use in the multimodal prediction. The number in each dense block is the number of dense layers within that block. The architecture shown is a 3-dimensional, 2-channel DenseNet-121 with 12 output neurons. Because the model serves as a feature extractor rather than a classifier, the size of its output layer is a tunable parameter and is not limited to the number of predictions made by the multimodal output head.
1C: Dense Block. A dense block consists of a series of dense layers. Within each dense block, the resolution of the feature map is constant, which allows every dense layer in the block to have feed-forward bypass connections to every other dense layer in that block. These features are concatenated at the input of each dense layer. Transition layers are placed between dense blocks. They use 1x1x1 convolutions as channel pooling layers, reducing the number of feature maps by a factor of 2, and stride-2 average pooling layers, which reduce the resolution in all spatial dimensions by a factor of 2.
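The caption's description of dense blocks and transition layers can be checked with simple shape arithmetic. The sketch below traces channel counts and spatial size through a 3D DenseNet-121, assuming the standard DenseNet-121 configuration (dense blocks of 6, 12, 24 and 16 layers, growth rate 32, 64 stem features, a 7x7x7 stride-2 stem convolution followed by a 3x3x3 stride-2 max pool) and a hypothetical 128-voxel cubic input. These hyperparameters are assumptions; the caption confirms only the DenseNet-121 backbone and the factor-of-2 reductions in the transition layers.

```python
from typing import List, Tuple

BLOCK_CONFIG = (6, 12, 24, 16)  # dense layers per block in DenseNet-121


def densenet_feature_shapes(input_size: int,
                            block_config=BLOCK_CONFIG,
                            growth_rate: int = 32,
                            init_features: int = 64) -> List[Tuple[str, int, int]]:
    """Return (stage, channels, spatial side length) after each stage."""
    size = (input_size + 2 * 3 - 7) // 2 + 1  # stem conv: kernel 7, stride 2, pad 3
    size = (size + 2 * 1 - 3) // 2 + 1        # stem max pool: kernel 3, stride 2, pad 1
    channels = init_features
    stages = [("stem", channels, size)]
    for i, n_layers in enumerate(block_config, start=1):
        # Each dense layer concatenates growth_rate new feature maps;
        # resolution is constant inside the block.
        channels += n_layers * growth_rate
        stages.append((f"dense block {i}", channels, size))
        if i < len(block_config):
            channels //= 2  # 1x1x1 convolution halves the feature maps
            size //= 2      # stride-2 average pooling halves each spatial dimension
            stages.append((f"transition {i}", channels, size))
    return stages


def densenet_depth(block_config=BLOCK_CONFIG) -> int:
    # Stem conv + two convs per dense layer (1x1x1 bottleneck and 3x3x3)
    # + one conv per transition layer + the final linear layer.
    return 1 + 2 * sum(block_config) + (len(block_config) - 1) + 1


for stage in densenet_feature_shapes(128):
    print(stage)
print(densenet_depth())
```

Under these assumptions the final dense block produces 1024 feature maps, which a linear layer would project to the 12 output neurons used as image features for the multimodal head.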