Clin Orthop Relat Res. 2023 Nov 1;481(11):2247-2256.
doi: 10.1097/CORR.0000000000002771. Epub 2023 Aug 23.

Development and Validation of a Convolutional Neural Network Model to Predict a Pathologic Fracture in the Proximal Femur Using Abdomen and Pelvis CT Images of Patients With Advanced Cancer

Min Wook Joo et al. Clin Orthop Relat Res.

Abstract

Background: Improvement in survival in patients with advanced cancer is accompanied by an increased probability of bone metastasis and related pathologic fractures (especially in the proximal femur). The few systems proposed and used to diagnose impending fractures owing to metastasis and to ultimately prevent future fractures have practical limitations; thus, novel screening tools are essential. A CT scan of the abdomen and pelvis is a standard modality for staging and follow-up in patients with cancer, and radiologic assessments of the proximal femur are possible with CT-based digitally reconstructed radiographs. Deep-learning models, such as convolutional neural networks (CNNs), may be able to predict pathologic fractures from digitally reconstructed radiographs, but to our knowledge, they have not been tested for this application.

Questions/purposes: (1) How accurate is a CNN model for predicting a pathologic fracture in a proximal femur with metastasis using digitally reconstructed radiographs of the abdomen and pelvis CT images in patients with advanced cancer? (2) Do CNN models perform better than clinicians with varying backgrounds and experience levels in predicting a pathologic fracture on abdomen and pelvis CT images without any knowledge of the patients' histories, except for metastasis in the proximal femur?

Methods: A total of 392 patients received radiation treatment of the proximal femur at three hospitals from January 2011 to December 2021. The patients had 2945 CT scans of the abdomen and pelvis for systemic evaluation and follow-up in relation to their primary cancer. In 33% of the CT scans (974), it was impossible to determine whether a pathologic fracture developed within 3 months after the CT image was acquired, and these scans were excluded. Ultimately, 1971 cases with a mean age of 59 ± 12 years were included in this study. Pathologic fractures developed within 3 months after CT in 3% (60 of 1971) of cases. A total of 47% (936 of 1971) were women. Sixty cases had an established pathologic fracture within 3 months after each CT scan, and another group of 1911 cases had no established pathologic fracture within 3 months after CT scan. The mean age of the cases in the former and latter groups was 64 ± 11 years and 59 ± 12 years, respectively, and 32% (19 of 60) and 53% (1016 of 1911) of cases, respectively, were female. Digitally reconstructed radiographs were generated with perspective projections of three-dimensional CT volumes onto two-dimensional planes. Then, 1557 images from one hospital were used as the training set. To verify that the deep-learning models could operate consistently even in hospitals with a different medical environment, 414 images from the other hospitals were used for external validation. Using data augmentation methods, which are known to be an effective way to boost the performance of deep-learning models, the number of images in the groups without and with a pathologic fracture within 3 months after each CT scan was increased from 1911 to 22,932 and from 60 to 720, respectively. Three CNNs (VGG16, ResNet50, and DenseNet121) were fine-tuned using digitally reconstructed radiographs. For performance measures, the area under the receiver operating characteristic curve, accuracy, sensitivity, specificity, precision, and F1 score were determined.
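As an illustration of how a digitally reconstructed radiograph can be derived from a CT volume, the following is a minimal sketch under stated assumptions, not the authors' implementation. It uses a parallel projection rather than the perspective projection described above, and the function name `parallel_drr`, the constant `MU_WATER_PER_MM`, and the synthetic volume are all hypothetical.

```python
import numpy as np

# Assumed approximate linear attenuation of water at diagnostic X-ray energies.
MU_WATER_PER_MM = 0.02


def parallel_drr(ct_hu, voxel_mm=1.0, axis=1):
    """Project a CT volume (in Hounsfield units) onto a 2D plane along one axis.

    Simplification: rays are parallel, whereas the study used perspective
    projections from a virtual radiographic source.
    """
    ct_hu = np.asarray(ct_hu, dtype=float)
    # HU = 1000 * (mu - mu_water) / mu_water; invert it and clip so air is 0.
    mu = np.clip(MU_WATER_PER_MM * (1.0 + ct_hu / 1000.0), 0.0, None)
    # Line integral of attenuation along each parallel ray.
    path = mu.sum(axis=axis) * voxel_mm
    # Beer-Lambert law: fraction of the X-ray intensity that is absorbed.
    absorbed = 1.0 - np.exp(-path)
    # Rescale to [0, 1] so dense bone appears bright, as on a radiograph.
    span = absorbed.max() - absorbed.min()
    if span == 0:
        return np.zeros_like(absorbed)
    return (absorbed - absorbed.min()) / span


# Synthetic example: an 8 x 8 x 8 block of air with a small "bone" cube inside.
volume = np.full((8, 8, 8), -1000.0)
volume[2:6, 2:6, 2:6] = 1000.0
drr = parallel_drr(volume)
```

In this toy volume, rays that cross the dense cube produce the brightest pixels and rays that pass only through air produce the darkest ones.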
The area under the receiver operating characteristic curve was the primary measure used to evaluate the three CNN models, and the optimal accuracy, sensitivity, and specificity were calculated using the Youden J statistic. Accuracy refers to the proportion of all cases, with or without a pathologic fracture within 3 months after each CT scan, whose outcome was correctly predicted by the CNN model. Sensitivity and specificity represent the proportion of correctly predicted cases among those with and without a pathologic fracture within 3 months after each CT scan, respectively. Precision is the proportion of predicted fractures that were actual fractures, so it reflects how few false-positive predictions the model produces. The F1 score is the harmonic mean of sensitivity and precision, which have a tradeoff relationship. Gradient-weighted class activation mapping images were created to check whether the CNN model correctly focused on potential pathologic fracture regions. The CNN model with the best performance was compared with the performance of clinicians.
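The performance measures defined above can be sketched in a few lines of Python. This is a minimal illustration on made-up labels and scores, not the study's evaluation code; the function names (`auc_score`, `metrics_at`, `youden_threshold`) and the toy data are hypothetical, and both classes are assumed to be present.

```python
import numpy as np


def auc_score(y_true, y_score):
    """AUC as the probability that a fracture case outscores a non-fracture case."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    diff = y_score[y_true][:, None] - y_score[~y_true][None, :]
    # Ties count as half a correctly ordered pair.
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def metrics_at(y_true, y_score, threshold):
    """Confusion-matrix metrics when scores >= threshold are called fractures."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_score, dtype=float) >= threshold
    tp = int((y_pred & y_true).sum())
    fp = int((y_pred & ~y_true).sum())
    tn = int((~y_pred & ~y_true).sum())
    fn = int((~y_pred & y_true).sum())
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom > 0 else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def youden_threshold(y_true, y_score):
    """Return the threshold maximizing J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in np.unique(y_score):
        m = metrics_at(y_true, y_score, t)
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_t, best_j = float(t), j
    return best_t, best_j


# Toy data: 1 = fracture within 3 months, 0 = no fracture (values are made up).
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.9]

auc = auc_score(labels, scores)
threshold, j_stat = youden_threshold(labels, scores)
result = metrics_at(labels, scores, threshold)
```

Choosing the cutoff by the Youden J statistic weights sensitivity and specificity equally, which is one common convention; other cutoffs trade one measure for the other.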

Results: DenseNet121 showed the best performance in identifying pathologic fractures; the area under the receiver operating characteristic curve for DenseNet121 was larger than those for VGG16 (0.77 ± 0.07 [95% CI 0.75 to 0.79] versus 0.71 ± 0.08 [95% CI 0.69 to 0.73]; p = 0.001) and ResNet50 (0.77 ± 0.07 [95% CI 0.75 to 0.79] versus 0.72 ± 0.09 [95% CI 0.69 to 0.74]; p = 0.001). Specifically, DenseNet121 scored the highest in sensitivity (0.22 ± 0.07 [95% CI 0.20 to 0.24]), precision (0.72 ± 0.19 [95% CI 0.67 to 0.77]), and F1 score (0.34 ± 0.10 [95% CI 0.31 to 0.37]), and it focused accurately on the region with the expected pathologic fracture. Further, DenseNet121 was less likely than clinicians to mispredict cases in which there was no pathologic fracture; the performance of DenseNet121 was better than clinician performance in terms of specificity (0.98 ± 0.01 [95% CI 0.98 to 0.99] versus 0.86 ± 0.09 [95% CI 0.81 to 0.91]; p = 0.01), precision (0.72 ± 0.19 [95% CI 0.67 to 0.77] versus 0.11 ± 0.10 [95% CI 0.05 to 0.17]; p = 0.0001), and F1 score (0.34 ± 0.10 [95% CI 0.31 to 0.37] versus 0.17 ± 0.15 [95% CI 0.08 to 0.26]; p = 0.0001).
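As a rough consistency check, the reported mean precision and sensitivity of DenseNet121 can be substituted into the harmonic-mean definition of the F1 score. The reported values are means with standard deviations, so the F1 of the means need not equal the mean F1 exactly; the calculation below is only an approximate illustration.

```latex
F_1 = \frac{2 \cdot \text{precision} \cdot \text{sensitivity}}{\text{precision} + \text{sensitivity}}
    = \frac{2 \times 0.72 \times 0.22}{0.72 + 0.22}
    = \frac{0.3168}{0.94}
    \approx 0.34
```

The result agrees with the reported F1 score of 0.34 and shows how the low sensitivity pulls the harmonic mean well below the precision of 0.72.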

Conclusion: Using digitally reconstructed radiographs of abdomen and pelvis CT images, CNN models may be able to accurately predict impending pathologic fractures that clinicians may not anticipate; this can assist medical, radiation, and orthopaedic oncologists clinically. To achieve better performance, ensemble-learning models using knowledge of the patients' histories should be developed and validated. The code for our model is publicly available online at https://github.com/taehoonko/CNN_path_fx_prediction.

Level of evidence: Level III, diagnostic study.

Conflict of interest statement

The institution of one or more of the authors (MWJ) has received, during the study period, funding from the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2021R1F1A1047841). Each author certifies that there are no funding or commercial associations (consultancies, stock ownership, equity interest, patent/licensing arrangements, etc.) that might pose a conflict of interest in connection with the submitted article related to the author or any immediate family members. All ICMJE Conflict of Interest Forms for authors and Clinical Orthopaedics and Related Research® editors and board members are on file with the publication and can be viewed on request.

Figures

Fig. 1
This STROBE diagram demonstrates the study exclusion and inclusion criteria.
Fig. 2
Digitally reconstructed radiographs are generated from abdomen and pelvis CT images, cropped to contain parts for each femur, and then augmented: (A) shows the virtual radiographic source, (B) shows CT images, (C) shows a digitally reconstructed radiograph, (D) shows two cropped digitally reconstructed radiographs showing the individual femurs, and (E) shows image augmentation guided by six randomly generated areas of interest.
Fig. 3
The framework of DenseNet121, one of the convolutional neural networks used in this study, is presented. A color image accompanies the online version of this article.
Fig. 4
Receiver operating characteristic curves of the three convolutional neural network models were used to evaluate the performance of (A) VGG16, (B) ResNet50, and (C) DenseNet121.
Fig. 5
Heatmaps from gradient-weighted class activation mapping demonstrate the corresponding parts of the cropped digitally reconstructed radiographs that determined the prediction performances of DenseNet121 for three positive examples: (A) an 83-year-old man with non–small cell lung cancer, (B) a 54-year-old man with renal cell carcinoma, and (C) a 49-year-old man with rectal cancer.
