Development and evaluation of machine-learning methods in whole-body magnetic resonance imaging with diffusion weighted imaging for staging of patients with cancer: the MALIBO diagnostic test accuracy study
- PMID: 39413217
- Bookshelf ID: NBK608149
- DOI: 10.3310/KPWQ4208
Excerpt
Background: Whole-body magnetic resonance imaging is accurate, efficient and cost-effective for cancer staging. Machine learning may support radiologists reading whole-body magnetic resonance imaging.
Objectives:
To develop a machine-learning algorithm to detect normal organs and cancer lesions.
To compare diagnostic accuracy, time and agreement of radiology reads to detect metastases using whole-body magnetic resonance imaging with concurrent machine learning (whole-body magnetic resonance imaging + machine learning) against standard whole-body magnetic resonance imaging read without machine learning.
Design and participants: Retrospective analysis of (1) prospective single-centre study in healthy volunteers > 18 years (n = 51) and (2) prospective multicentre STREAMLINE study patient data (n = 438).
Tests: Index: whole-body magnetic resonance imaging + machine learning.
Comparator: standard whole-body magnetic resonance imaging (read without machine learning).
Reference standard: Previously established expert panel consensus reference at 12 months from diagnosis.
Outcome measures: Primary: difference in per-patient specificity between whole-body magnetic resonance imaging + machine learning and standard whole-body magnetic resonance imaging. Secondary: per-patient sensitivity, per-lesion sensitivity and specificity, read time and agreement.
Methods: Phase 1: classification forests, convolutional neural networks and a multi-atlas approach were evaluated for organ segmentation.
Phase 2/3: whole-body magnetic resonance imaging scans were allocated to Phase 2 (training = 226, validation = 45) and Phase 3 (testing = 193). Disease sites were manually labelled.
The final algorithm was applied to 193 Phase 3 cases, generating probability heatmaps. Twenty-five radiologists (18 experienced, 7 inexperienced in whole-body magnetic resonance imaging) were randomly allocated reads with whole-body magnetic resonance imaging + machine learning or standard whole-body magnetic resonance imaging over two or three rounds in a National Health Service setting. Read time was independently recorded.
Results: Phases 1 and 2: convolutional neural network had best Dice similarity coefficient, recall and precision measurements for healthy organ segmentation. Final algorithm used a ‘two-stage’ initial organ identification followed by lesion detection.
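The Dice similarity coefficient, precision and recall named above all derive from the voxel-wise overlap between a predicted mask and a reference mask. The following is a minimal sketch of those three measures, assuming flat binary masks held as Python lists; the function name `overlap_metrics` and the toy masks are illustrative and are not taken from the study.

```python
def overlap_metrics(pred, truth):
    """Return (dice, precision, recall) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Count voxel-wise true positives, false positives and false negatives.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    # Dice = 2*TP / (2*TP + FP + FN); two empty masks are treated as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return dice, precision, recall


# Hypothetical 8-voxel masks (1 = organ, 0 = background), for illustration only.
truth = [0, 1, 1, 1, 0, 0, 1, 0]
pred = [0, 1, 1, 0, 1, 0, 1, 0]

dice, precision, recall = overlap_metrics(pred, truth)
print(f"Dice={dice:.2f} precision={precision:.2f} recall={recall:.2f}")
# → Dice=0.75 precision=0.75 recall=0.75
```

In practice the masks would be three-dimensional label volumes, and the measures would be computed per organ and then summarised across volunteers.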
Phase 3: evaluable scans (188/193; 117 colon and 71 lung cancer cases, of which 50 had metastases) were read between November 2019 and March 2020. For experienced readers, per-patient specificity for detection of metastases was 86.2% (whole-body magnetic resonance imaging + machine learning) and 87.7% (standard whole-body magnetic resonance imaging) (difference −1.5%, 95% confidence interval −6.4% to 3.5%; p = 0.387); per-patient sensitivity was 66.0% (whole-body magnetic resonance imaging + machine learning) and 70.0% (standard whole-body magnetic resonance imaging) (difference −4.0%, 95% confidence interval −13.5% to 5.5%; p = 0.344). For inexperienced readers (53 reads, 15 with metastases), per-patient specificity was 76.3% in both groups, with sensitivities of 73.3% (whole-body magnetic resonance imaging + machine learning) and 60.0% (standard whole-body magnetic resonance imaging). Per-site specificity remained high within all sites: above 95% (experienced) or 90% (inexperienced). Per-site sensitivity was highly variable because of the low number of lesions in each site.
Read time was 6.2% lower with machine learning (95% confidence interval −22.8% to 10.0%). Read time was primarily influenced by read round: round 2 read times were 32% shorter overall (95% confidence interval 20.8% to 42.8%). Subsequent regression analysis showed a significant effect of machine learning in round 2 (p = 0.0281), with reads estimated to be 286 seconds (or 11%) quicker.
Interobserver variance for experienced readers suggests moderate agreement, Cohen’s κ = 0.64, 95% confidence interval 0.47 to 0.81 (whole-body magnetic resonance imaging + machine learning) and Cohen’s κ = 0.66, 95% confidence interval 0.47 to 0.81 (standard whole-body magnetic resonance imaging).
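Per-patient sensitivity and specificity are simple proportions of reads, which a short worked example can make concrete. The sketch below uses the inexperienced-reader figures (53 reads, 15 with metastases). The individual true-positive and true-negative counts are an assumption: they are back-calculated to be consistent with the reported percentages and are not stated in the abstract.

```python
def sensitivity(tp, fn):
    """Proportion of reads with metastases that were called positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of reads without metastases that were called negative."""
    return tn / (tn + fp)


reads_total = 53          # reported
reads_with_mets = 15      # reported
reads_without_mets = reads_total - reads_with_mets  # 38

# ASSUMED counts, chosen only because they reproduce the reported percentages.
spec_both = specificity(tn=29, fp=reads_without_mets - 29)   # 29/38
sens_ml = sensitivity(tp=11, fn=reads_with_mets - 11)        # 11/15, with machine learning
sens_std = sensitivity(tp=9, fn=reads_with_mets - 9)         # 9/15, without machine learning

print(f"specificity {100 * spec_both:.1f}%")           # → specificity 76.3%
print(f"sensitivity with ML {100 * sens_ml:.1f}%")     # → sensitivity with ML 73.3%
print(f"sensitivity without ML {100 * sens_std:.1f}%") # → sensitivity without ML 60.0%
```

With only 15 positive reads, a change of two correct calls moves sensitivity by more than 13 percentage points, which illustrates why the sensitivity estimates in small subgroups are imprecise.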
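Cohen’s κ corrects the observed agreement between two readers for the agreement expected by chance. As a minimal sketch, assuming two readers who each give a binary per-patient call (1 = metastases present), it can be computed as follows; the reader calls are invented for illustration and the function name `cohens_kappa` is not from the study.

```python
def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa for two equal-length lists of categorical calls."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(reader_a)
    # Observed agreement: fraction of cases where both readers made the same call.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: product of each reader's marginal rate, summed over labels.
    labels = set(reader_a) | set(reader_b)
    expected = sum(
        (reader_a.count(label) / n) * (reader_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both readers used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical calls for 10 patients (not study data).
reader_a = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
reader_b = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]

kappa = cohens_kappa(reader_a, reader_b)
print(f"kappa = {kappa:.2f}")  # → kappa = 0.58
```

Here the readers agree on 8 of 10 patients (0.80), chance agreement is 0.52, and κ = (0.80 − 0.52) / (1 − 0.52) ≈ 0.58. The confidence intervals reported in the abstract would require an additional variance estimate that this sketch does not attempt.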
Limitations: Patient whole-body magnetic resonance imaging data were heterogeneous with relatively few metastatic lesions in a wide variety of locations, making training and testing difficult and hampering evaluation of sensitivity.
Conclusions: There was no difference in diagnostic accuracy for whole-body magnetic resonance imaging radiology reads with or without machine-learning support, although radiology read time may be slightly shortened using whole-body magnetic resonance imaging + machine learning.
Future work: Failure-case analysis to improve model training, automation of lesion segmentation, and transfer of machine-learning techniques to other tumour types and imaging modalities.
Study registration: This study is registered as ISRCTN23068310.
Funding: This award was funded by the National Institute for Health and Care Research (NIHR) Efficacy and Mechanism Evaluation (EME) programme (NIHR award ref: 13/122/01) and is published in full in Efficacy and Mechanism Evaluation; Vol. 11, No. 15. See the NIHR Funding and Awards website for further award information.
Plain language summary
Whole-body magnetic resonance imaging images the entire body and can detect the spread of tumour, without the burden of ionising radiation. Recently, the STREAMLINE study reported that whole-body magnetic resonance imaging is accurate, efficient and cost-effective for cancer staging. However, whole-body magnetic resonance imaging is complex to report.
Machine learning is a type of artificial intelligence whereby a computer learns from being given previous data to undertake a task, using techniques such as classification forests, convolutional neural networks, and multi-atlas approaches. Our aim was to develop a machine-learning method to automatically detect lesions on whole-body magnetic resonance imaging to support radiologists by potentially improving their ability to correctly detect disease and reduce the reading time of whole-body magnetic resonance imaging scans in patients with cancer.
Firstly, whole-body magnetic resonance imaging scans from 51 healthy volunteers were used to develop machine-learning methods to automatically detect normal organs.
Secondly, machine-learning methods were trained to detect cancer lesions, using 271 whole-body magnetic resonance imaging scans from a previous study.
Finally, the refined machine-learning technique was tested in 188 patient scans from a previous study, to see if the technique could improve radiology reporting by increasing accuracy and speed in detecting disease. We designed a system to test the accuracy of radiologists reading whole-body magnetic resonance imaging with or without machine-learning support in a near-real clinical National Health Service setting. Twenty-five independent radiologists (18 experienced in reading whole-body magnetic resonance imaging and 7 inexperienced) were randomly allocated whole-body magnetic resonance imaging scans to read with or without machine-learning support. We found that reads with machine-learning support had similar accuracy for detecting disease and slightly shorter reading times than reads without machine-learning support. Differences in interpretation between experienced readers were considered moderate in both cases.
Overall, the study was an ambitious attempt to undertake a highly complex machine-learning task, to detect cancer on whole-body magnetic resonance imaging. Many important steps have been taken but the current machine-learning algorithm did not result in a significant improvement in the radiologist’s accuracy for disease detection, although it may have slightly reduced the time taken to read the study. Future work is advocated to further develop machine-learning tools to improve the accuracy of tumour detection.
Copyright © 2024 Henriksen et al.
Sections
- Scientific summary
- Chapter 1. Introduction
- Chapter 2. Phase 1: healthy volunteer data collection and pre-processing fat-water swap artefact
- Chapter 3. Phase 1: fully automatic, multi-organ segmentation in normal whole-body magnetic resonance imaging, using classification forests, convolutional neural networks and a multi-atlas approach
- Chapter 4. Reverse classification accuracy and domain adaptation
- Chapter 5. Developing machine-learning method for clinical whole-body magnetic resonance imaging study: Phase 2 training and validation methods and model selection
- Chapter 6. Machine-learning clinical validation: Phase 3 methods and results for performance evaluations
- Chapter 7. Discussion
- Chapter 8. Implications for practice and future research
- Additional information
- References
- Appendix 1. Supplementary tables and figures
- Appendix 2. Using ITK-SNAP for checking segmentation
- Appendix 3. Phase 2 segmentation checking methods
- Appendix 4. User manual for using Biotronics platform
- Appendix 5. MALIBO STC
- Appendix 6. MALIBO STL
- Appendix 7. Statistical analysis plan for machine learning in whole-body oncology project (version 1.1; 24 January 2020)
- List of abbreviations
- List of supplementary material