Review

Development and evaluation of machine-learning methods in whole-body magnetic resonance imaging with diffusion weighted imaging for staging of patients with cancer: the MALIBO diagnostic test accuracy study

Andrea Rockall¹,², Xingfeng Li¹, Nicholas Johnson³, Ioannis Lavdas¹, Shalini Santhakumaran³,⁴, A Toby Prevost³, Dow-Mu Koh⁵, Shonit Punwani⁶, Vicky Goh⁷, Nishat Bharwani¹,², Amandeep Sandhu², Harbir Sidhu⁶,⁸, Andrew Plumb⁶, James Burn², Aisling Fagan², Alf Oliver⁶, Georg J Wengert¹,⁹, Daniel Rueckert¹⁰, Eric Aboagye¹, Stuart A Taylor⁶,⁸, Ben Glocker¹⁰; The MALIBO Investigators

Southampton (UK): National Institute for Health and Care Research; 2024 Oct.

Efficacy and Mechanism Evaluation.

Affiliations

¹ Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK
² Imaging Department, Imperial College Healthcare NHS Trust, London, UK
³ Nightingale-Saunders Clinical Trials and Epidemiology Unit, King’s College London, London, UK
⁴ King’s Cancer Prevention Group, School of Cancer and Pharmaceutical Sciences, King’s College, London, UK
⁵ Royal Marsden Hospital and The Institute of Cancer Research, Sutton, UK
⁶ Centre for Medical Imaging, University College London, London, UK
⁷ Cancer Imaging, School of Biomedical Engineering and Imaging Sciences, King’s College London and Department of Radiology, Guy’s and St Thomas’ Hospitals NHS Foundation Trust, London, UK
⁸ Department of Radiology, University College London Hospital, London, UK
⁹ Department of Biomedical Imaging and Image-Guided Therapy, Medical University of Vienna, Vienna General Hospital, Vienna, Austria
¹⁰ Faculty of Engineering, Department of Computing, Imperial College London, London, UK

PMID: 39413217
Bookshelf ID: NBK608149
DOI: 10.3310/KPWQ4208

Excerpt

Background: Whole-body magnetic resonance imaging is accurate, efficient and cost-effective for cancer staging. Machine learning may support radiologists reading whole-body magnetic resonance imaging.

Objectives:

To develop a machine-learning algorithm to detect normal organs and cancer lesions.
To compare diagnostic accuracy, read time and agreement of radiology reads for detecting metastases using whole-body magnetic resonance imaging with concurrent machine learning (whole-body magnetic resonance imaging + machine learning) against standard whole-body magnetic resonance imaging reads without machine-learning support (whole-body magnetic resonance imaging + standard read).

Design and participants: Retrospective analysis of (1) a prospective single-centre study of healthy volunteers aged > 18 years (n = 51) and (2) patient data from the prospective multicentre STREAMLINE study (n = 438).

Tests: Index: whole-body magnetic resonance imaging + machine learning.

Comparator: whole-body magnetic resonance imaging + standard read.

Reference standard: Previously established expert panel consensus reference at 12 months from diagnosis.

Outcome measures: Primary: difference in per-patient specificity between whole-body magnetic resonance imaging + machine learning and whole-body magnetic resonance imaging + standard read. Secondary: per-patient sensitivity, per-lesion sensitivity and specificity, read time and interobserver agreement.

Methods: Phase 1: classification forests, convolutional neural networks and a multi-atlas approach were compared for organ segmentation.
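The abstract names a multi-atlas approach without describing it. As a hedged illustration only, the simplest common variant is majority-vote label fusion: several labelled atlases are registered to the target scan, and each voxel takes the label most atlases agree on. The sketch below assumes registration has already been done and that labels are integers `0..n_labels-1`; the function name `majority_vote_fusion` is hypothetical and is not the study's implementation.

```python
import numpy as np


def majority_vote_fusion(atlas_labels: list[np.ndarray], n_labels: int) -> np.ndarray:
    """Fuse organ label maps from several atlases by per-voxel majority vote.

    Assumes every atlas label map has already been registered (warped) into
    the target image space, so all arrays share the same shape.
    """
    stacked = np.stack(atlas_labels, axis=0)  # shape: (n_atlases, *image_shape)
    # Count how many atlases vote for each label at every voxel.
    votes = np.stack([(stacked == k).sum(axis=0) for k in range(n_labels)], axis=0)
    # Choose the label with the most votes; argmax breaks ties towards the lowest label.
    return votes.argmax(axis=0)


# Usage: three toy 1D "atlases" voting over four voxels.
fused = majority_vote_fusion(
    [np.array([0, 1, 1, 2]), np.array([0, 1, 2, 2]), np.array([1, 1, 2, 0])],
    n_labels=3,
)
print(fused.tolist())  # [0, 1, 2, 2]
```

More elaborate schemes weight each atlas by its local similarity to the target, but the voting structure stays the same.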

Phase 2/3: whole-body magnetic resonance imaging scans were allocated to Phase 2 (training = 226, validation = 45) and Phase 3 (testing = 193). Disease sites were manually labelled.

The final algorithm was applied to the 193 Phase 3 cases, generating probability heatmaps. Twenty-five radiologists (18 experienced and 7 inexperienced in whole-body magnetic resonance imaging) were randomly allocated to read scans as whole-body magnetic resonance imaging + machine learning or whole-body magnetic resonance imaging + standard read over two or three rounds in a National Health Service setting. Read time was recorded independently.

Results: Phases 1 and 2: the convolutional neural network had the best Dice similarity coefficient, recall and precision for healthy organ segmentation. The final algorithm used a 'two-stage' approach: initial organ identification followed by lesion detection.
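Dice similarity coefficient, recall and precision all compare a predicted segmentation mask with a reference mask through voxel counts of true positives (TP), false positives (FP) and false negatives (FN): Dice = 2TP / (2TP + FP + FN), recall = TP / (TP + FN), precision = TP / (TP + FP). The sketch below is a minimal, hypothetical helper (`overlap_metrics`) for binary masks. It is not the study's evaluation code and assumes both masks are non-empty, because empty masks make the ratios undefined.

```python
import numpy as np


def overlap_metrics(pred: np.ndarray, ref: np.ndarray) -> dict[str, float]:
    """Dice similarity coefficient, recall and precision for two binary masks."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    tp = np.logical_and(pred, ref).sum()   # voxels labelled organ in both masks
    fp = np.logical_and(pred, ~ref).sum()  # predicted organ, reference background
    fn = np.logical_and(~pred, ref).sum()  # reference organ, predicted background
    return {
        "dice": float(2 * tp / (2 * tp + fp + fn)),
        "recall": float(tp / (tp + fn)),
        "precision": float(tp / (tp + fp)),
    }


# Usage: a toy 1D prediction that overlaps the reference in two of three voxels.
metrics = overlap_metrics(np.array([1, 1, 1, 0, 0]), np.array([0, 1, 1, 1, 0]))
print(metrics)  # each value is 2/3 here
```

Dice is the harmonic mean of recall and precision, so it penalises both over- and under-segmentation. That makes it a natural single summary for ranking the Phase 1 methods.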

Phase 3: 188 of 193 scans were evaluable (117 colon and 71 lung cancer cases, of which 50 had metastases) and were read between November 2019 and March 2020. For experienced readers, per-patient specificity for detection of metastases was 86.2% (whole-body magnetic resonance imaging + machine learning) and 87.7% (whole-body magnetic resonance imaging + standard read) (difference −1.5%, 95% confidence interval −6.4% to 3.5%; p = 0.387); per-patient sensitivity was 66.0% (whole-body magnetic resonance imaging + machine learning) and 70.0% (whole-body magnetic resonance imaging + standard read) (difference −4.0%, 95% confidence interval −13.5% to 5.5%; p = 0.344). For inexperienced readers (53 reads, 15 with metastases), per-patient specificity was 76.3% in both groups, with sensitivities of 73.3% (whole-body magnetic resonance imaging + machine learning) and 60.0% (whole-body magnetic resonance imaging + standard read). Per-site specificity remained high at all sites: above 95% for experienced readers and above 90% for inexperienced readers. Per-site sensitivity was highly variable owing to the low number of lesions at each site.

Read time was 6.2% shorter with machine learning (95% confidence interval for the change −22.8% to 10.0%). Read time was primarily influenced by read round: round 2 read times were 32% shorter overall (95% confidence interval 20.8% to 42.8%), and subsequent regression analysis showed a significant effect of using machine learning in round 2 (p = 0.0281), estimated as 286 seconds (or 11%) quicker.
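The abstract reports read-time effects as percentages but does not specify the regression model. One common way to obtain percentage effects, assumed here purely for illustration, is to regress log read time on indicator variables. Each coefficient b then corresponds to a multiplicative change exp(b), or a (exp(b) − 1) × 100% change in read time. The helper `percent_effects` and its two predictors are hypothetical. The study's model probably also accounted for readers and cases.

```python
import numpy as np


def percent_effects(read_times, uses_ml, is_round2):
    """Fit log(read time) = b0 + b1*ML + b2*round2 by ordinary least squares
    and express b1 and b2 as percentage changes in read time."""
    y = np.log(np.asarray(read_times, dtype=float))
    X = np.column_stack([
        np.ones_like(y),                        # intercept
        np.asarray(uses_ml, dtype=float),       # 1 if read with machine learning
        np.asarray(is_round2, dtype=float),     # 1 if read in round 2
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # exp(b) is the multiplicative effect; (exp(b) - 1) * 100 is the % change.
    return {
        "ml": float((np.exp(beta[1]) - 1) * 100),
        "round2": float((np.exp(beta[2]) - 1) * 100),
    }


# Usage with synthetic, noise-free times (seconds) built from an 11% ML effect
# and a 32% round-2 effect; these numbers are illustrative, not study data.
ml = [0, 1, 0, 1]
r2 = [0, 0, 1, 1]
times = [2600 * 0.89**m * 0.68**r for m, r in zip(ml, r2)]
print(percent_effects(times, ml, r2))  # recovers about -11 and -32
```

Working on the log scale makes effects multiplicative. This suits read times, which are positive and right-skewed, and it explains why the same effect can be quoted both in seconds and as a percentage.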

Interobserver agreement for experienced readers was moderate: Cohen's κ = 0.64 (95% confidence interval 0.47 to 0.81) for whole-body magnetic resonance imaging + machine learning and Cohen's κ = 0.66 (95% confidence interval 0.47 to 0.81) for whole-body magnetic resonance imaging + standard read.
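Cohen's κ corrects the observed proportion of agreement p_o between two readers for the agreement p_e expected by chance from each reader's marginal rates: κ = (p_o − p_e) / (1 − p_e). The hypothetical helper below is a minimal sketch of that formula for two readers rating the same cases. It does not compute the confidence intervals quoted above, and it is undefined when p_e = 1 (both readers always give the same single category).

```python
from collections import Counter


def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers' categorical calls on the same cases."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must rate the same non-empty set of cases")
    n = len(reader_a)
    # Observed agreement: fraction of cases where both readers gave the same call.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: product of the readers' marginal rates, summed over categories.
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / n**2
    return (observed - expected) / (1 - expected)


# Usage: 1 = metastases reported, 0 = none; the readers agree on 3 of 4 cases.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Here 75% raw agreement shrinks to κ = 0.5 because half of that agreement would be expected by chance alone.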

Limitations: Patient whole-body magnetic resonance imaging data were heterogeneous with relatively few metastatic lesions in a wide variety of locations, making training and testing difficult and hampering evaluation of sensitivity.

Conclusions: There was no difference in diagnostic accuracy for whole-body magnetic resonance imaging radiology reads with or without machine-learning support, although radiology read time may be slightly shortened using whole-body magnetic resonance imaging + machine learning.

Future work: Failure-case analysis to improve model training, automation of lesion segmentation, and transfer of machine-learning techniques to other tumour types and imaging modalities.

Study registration: This study is registered as ISRCTN23068310.

Funding: This award was funded by the National Institute for Health and Care Research (NIHR) Efficacy and Mechanism Evaluation (EME) programme (NIHR award ref: 13/122/01) and is published in full in Efficacy and Mechanism Evaluation; Vol. 11, No. 15. See the NIHR Funding and Awards website for further award information.

Plain language summary

Whole-body magnetic resonance imaging scans the entire body and can detect the spread of tumour without exposing patients to ionising radiation. Recently, the STREAMLINE study reported that whole-body magnetic resonance imaging is accurate, efficient and cost-effective for cancer staging. However, whole-body magnetic resonance imaging is complex to report.

Machine learning is a type of artificial intelligence in which a computer learns to perform a task from previous data, using techniques such as classification forests, convolutional neural networks and multi-atlas approaches. Our aim was to develop a machine-learning method to automatically detect lesions on whole-body magnetic resonance imaging. Such a method could support radiologists by improving their ability to detect disease correctly and by reducing the reading time of whole-body magnetic resonance imaging scans in patients with cancer.

Firstly, whole-body magnetic resonance imaging scans from 51 healthy volunteers were used to develop machine-learning methods to automatically detect normal organs.

Secondly, machine-learning methods were trained to detect cancer lesions, using 271 whole-body magnetic resonance imaging scans from a previous study.

Finally, the refined machine-learning technique was tested on 188 patient scans from a previous study to see whether it could improve radiology reporting by increasing accuracy and speed in detecting disease. We designed a system to test the accuracy of radiologists reading whole-body magnetic resonance imaging with or without machine-learning support in a near-real clinical National Health Service setting. Twenty-five independent radiologists (18 experienced and 7 inexperienced in reading whole-body magnetic resonance imaging) were randomly allocated whole-body magnetic resonance imaging scans to read with or without machine-learning support. We found that machine-learning support resulted in similar accuracy for detecting disease and slightly shorter reading times than reads without machine-learning support. Agreement between experienced readers was moderate in both cases.

Overall, the study was an ambitious attempt to undertake a highly complex machine-learning task: detecting cancer on whole-body magnetic resonance imaging. Many important steps have been taken, but the current machine-learning algorithm did not significantly improve radiologists' accuracy for disease detection, although it may have slightly reduced the time taken to read the scans. Future work is advocated to further develop machine-learning tools to improve the accuracy of tumour detection.
