Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Review

Development and evaluation of machine-learning methods in whole-body magnetic resonance imaging with diffusion weighted imaging for staging of patients with cancer: the MALIBO diagnostic test accuracy study

Southampton (UK): National Institute for Health and Care Research; 2024 Oct.
Affiliations
Free Books & Documents
Review

Development and evaluation of machine-learning methods in whole-body magnetic resonance imaging with diffusion weighted imaging for staging of patients with cancer: the MALIBO diagnostic test accuracy study

Andrea Rockall et al.
Free Books & Documents

Excerpt

Background: Whole-body magnetic resonance imaging is accurate, efficient and cost-effective for cancer staging. Machine learning may support radiologists reading whole-body magnetic resonance imaging.

Objectives:

  1. To develop a machine-learning algorithm to detect normal organs and cancer lesions.

  2. To compare diagnostic accuracy, time and agreement of radiology reads to detect metastases using whole-body magnetic resonance imaging with concurrent machine learning (whole-body magnetic resonance imaging + machine learning) against standard whole-body magnetic resonance imaging (whole-body magnetic resonance imaging + standard deviation).

Design and participants: Retrospective analysis of (1) prospective single-centre study in healthy volunteers > 18 years (n = 51) and (2) prospective multicentre STREAMLINE study patient data (n = 438).

Tests: Index: whole-body magnetic resonance imaging + machine learning.

Comparator: whole-body magnetic resonance imaging + standard deviation.

Reference standard: Previously established expert panel consensus reference at 12 months from diagnosis.

Outcome measures: Primary: difference in per-patient specificity between whole-body magnetic resonance imaging + machine learning and whole-body magnetic resonance imaging + standard deviation. Secondary: per-patient sensitivity, per-lesion sensitivity and specificity, read time and agreement.

Methods: Phase 1: classification forests, convolutional neural networks, and a multi-atlas approaches for organ segmentation.

Phase 2/3: whole-body magnetic resonance imaging scans were allocated to Phase 2 (training = 226, validation = 45) and Phase 3 (testing = 193). Disease sites were manually labelled.

The final algorithm was applied to 193 Phase 3 cases, generating probability heatmaps. Twenty-five radiologists (18 experienced, 7 inexperienced in whole-body magnetic resonance imaging) were randomly allocated whole-body magnetic resonance imaging + machine learning or whole-body magnetic resonance imaging + standard deviation over two or three rounds in a National Health Service setting. Read time was independently recorded.

Results: Phases 1 and 2: convolutional neural network had best Dice similarity coefficient, recall and precision measurements for healthy organ segmentation. Final algorithm used a ‘two-stage’ initial organ identification followed by lesion detection.

Phase 3: evaluable scans (188/193, of which 50 had metastases from 117 colon, 71 lung cancer cases) were read between November 2019 and March 2020. For experienced readers, per-patient specificity for detection of metastases was 86.2% (whole-body magnetic resonance imaging + machine learning) and 87.7% (whole-body magnetic resonance imaging + standard deviation), (difference −1.5%, 95% confidence interval −6.4% to 3.5%; p = 0.387); per-patient sensitivity was 66.0% (whole-body magnetic resonance imaging + machine learning) and 70.0% (whole-body magnetic resonance imaging + standard deviation) (difference −4.0%, 95% confidence interval −13.5% to 5.5%; p = 0.344). For inexperienced readers (53 reads, 15 with metastases), per-patient specificity was 76.3% in both groups with sensitivities of 73.3% (whole-body magnetic resonance imaging + machine learning) and 60.0% (whole-body magnetic resonance imaging + standard deviation). Per-site specificity remained high within all sites; above 95% (experienced) or 90% (inexperienced). Per-site sensitivity was highly variable due to low number of lesions in each site.

Reading time lowered under machine learning by 6.2% (95% confidence interval −22.8% to 10.0%). Read time was primarily influenced by read round with round 2 read times reduced by 32% (95% confidence interval 20.8% to 42.8%) overall with subsequent regression analysis showing a significant effect (p = 0.0281) by using machine learning in round 2 estimated as 286 seconds (or 11%) quicker.

Interobserver variance for experienced readers suggests moderate agreement, Cohen’s κ = 0.64, 95% confidence interval 0.47 to 0.81 (whole-body magnetic resonance imaging + machine learning) and Cohen’s κ = 0.66, 95% confidence interval 0.47 to 0.81 (whole-body magnetic resonance imaging + standard deviation).

Limitations: Patient whole-body magnetic resonance imaging data were heterogeneous with relatively few metastatic lesions in a wide variety of locations, making training and testing difficult and hampering evaluation of sensitivity.

Conclusions: There was no difference in diagnostic accuracy for whole-body magnetic resonance imaging radiology reads with or without machine-learning support, although radiology read time may be slightly shortened using whole-body magnetic resonance imaging + machine learning.

Future work: Failure-case analysis to improve model training, automate lesion segmentation and transfer of machine-learning techniques to other tumour types and imaging modalities.

Study registration: This study is registered as ISRCTN23068310.

Funding: This award was funded by the National Institute for Health and Care Research (NIHR) Efficacy and Mechanism Evaluation (EME) programme (NIHR award ref: 13/122/01) and is published in full in Efficacy and Mechanism Evaluation; Vol. 11, No. 15. See the NIHR Funding and Awards website for further award information.

Plain language summary

Whole-body magnetic resonance imaging demonstrates the entire body and can detect the spread of tumour, without the burden of ionising radiation. Recently, the STREAMLINE study reported that whole-body magnetic resonance imaging is accurate, efficient and cost-effective for cancer staging. However, whole-body magnetic resonance imaging is complex to report.

Machine learning is a type of artificial intelligence whereby a computer learns from being given previous data to undertake a task, using techniques such as classification forests, convolutional neural networks, and multi-atlas approaches. Our aim was to develop a machine-learning method to automatically detect lesions on whole-body magnetic resonance imaging to support radiologists by potentially improving their ability to correctly detect disease and reduce the reading time of whole-body magnetic resonance imaging scans in patients with cancer.

Firstly, whole-body magnetic resonance imaging scans from 51 healthy volunteers were used to develop machine-learning methods to automatically detect normal organs.

Secondly, machine-learning methods were trained to detect cancer lesions, using 271 whole-body magnetic resonance imaging scans from a previous study.

Finally, the refined machine-learning technique was tested in 188 patient scans from a previous study, to see if the technique could improve radiology reporting by increasing accuracy and speed in detecting disease. We designed a system to test the accuracy of radiologists reading whole-body magnetic resonance imaging with or without machine-learning support in a near-real clinical National Health Service setting. Twenty-five independent radiologists (18 experienced in reading whole-body magnetic resonance imaging and 7 radiologists inexperienced in whole-body magnetic resonance imaging) were randomly allocated whole-body magnetic resonance imaging scans to read with or without machine-learning support. We found that machine-learning support resulted in similar accuracy for detecting disease and was slightly more efficient in the reading time than for radiological reads without machine-learning support. Differences in interpretation between experienced readers were considered moderate in both cases.

Overall, the study was an ambitious attempt to undertake a highly complex machine-learning task, to detect cancer on whole-body magnetic resonance imaging. Many important steps have been taken but the current machine-learning algorithm did not result in a significant improvement in the radiologist’s accuracy for disease detection, although it may have slightly reduced the time taken to read the study. Future work is advocated to further develop machine-learning tools to improve the accuracy of tumour detection.

PubMed Disclaimer

LinkOut - more resources