Med Phys. 2022 Aug;49(8):5160-5181.
doi: 10.1002/mp.15777. Epub 2022 Jun 13.

Bridging the gap between prostate radiology and pathology through machine learning

Indrani Bhattacharya et al. Med Phys. 2022 Aug.

Abstract

Background: Prostate cancer remains the second-leading cause of cancer death among American men despite clinical advancements. Currently, magnetic resonance imaging (MRI) is considered the most sensitive non-invasive imaging modality that enables visualization, detection, and localization of prostate cancer, and is increasingly used to guide targeted biopsies for prostate cancer diagnosis. However, its utility remains limited due to high rates of false positives and false negatives as well as low inter-reader agreement.

Purpose: Machine learning methods to detect and localize cancer on prostate MRI can help standardize radiologist interpretations. However, existing machine learning methods vary not only in model architecture, but also in the ground truth labeling strategies used for model training. We compare different labeling strategies and the effects they have on the performance of different machine learning models for prostate cancer detection on MRI.

Methods: Four different deep learning models (SPCNet, U-Net, branched U-Net, and DeepLabv3+) were trained to detect prostate cancer on MRI using 75 patients with radical prostatectomy, and evaluated using 40 patients with radical prostatectomy and 275 patients with targeted biopsy. Each deep learning model was trained with four different label types: pathology-confirmed radiologist labels, pathologist labels on whole-mount histopathology images, and lesion-level and pixel-level digital pathologist labels (generated by a previously validated deep learning algorithm that predicts pixel-level Gleason patterns on histopathology images) on whole-mount histopathology images. The pathologist and digital pathologist labels (collectively referred to as pathology labels) were mapped onto pre-operative MRI using an automated MRI-histopathology registration platform.

Results: Radiologist labels missed cancers (ROC-AUC: 0.75-0.84), had lower lesion volumes (~68% of pathology lesions), and lower Dice overlaps (0.24-0.28) when compared with pathology labels. Consequently, machine learning models trained with radiologist labels also showed inferior performance compared to models trained with pathology labels. Digital pathologist labels showed high concordance with pathologist labels of cancer (lesion ROC-AUC: 0.97-1, lesion Dice: 0.75-0.93). Machine learning models trained with digital pathologist labels had the highest lesion detection rates in the radical prostatectomy cohort (aggressive lesion ROC-AUC: 0.91-0.94), and had generalizable and comparable performance to pathologist label-trained models in the targeted biopsy cohort (aggressive lesion ROC-AUC: 0.87-0.88), irrespective of the deep learning architecture. Moreover, machine learning models trained with pixel-level digital pathologist labels were able to selectively identify aggressive and indolent cancer components in mixed lesions on MRI, which is not possible with any human-annotated label type.
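The Dice overlaps reported above follow the standard definition, 2|A∩B| / (|A| + |B|), between two binary lesion masks. A minimal illustrative sketch (not the authors' code) is:

```python
import numpy as np

def dice_overlap(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks are in perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy example: two partially overlapping square "lesions" on an 8x8 grid
a = np.zeros((8, 8), dtype=bool); a[2:6, 2:6] = True  # 16 voxels
b = np.zeros((8, 8), dtype=bool); b[4:8, 4:8] = True  # 16 voxels
print(dice_overlap(a, b))  # 2*4 / (16 + 16) = 0.25
```

A Dice of 0.24-0.28, as observed between radiologist and pathology labels, thus indicates that only a modest fraction of the labeled volumes agree.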

Conclusions: Machine learning models for prostate MRI interpretation that are trained with digital pathologist labels showed performance higher than or comparable to pathologist label-trained models in both the radical prostatectomy and targeted biopsy cohorts. Digital pathologist labels can reduce challenges associated with human annotations, including labor, time, and inter- and intra-reader variability, and can help bridge the gap between prostate radiology and pathology by enabling the training of reliable machine learning models to detect and localize prostate cancer on MRI.

Keywords: aggressive versus indolent cancer; cancer labels; deep learning; digital pathology; prostate MRI.


Conflict of interest statement

Mirabela Rusu has research grants from GE Healthcare and Philips Healthcare.

Figures

FIGURE 1
Labels created by radiologists, pathologists, or digital pathologists on MRI serve to train deep learning models to detect cancer and aggressive cancer on MRI. The pathology labels (LPath, LLesionDPath, and LPixelDPath) are derived from annotations on whole‐mount histopathology images and are mapped onto MRI through MRI‐histopathology registration. The pixel‐level digital pathologist label (LPixelDPath) enables identification of aggressive and indolent cancer components in mixed lesions, unlike the other label types
FIGURE 2
Differences in labeling strategies in a typical patient in cohort C1 test (aggressive cancer—yellow, indolent cancer—green) shown on (a) T2w images and (b) ADC images. The (c) radiologist labels (LRad) and (d) pathologist labels (LPath) are present on some slices, while the (e) lesion‐level digital pathologist labels (LLesionDPath) and (f) pixel‐level digital pathologist labels (LPixelDPath) exist on all slices. Digital pathologist labels strongly agree with pathologist labels in annotating aggressive and indolent cancer components in mixed lesions
FIGURE 3
Distribution of lesion volumes of discarded lesions for (a) radiologist (LRad), (b) pathologist (LPath), (c) lesion‐level digital pathologist (LLesionDPath), and (d) pixel‐level digital pathologist (LPixelDPath) labels for cohort C1 test. The red vertical line indicates the threshold lesion volume of 250 mm3. Only three radiologist lesions in C1 test were discarded, whereas a large number of pathology lesions with predominantly tiny volumes (median discarded lesion volume 0.4–1.3 mm3) were discarded. The y‐axis shows the frequency distribution on a log scale, while the x‐axis shows the lesion volume in mm3
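The 250 mm3 threshold amounts to dropping small connected components from a binary label mask. A hypothetical sketch using scipy.ndimage is below; the function and parameter names are assumptions for illustration, not the authors' implementation:

```python
import numpy as np
from scipy import ndimage

def discard_small_lesions(label_mask: np.ndarray,
                          voxel_volume_mm3: float,
                          min_volume_mm3: float = 250.0) -> np.ndarray:
    """Remove connected components (lesions) smaller than min_volume_mm3."""
    components, n_lesions = ndimage.label(label_mask)  # connected-component labeling
    keep = np.zeros_like(label_mask, dtype=bool)
    for i in range(1, n_lesions + 1):
        lesion = components == i
        if lesion.sum() * voxel_volume_mm3 >= min_volume_mm3:
            keep |= lesion  # retain lesions at or above the volume threshold
    return keep
```

For example, with 10 mm3 voxels, a 30-voxel lesion (300 mm3) survives the filter while a 4-voxel lesion (40 mm3) is discarded.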
FIGURE 4
The difference in resolution between the whole‐mount histopathology and the MR images, and the detailed gland‐level annotations of pathology labels, often result in tiny lesions that are (a) only a few pixels on MRI and clinically insignificant (shown by yellow arrows). Discarding small lesions with volumes <250 mm3 results in (b) cleaner and clinically meaningful lesions for training and evaluation of digital radiologist models. Zooming into these tiny lesions (red box in (a)) on (c) high-resolution histopathology and (d) the registered MRI further reveals that they are not clinically meaningful for detection on MRI. While tiny, the lesion shown by the white arrow is not discarded, as it connects to the lesion visible in the subsequent MRI slices
FIGURE 5
All the digital radiologist models (SPCNet, U‐Net, branched U‐Net, and DeepLabv3+) are trained with T2w and ADC images of the prostate as inputs. Each model is trained with one of the four label types as ground truth at a time. The DeepLabv3+ model is trained in a 2D fashion, with a single slice of T2w and ADC image as input (as shown in this figure), while the other models are trained in a 2.5D fashion with three consecutive MRI slices as inputs. Pre‐processing of the T2w and ADC images includes registration, cropping, and resampling around the prostate, and MRI intensity standardization and normalization
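The MRI intensity standardization and normalization step mentioned in the caption can be illustrated with a simple z-score normalization over a region of interest. This is a sketch under assumptions; the paper's actual pre-processing pipeline may differ:

```python
import numpy as np

def normalize_intensities(image: np.ndarray, roi_mask: np.ndarray) -> np.ndarray:
    """Z-score normalize image intensities using statistics from a region of interest
    (e.g., the prostate), so inputs to the network share a common intensity scale."""
    roi = image[roi_mask.astype(bool)]
    mu, sigma = roi.mean(), roi.std()
    return (image - mu) / (sigma + 1e-8)  # epsilon guards against a constant ROI
```

Normalizing within the prostate rather than the whole image keeps the statistics from being dominated by background voxels.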
FIGURE 6
Quantitative comparison between cancer outlines of the different label types. (a) Dice overlap for cancer, (b) lesion‐level ROC‐AUC for cancer, (c) Dice overlap for aggressive cancer, (d) lesion‐level ROC‐AUC for aggressive cancer
FIGURE 7
The digital pathologist‐predicted automated aggressive (Gleason pattern 4, green) and indolent (Gleason pattern 3, blue) cancers visually match the manual cancer annotations by the expert pathologist (black, yellow, orange, and red). (a) Whole‐mount histopathology image with (b–d) close-ups of the two cancer lesions. (c) Cancer labels manually outlined by the expert pathologist (black outline) show high agreement with overall cancer (combined blue and green) predicted by the digital pathologist model. (b, d) It is impractically time-consuming for a human pathologist to manually assign pixel‐level Gleason patterns (yellow, orange, and red) to each gland in detail as done by the digital pathologist (blue and green)
FIGURE 8
Predictions from SPCNet trained with different label types for a typical patient from cohort C1 test (same as Figure 2) show that only the LPixelDPath‐trained SPCNet (f) selectively identified the aggressive and indolent cancer components in the lesion, while all other models detected the entire lesion as aggressive (SPCNet predictions: aggressive cancer [red], indolent cancer [blue]). (a) T2w images, (b) ADC images, (c) LRad‐trained SPCNet predictions, (d) LPath‐trained SPCNet predictions, (e) LLesionDPath‐trained SPCNet predictions, (f) LPixelDPath‐trained SPCNet predictions
FIGURE 9
Labels and SPCNet predictions for three different patients from cohort C1 test (labels: aggressive cancer [yellow], indolent cancer [green]; SPCNet predictions: aggressive cancer [red], indolent cancer [blue]) on (a) T2w and (b) ADC images. The (c) LRad labels and LRad‐trained SPCNet predictions may miss cancers or underestimate cancer extent. The (d) LPath labels and LPath‐trained SPCNet predictions, and the (e) LLesionDPath labels and LLesionDPath‐trained SPCNet predictions, show strong agreement in cancer localization and extent. The (f) LPixelDPath labels and LPixelDPath‐trained SPCNet predictions can selectively identify and localize the aggressive and indolent cancer components in mixed lesions, unlike any other label or prediction type. The outlines for columns with SPCNet predictions correspond to pathologist annotations. Radiologists and pathologists are not required to annotate cancer extent on all slices of a patient for routine clinical care, but knowing the complete extent of cancer on all slices may be essential to train machine learning models. As such, C1‐Pat3 does not show an LPath label even though cancer is present
FIGURE 10
SPCNet predictions for two different patients from cohort C2 on (a) T2w and (b) ADC images. The (c) LRad‐trained SPCNet predictions miss the cancer in the row 2 patient (C2‐Pat2). The (d) LPath‐trained and (e) LLesionDPath‐trained SPCNet predictions detect the lesions in both patients, with the (e) LLesionDPath‐trained predictions having the highest overlap with the cancer extent. The (f) LPixelDPath‐trained SPCNet predictions are slightly off from the LRad labels for the row 2 patient (C2‐Pat2). The outlines for columns with SPCNet predictions correspond to radiologist labels (LRad)
FIGURE 11
Quantitative comparison between digital radiologist (SPCNet) predictions when trained and evaluated using different label types in cohort C1 test. The top row shows results for cancer detection, while the bottom row shows results for aggressive cancer detection. Darker blue boxes in the 4 × 4 matrices represent higher evaluation metrics.
