Semisupervised Learning with Report-guided Pseudo Labels for Deep Learning-based Prostate Cancer Detection Using Biparametric MRI

Joeran S Bosma¹, Anindo Saha¹, Matin Hosseinzadeh¹, Ivan Slootweg¹, Maarten de Rooij¹, Henkjan Huisman¹

Affiliation

¹ From the Diagnostic Image Analysis Group, Department of Medical Imaging, Radboud University Medical Center, Geert Grooteplein Zuid 10, 6525 GA Nijmegen, the Netherlands.

PMID: 37795142
PMCID: PMC10546362
DOI: 10.1148/ryai.230031

Radiol Artif Intell. 2023 Jul 26;5(5):e230031. doi: 10.1148/ryai.230031. eCollection 2023 Sep.

Abstract

Purpose: To evaluate a novel method of semisupervised learning (SSL) guided by automated sparse information from diagnostic reports to leverage additional data for deep learning-based malignancy detection in patients with clinically significant prostate cancer.

Materials and methods: This retrospective study included 7756 prostate MRI examinations (6380 patients) performed between January 2014 and December 2020 for model development. An SSL method, report-guided SSL (RG-SSL), was developed for detection of clinically significant prostate cancer using biparametric MRI. RG-SSL, supervised learning (SL), and state-of-the-art SSL methods were trained using 100, 300, 1000, or 3050 manually annotated examinations. Performance on detection of clinically significant prostate cancer by RG-SSL, SL, and SSL was compared on 300 unseen examinations from an external center with a histopathologically confirmed reference standard. Performance was evaluated using receiver operating characteristic (ROC) and free-response ROC analysis. P values for performance differences were generated with a permutation test.
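The abstract does not describe how the permutation test was set up. As a rough illustration only, the sketch below shows a generic two-sided permutation test on the difference in mean metric values between two sets of training runs. The function name `permutation_test`, its parameters, and the choice of the mean difference as test statistic are assumptions made for this example, not the authors' procedure.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_permutations: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference in means of a and b.

    a and b hold one metric value per training run (hypothetical inputs,
    for example the AUC of each run of two methods).
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated P value strictly positive.
    return (extreme + 1) / (n_permutations + 1)
```

A small P value indicates that a difference at least as large as the observed one rarely arises when the group labels are shuffled at random.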

Results: At 100 manually annotated examinations, mean examination-based diagnostic area under the ROC curve (AUC) values for RG-SSL, SL, and the best SSL were 0.86 ± 0.01 (SD), 0.78 ± 0.03, and 0.81 ± 0.02, respectively. Lesion-based detection partial AUCs were 0.62 ± 0.02, 0.44 ± 0.04, and 0.48 ± 0.09, respectively. Examination-based performance of SL with 3050 examinations was matched by RG-SSL with 169 manually annotated examinations, thus requiring 14 times fewer annotations. Lesion-based performance was matched with 431 manually annotated examinations, requiring six times fewer annotations.

Conclusion: RG-SSL outperformed SSL in clinically significant prostate cancer detection and achieved performance similar to SL even at very low annotation budgets.

Keywords: Annotation Efficiency, Computer-aided Detection and Diagnosis, MRI, Prostate Cancer, Semisupervised Deep Learning

Supplemental material is available for this article. Published under a CC BY 4.0 license.

Conflict of interest statement

Disclosures of conflicts of interest: J.S.B. Health~Holland grant: LSHM20103; European Union H2020 grants: ProCAncer-I project (952159), PANCAIM project (101016851); Siemens Healthineers grant: CID: C00225450. A.S. Health~Holland grant: LSHM20103; European Union H2020 grants: ProCAncer-I project (952159), PANCAIM project (101016851); Siemens Healthineers grant: CID: C00225450. M.H. No relevant relationships. I.S. Health~Holland grant: LSHM20103; European Union H2020 grants: ProCAncer-I project (952159), PANCAIM project (101016851); Siemens Healthineers grant: CID: C00225450. M.d.R. No relevant relationships. H.H. Partial grant Siemens Healthineers in combination with LSH Dutch government funding (institution).

Figures

**Figure 1:** Overview of the semisupervised learning method for malignancy detection: (1) train the teacher model with manual labels; (2) count the number of clinically significant lesions described in the report, n_sig; (3) localize and segment the lesions by keeping the n_sig most confident lesion candidates of the teacher model; (4) train the student model with manual and pseudo labels. ADC = apparent diffusion coefficient, bpMRI = biparametric MRI, csPCA = clinically significant prostate cancer, DWI = diffusion-weighted imaging, PI-RADS = Prostate Imaging and Reporting Data System, T2W = T2-weighted.
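Step 3 of the overview above can be sketched in a few lines. This is a minimal illustration, assuming the teacher model outputs a list of lesion candidates with a confidence each; the `Candidate` class and the `select_pseudo_labels` function are hypothetical names introduced here, not part of the authors' code. Returning `None` when there are fewer than n_sig candidates mirrors the exclusion of such examinations described in the Figure 3 caption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One lesion candidate proposed by the teacher model (hypothetical structure)."""
    lesion_id: int
    confidence: float  # teacher confidence, assumed to lie in [0, 1]


def select_pseudo_labels(candidates: List[Candidate],
                         n_sig: int) -> Optional[List[Candidate]]:
    """Keep the n_sig most confident candidates of one examination.

    Returns None when the teacher proposed fewer than n_sig candidates,
    so the caller can exclude the examination from the pseudo-labeled set.
    """
    if n_sig < 0:
        raise ValueError("n_sig must be non-negative")
    if len(candidates) < n_sig:
        return None
    # Sort from most to least confident and keep the first n_sig entries.
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    return ranked[:n_sig]
```

With n_sig equal to 0 the function returns an empty list, which in this sketch corresponds to an examination whose pseudo label contains no lesions.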

**Figure 2:** Accuracy of natural language processing score extraction algorithm, as depicted by the confusion matrix for number of clinically significant findings in a radiology report. Evaluated on the manually labeled development dataset. PI-RADS = Prostate Imaging and Reporting Data System.

**Figure 3:** Quality of the pseudo labels, as evaluated by free-response receiver operating characteristic (FROC) analysis for matching manually annotated Prostate Imaging and Reporting Data System (PI-RADS) 4 or greater lesions in the manually labeled development dataset. Supervised models used to generate report-guided pseudo labels were trained with fivefold cross-validation on the manually labeled development dataset. Uncertainty-aware mean teacher and cross pseudo supervision models were trained with fivefold cross-validation on the development dataset. Filtering pseudo labels using the number of clinically significant findings described in the diagnostic report (n_sig) greatly reduced the number of false-positive lesions per examination (report-guided pseudo labels [intermediate]). Excluding examinations with fewer than n_sig lesion candidates improved sensitivity (report-guided pseudo labels). Shaded areas indicate 95% CIs. Error bars indicate SDs.

**Figure 4:** Model performance for semisupervised and supervised learning. Top row: Supervised models were trained with fivefold cross-validation on 3050 manually labeled examinations, and semisupervised learning (SSL) also included 4706 unlabeled examinations. Report-guided SSL significantly outperformed supervised learning as well as the baseline SSL methods. Bottom row: Model performance for 100, 300, 1000, and 3050 manually labeled examinations, combined with 7656, 7456, 6756, and 4706 unlabeled examinations, respectively. Report-guided SSL significantly outperformed the baseline SSL methods and supervised learning at each annotation budget, except for examination-based area under the receiver operating characteristic curve (AUC) of uncertainty-aware mean teacher trained with 1000 labeled examinations. Left: Receiver operating characteristic (ROC) performance for examination-based diagnosis of examinations with at least one lesion with Gleason grade group (GGG) 2 or greater. Right: Free-response ROC (FROC) performance for lesion-based diagnosis of lesions with GGG 2 or greater. All models were trained with radiology-based Prostate Imaging and Reporting Data System 4 or greater labels and evaluated on the external test set with histopathologically confirmed ground truth. Shaded areas indicate the 95% CIs from 15 or five independent training runs. Error bars indicate SDs across 15 or five independent training runs. pAUC = partial area under the receiver operating characteristic curve.
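For readers unfamiliar with examination-based AUC, the following sketch computes it as the probability that a positive examination receives a higher score than a negative one. Both function names are hypothetical, and taking the maximum lesion confidence as the examination score is an assumption made for illustration; the caption does not state how examination-level scores were derived.

```python
from typing import Sequence


def examination_score(lesion_confidences: Sequence[float]) -> float:
    """Assumed aggregation: the highest lesion confidence, or 0 with no candidates."""
    return max(lesion_confidences, default=0.0)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels are 1 for examinations with at least one GGG 2 or greater lesion
    and 0 otherwise. Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative examination")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of examinations, which is acceptable for a test set of a few hundred cases; a rank-based implementation would be preferable for larger sets.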
