Radiol Artif Intell. 2023 Jul 26;5(5):e230031.
doi: 10.1148/ryai.230031. eCollection 2023 Sep.

Semisupervised Learning with Report-guided Pseudo Labels for Deep Learning-based Prostate Cancer Detection Using Biparametric MRI

Joeran S Bosma et al. Radiol Artif Intell.

Abstract

Purpose: To evaluate a novel method of semisupervised learning (SSL) guided by automated sparse information from diagnostic reports to leverage additional data for deep learning-based malignancy detection in patients with clinically significant prostate cancer.

Materials and methods: This retrospective study included 7756 prostate MRI examinations (6380 patients) performed between January 2014 and December 2020 for model development. An SSL method, report-guided SSL (RG-SSL), was developed for detection of clinically significant prostate cancer using biparametric MRI. RG-SSL, supervised learning (SL), and state-of-the-art SSL methods were trained using 100, 300, 1000, or 3050 manually annotated examinations. Performance on detection of clinically significant prostate cancer by RG-SSL, SL, and SSL was compared on 300 unseen examinations from an external center with a histopathologically confirmed reference standard. Performance was evaluated using receiver operating characteristic (ROC) and free-response ROC analysis. P values for performance differences were generated with a permutation test.
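
The abstract names a permutation test but does not describe its design. As a minimal sketch, assuming a two-sided test on the difference in mean metric between two groups of independent training runs, the idea can be written as follows. The function name `permutation_test` and the AUC values are hypothetical and are not the study's data.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in means of two samples."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated P value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Made-up per-run AUC values, for illustration only.
auc_method_a = [0.86, 0.85, 0.87, 0.86, 0.88]
auc_method_b = [0.78, 0.80, 0.75, 0.79, 0.77]
p_value = permutation_test(auc_method_a, auc_method_b)
print(f"P = {p_value:.4f}")
```

A small P value means that few random relabelings of the runs produce a difference at least as large as the observed one.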

Results: At 100 manually annotated examinations, mean examination-based diagnostic area under the ROC curve (AUC) values for RG-SSL, SL, and the best SSL were 0.86 ± 0.01 (SD), 0.78 ± 0.03, and 0.81 ± 0.02, respectively. Lesion-based detection partial AUCs were 0.62 ± 0.02, 0.44 ± 0.04, and 0.48 ± 0.09, respectively. Examination-based performance of SL with 3050 examinations was matched by RG-SSL with 169 manually annotated examinations, thus requiring 14 times fewer annotations. Lesion-based performance was matched with 431 manually annotated examinations, requiring six times fewer annotations.
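
The annotation-efficiency factors can be checked with simple arithmetic. Dividing 3050 directly by 169 gives about 18 rather than 14, so the sketch below assumes, which the abstract does not state, that each supervised model in fivefold cross-validation trains on four-fifths of the 3050 annotated examinations. Under that assumption the reported factors of 14 and six are reproduced.

```python
# Numbers taken from the abstract.
sl_examinations = 3050
rg_ssl_exam_level = 169
rg_ssl_lesion_level = 431

# ASSUMPTION (not stated in the abstract): with fivefold cross-validation,
# each supervised model trains on 4/5 of the annotated examinations.
sl_training_per_fold = sl_examinations * 4 // 5  # 2440

exam_ratio = sl_training_per_fold / rg_ssl_exam_level
lesion_ratio = sl_training_per_fold / rg_ssl_lesion_level

print(f"examination-based: {exam_ratio:.1f} times fewer annotations")  # 14.4
print(f"lesion-based: {lesion_ratio:.1f} times fewer annotations")     # 5.7
```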

Conclusion: RG-SSL outperformed SSL in clinically significant prostate cancer detection and achieved performance similar to SL even at very low annotation budgets.

Keywords: Annotation Efficiency, Computer-aided Detection and Diagnosis, MRI, Prostate Cancer, Semisupervised Deep Learning

Supplemental material is available for this article. Published under a CC BY 4.0 license.

Conflict of interest statement

Disclosures of conflicts of interest: J.S.B. Health~Holland grant LSHM20103; European Union H2020 grants: ProCAncer-I project (952159), PANCAIM project (101016851); Siemens Healthineers grant (CID: C00225450). A.S. Health~Holland grant LSHM20103; European Union H2020 grants: ProCAncer-I project (952159), PANCAIM project (101016851); Siemens Healthineers grant (CID: C00225450). M.H. No relevant relationships. I.S. Health~Holland grant LSHM20103; European Union H2020 grants: ProCAncer-I project (952159), PANCAIM project (101016851); Siemens Healthineers grant (CID: C00225450). M.d.R. No relevant relationships. H.H. Partial grant from Siemens Healthineers in combination with LSH Dutch government funding (institution).

Figures

Graphical abstract
Figure 1:
Overview of the semisupervised learning method for malignancy detection: (1) train the teacher model with manual labels; (2) count the number of clinically significant lesions described in the report, n_sig; (3) localize and segment the lesions, by keeping the n_sig most confident lesion candidates of the teacher model; (4) train the student model with manual and pseudo labels. ADC = apparent diffusion coefficient, bpMRI = biparametric MRI, csPCA = clinically significant prostate cancer, DWI = diffusion-weighted imaging, PI-RADS = Prostate Imaging and Reporting Data System, T2W = T2-weighted.
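
Step 3 of this overview can be sketched in a few lines. This is an illustration rather than the authors' implementation: the function name `select_pseudo_labels`, the `(lesion id, confidence)` representation, and the example values are hypothetical, and the report-derived lesion count is called `n_sig` in the code. Returning `None` when the teacher proposes too few candidates mirrors the exclusion rule mentioned in the Figure 3 caption.

```python
from typing import List, Optional, Tuple

# (hypothetical lesion id, teacher confidence)
Candidate = Tuple[str, float]


def select_pseudo_labels(
    candidates: List[Candidate], n_sig: int
) -> Optional[List[Candidate]]:
    """Keep the n_sig most confident teacher candidates for one examination.

    Returns None when the teacher proposes fewer than n_sig candidates, so
    the caller can exclude that examination from the pseudo-labeled set.
    """
    if n_sig < 0:
        raise ValueError("n_sig must be non-negative")
    if len(candidates) < n_sig:
        return None
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ranked[:n_sig]


# Made-up teacher output for one examination.
candidates = [("a", 0.35), ("b", 0.91), ("c", 0.62)]
print(select_pseudo_labels(candidates, 2))  # report describes two lesions
print(select_pseudo_labels(candidates, 4))  # too few candidates: None
```

A report with no clinically significant findings (`n_sig = 0`) yields an empty list, so the examination is kept as a negative case.
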
Figure 2:
Accuracy of natural language processing score extraction algorithm, as depicted by the confusion matrix for number of clinically significant findings in a radiology report. Evaluated on the manually labeled development dataset. PI-RADS = Prostate Imaging and Reporting Data System.
Figure 3:
Quality of the pseudo labels, as evaluated by free-response receiver operating characteristic (FROC) analysis for matching manually annotated Prostate Imaging and Reporting Data System (PI-RADS) 4 or greater lesions in the manually labeled development dataset. Supervised models used to generate report-guided pseudo labels were trained with fivefold cross-validation on the manually labeled development dataset. Uncertainty-aware mean teacher and cross pseudo supervision models were trained with fivefold cross-validation on the development dataset. Filtering pseudo labels using the number of clinically significant findings described in the diagnostic report (n_sig) greatly reduced the number of false-positive lesions per examination (report-guided pseudo labels [intermediate]). Excluding examinations with fewer than n_sig lesion candidates improved sensitivity (report-guided pseudo labels). Shaded areas indicate 95% CIs. Error bars indicate SDs.
Figure 4:
Model performance for semisupervised and supervised learning. Top row: Supervised models were trained with fivefold cross-validation on 3050 manually labeled examinations, and semisupervised learning (SSL) also included 4706 unlabeled examinations. Report-guided SSL significantly outperformed supervised learning as well as the baseline SSL methods. Bottom row: Model performance for 100, 300, 1000, and 3050 manually labeled examinations, combined with 7656, 7456, 6756, and 4706 unlabeled examinations, respectively. Report-guided SSL significantly outperformed the baseline SSL methods and supervised learning at each annotation budget, except for examination-based area under the receiver operating characteristic curve (AUC) of uncertainty-aware mean teacher trained with 1000 labeled examinations. Left: Receiver operating characteristic (ROC) performance for examination-based diagnosis of examinations with at least one lesion with Gleason grade group (GGG) 2 or greater. Right: Free-response ROC (FROC) performance for lesion-based diagnosis of lesions with GGG 2 or greater. All models were trained with radiology-based Prostate Imaging and Reporting Data System 4 or greater labels and evaluated on the external test set with histopathologically confirmed ground truth. Shaded areas indicate the 95% CIs from 15 or five independent training runs. Error bars indicate SDs across 15 or five independent training runs. pAUC = partial area under the receiver operating characteristic curve.
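
For readers unfamiliar with the examination-based AUC reported here, the metric equals the probability that a randomly chosen positive examination receives a higher score than a randomly chosen negative one. The sketch below computes it by pairwise comparison. The function name `roc_auc`, the labels, and the scores are hypothetical, and how a per-examination score is derived from the model output is not specified in this excerpt.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up examination-level labels (1 = GGG 2 or greater) and model scores.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 5 of 6 pairs ranked correctly: 0.833
```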
