This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Mar 3:rs.3.rs-2526701.

doi: 10.21203/rs.3.rs-2526701/v1.

REPRODUCIBLE AND CLINICALLY TRANSLATABLE DEEP NEURAL NETWORKS FOR CANCER SCREENING

Syed Rakin Ahmed¹,²,³,⁴, Brian Befano⁵,⁶, Andreanne Lemay¹,⁷, Didem Egemen⁸, Ana Cecilia Rodriguez⁸, Sandeep Angara⁹, Kanan Desai⁸, Jose Jeronimo⁸, Sameer Antani⁹, Nicole Campos¹⁰, Federica Inturrisi⁸, Rebecca Perkins¹¹, Aimee Kreimer⁸, Nicolas Wentzensen⁸, Rolando Herrero¹², Marta Del Pino¹³, Wim Quint¹⁴, Silvia de Sanjose⁸,¹⁵, Mark Schiffman⁸, Jayashree Kalpathy-Cramer¹

Affiliations

¹ Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA 02129, USA.
² Harvard Graduate Program in Biophysics, Harvard Medical School, Harvard University, Cambridge, MA 02115, USA.
³ Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
⁴ Geisel School of Medicine at Dartmouth, Dartmouth College, Hanover, NH 02139, USA.
⁵ Information Management Services, Calverton, MD 20705, USA.
⁶ University of Washington, Seattle, WA 98195, USA.
⁷ NeuroPoly, Polytechnique Montreal, Montreal, QC H3T 1N8, Canada.
⁸ Clinical Epidemiology Unit, Clinical Genetics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892.
⁹ Computational Health Research Branch, National Library of Medicine, Lister Hill Center, Bethesda, MD 20894.
¹⁰ Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston MA 02115.
¹¹ Dept of Obstetrics & Gynecology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA 02118.
¹² Agencia Costarricense de Investigaciones Biomedicas (ACIB), Fundacion INCIENSA, San Jose, Costa Rica.
¹³ Hospital Clinic, Barcelona, Spain.
¹⁴ DDL Diagnostic Laboratory, Rijswijk, The Netherlands.
¹⁵ ISGlobal, Barcelona, Spain.

PMID: 36909463
PMCID: PMC10002800
DOI: 10.21203/rs.3.rs-2526701/v1

Syed Rakin Ahmed et al. Res Sq. 2023.

Update in

Reproducible and clinically translatable deep neural networks for cervical screening.
Ahmed SR, Befano B, Lemay A, Egemen D, Rodriguez AC, Angara S, Desai K, Jeronimo J, Antani S, Campos N, Inturrisi F, Perkins R, Kreimer A, Wentzensen N, Herrero R, Del Pino M, Quint W, de Sanjose S, Schiffman M, Kalpathy-Cramer J. Sci Rep. 2023 Dec 8;13(1):21772. doi: 10.1038/s41598-023-48721-1. PMID: 38066031. Free PMC article.

Abstract

Cervical cancer is a leading cause of cancer mortality, with approximately 90% of the 250,000 deaths per year occurring in low- and middle-income countries (LMIC). Secondary prevention with cervical screening involves detecting and treating precursor lesions; however, scaling screening efforts in LMIC has been hampered by infrastructure and cost constraints. Recent work has supported the development of an artificial intelligence (AI) pipeline on digital images of the cervix to achieve an accurate and reliable diagnosis of treatable precancerous lesions. In particular, WHO guidelines emphasize visual triage of women testing positive for human papillomavirus (HPV) as the primary screen, and AI could assist in this triage task. Published AI reports have exhibited overfitting, lack of portability, and unrealistic, near-perfect performance estimates. To surmount recognized issues, we implemented a comprehensive deep-learning model selection and optimization study on a large, collated, multi-institutional dataset of 9,462 women (17,013 images). We evaluated relative portability, repeatability, and classification performance. The top performing model, when combined with HPV type, achieved an area under the Receiver Operating Characteristics (ROC) curve (AUC) of 0.89 within our study population of interest, and a limited total extreme misclassification rate of 3.4%, on held-aside test sets. Our work is among the first efforts at designing a robust, repeatable, accurate and clinically translatable deep-learning model for cervical screening.

Keywords: artificial intelligence; cervical cancer screening; deep learning; human papillomavirus.

Conflict of interest statement

Additional Declarations: There is NO Competing Interest.

Figures

**FIGURE 1:**
Model selection and optimization overview. The top panel highlights the five different studies (NHS, ALTS, CVT, Biop and D Biop; see Table 1, Supp. Table 1, and Supp. Methods for detailed description and breakdown of the studies by ground truth) used to generate the final dataset on the middle panel, which is subsequently used to generate a train and validation set, as well as two separate test sets. The intersections of model selection choices on the bottom panel are used to generate a compendium of models trained using the corresponding train and validation sets and evaluated on Test Set 1, optimizing for repeatability, classification performance, reduced extreme misclassifications and combined risk-stratification with high-risk human papillomavirus (HPV) types. Test Set 2 is utilized to verify the performance of top candidates that emerge from evaluation on Test Set 1. SWT: Swin Transformer; QWK: quadratic weighted kappa; CORAL: CORAL (consistent rank logits) loss, as described in the METHODS section.

**FIGURE 2:**
Model selection approach and statistical analysis utilized in our automated visual evaluation (AVE) classifier. IQR: interquartile range; AUC: area under the receiver operating characteristics (ROC) curve; CI: confidence interval.

**FIGURE 3:**
(a) Median quadratic weighted kappa (QWK) and adjusted linear regression (LR) β across the various design choices, as part of the repeatability analysis. (b) Median Youden’s index, median % precancer+ as normal (% p as n) and median % normal as precancer+ (% n as p), with the corresponding adjusted LR β values across the various design choices (after filtering for repeatability), as part of the classification performance analysis. Muted bars indicate design choices dropped at each stage. SWT: Swin Transformer; CORAL: CORAL (consistent rank logits) loss, as described in the METHODS section; ref: reference category.
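The classification metrics named in this caption can be illustrated with a minimal sketch. It assumes three ordinal classes coded 0 (normal), 1 (gray zone) and 2 (precancer+), takes "% precancer+ as normal" as a percentage of true precancer+ cases (and likewise for "% normal as precancer+"), and computes Youden's index for a binary precancer+ vs. rest split. The class coding, the denominators, the binary collapse, and all function names are assumptions made for illustration; the caption does not specify them. Each class is assumed to appear at least once.

```python
import numpy as np

# Hypothetical class coding (assumption): 0 = normal, 1 = gray zone, 2 = precancer+
NORMAL, GRAY_ZONE, PRECANCER = 0, 1, 2


def extreme_misclassification(y_true, y_pred):
    """Return (% precancer+ called normal, % normal called precancer+).

    Each percentage uses the number of cases in the true class as its
    denominator, which is an assumption of this sketch.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    p_as_n = 100.0 * np.mean(y_pred[y_true == PRECANCER] == NORMAL)
    n_as_p = 100.0 * np.mean(y_pred[y_true == NORMAL] == PRECANCER)
    return float(p_as_n), float(n_as_p)


def youden_index(y_true, y_pred, positive=PRECANCER):
    """Youden's J = sensitivity + specificity - 1 for `positive` vs. rest."""
    is_pos = np.asarray(y_true) == positive
    called_pos = np.asarray(y_pred) == positive
    sensitivity = np.mean(called_pos[is_pos])
    specificity = np.mean(~called_pos[~is_pos])
    return float(sensitivity + specificity - 1.0)


# Made-up labels, purely illustrative
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 0, 2, 2, 1, 0]
p_as_n, n_as_p = extreme_misclassification(y_true, y_pred)
j = youden_index(y_true, y_pred)
```

Gray zone predictions do not count as extreme errors in this sketch; only a two-class jump between normal and precancer+ does.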

**FIGURE 4:**
(a) Difference between HPV+AVE combined AUC and HPV-only AUC in the HPV positive NHS subset for top 10 models. (b) Receiver operating characteristics (ROC) curves for each of the top 4 best performing models in the HPV positive NHS subset of the full dataset. The plotted lines indicate (1) HPV AUC, (2) AVE AUC, and (3) combined HPV-AVE AUC, for models (i) 36, (ii) 65, (iii) 34, and (iv) 81. HPV: human papillomavirus; AVE: automated visual evaluation, which refers to the classifier; AUC: area under the ROC curve.
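As an illustration of the AUC difference plotted in panel (a), the sketch below computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, counting ties as one half. The way the HPV and AVE predictors are combined is not described here, so the `combined` scores, like the other values, are made-up placeholders, and the function name is hypothetical.

```python
import numpy as np


def auc(labels, scores):
    """Rank-based AUC: P(score of positive > score of negative), ties count 0.5."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    # Compare every positive score against every negative score
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)


# Made-up data (assumption): 1 = precancer+, 0 = not precancer+
labels = [0, 0, 0, 1, 1, 1]
hpv_only = [1, 2, 1, 2, 3, 1]              # e.g. an ordinal HPV risk group
combined = [0.1, 0.4, 0.2, 0.6, 0.9, 0.3]  # placeholder combined HPV+AVE score

# Quantity analogous to the one shown in panel (a)
auc_gain = auc(labels, combined) - auc(labels, hpv_only)
```

The pairwise comparison is quadratic in the number of cases, which is acceptable for a sketch but not for large test sets.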

**FIGURE 5:**
(a) Classification and repeatability results on Test Set 2 for top 10 best performing models, highlighting the % precancer+ as normal (% p as n) and % normal as precancer+ (% n as p) (left), the % 2-class disagreement between image pairs across women (middle), and the quadratic weighted kappa (QWK) values on the discrete class outcomes for paired images across women (right) for each model. (b) Representative plots for the top performing model (#36) on Test Set 2: (i) Receiver operating characteristics (ROC) curves for the normal vs. rest (Class 0 vs. rest) and precancer+ vs. rest (Class 2 vs. rest) cases, (ii) confusion matrix, (iii) histogram of model predicted continuous *score*, color coded by ground truth, and (iv) Bland-Altman plot of model predictions, color coded by ground truth: each point on this plot refers to a single woman, with the y-axis representing the maximum difference in the score across repeat images per woman, and the x-axis plotting the mean of the corresponding score across all repeat images per woman.
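The two repeatability quantities in this caption can be sketched as follows. The first function returns, per woman, the mean score (x-axis) and the maximum difference across repeat images (y-axis) described for panel (b)(iv). The second computes a quadratic weighted kappa between the predicted classes of two images of the same woman. The data layout, the three-class coding, and the function names are assumptions for illustration only.

```python
import numpy as np


def bland_altman_points(scores_by_woman):
    """Map each woman to (mean score, max score - min score) over her repeat images."""
    points = {}
    for woman_id, scores in scores_by_woman.items():
        s = np.asarray(scores, dtype=float)
        points[woman_id] = (float(s.mean()), float(s.max() - s.min()))
    return points


def quadratic_weighted_kappa(a, b, n_classes=3):
    """QWK between two sequences of integer class labels in [0, n_classes)."""
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)
    # Observed agreement matrix: rows index labels in `a`, columns labels in `b`
    observed = np.zeros((n_classes, n_classes))
    np.add.at(observed, (a, b), 1)
    # Quadratic disagreement weights, 0 on the diagonal and 1 at the corners
    idx = np.arange(n_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Agreement expected by chance from the two marginal distributions
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    return float(1.0 - (weights * observed).sum() / (weights * expected).sum())


# Made-up repeat-image scores and paired class predictions
points = bland_altman_points({"w1": [0.2, 0.4], "w2": [1.0, 1.0, 1.6]})
qwk = quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 2, 0])
```

Because the weights grow with the square of the class distance, a normal vs. precancer+ disagreement between paired images lowers QWK four times as much as a disagreement involving the gray zone.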

**FIGURE 6:**
Model level comparison across top-10 best performing models. 60 images were randomly selected (see METHODS: Statistical Analysis section) and arranged in order of increasing mean score within each ground truth class in the top row (labelled “Ground Truth”). The model predicted class for the top 10 models for each of these 60 images is highlighted in the bottom rows, where the images follow the same order as the top row. The color coding in the top row represents ground truth while in the bottom 10 rows represent the model predicted class. Green: Normal, Gray: Gray Zone, and Red: Precancer +, as highlighted in the legend. Each image corresponds to a different woman.
