J Pathol Inform. 2022 Jan 5;13:5.

doi: 10.4103/jpi.jpi_31_21. eCollection 2022.

An Expandable Informatics Framework for Enhancing Central Cancer Registries with Digital Pathology Specimens, Computational Imaging Tools, and Advanced Mining Capabilities

David J Foran¹,², Eric B Durbin³,⁴, Wenjin Chen¹, Evita Sadimin¹,², Ashish Sharma⁵, Imon Banerjee⁵, Tahsin Kurc⁶, Nan Li⁵, Antoinette M Stroup⁷, Gerald Harris⁷, Annie Gu⁵, Maria Schymura⁸, Rajarsi Gupta⁶, Erich Bremer⁶, Joseph Balsamo⁶, Tammy DiPrima⁶, Feiqiao Wang⁶, Shahira Abousamra⁹, Dimitris Samaras⁹, Isaac Hands⁴, Kevin Ward¹⁰, Joel H Saltz⁶

Affiliations

¹ Center for Biomedical Informatics, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA.
² Department of Pathology and Laboratory Medicine, Rutgers-Robert Wood Johnson Medical School, Piscataway, NJ, USA.
³ Kentucky Cancer Registry, Markey Cancer Center, University of Kentucky, Lexington, KY, USA.
⁴ Division of Biomedical Informatics, Department of Internal Medicine, College of Medicine, Lexington, KY, USA.
⁵ Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA, USA.
⁶ Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA.
⁷ New Jersey State Cancer Registry, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA.
⁸ New York State Cancer Registry, New York State Department of Health, Albany, NY, USA.
⁹ Department of Computer Science, Stony Brook University, Stony Brook, NY, USA.
¹⁰ Georgia State Cancer Registry, Georgia Department of Public Health, Atlanta, GA, USA.

PMID: 35136672
PMCID: PMC8794027
DOI: 10.4103/jpi.jpi_31_21

David J Foran et al. J Pathol Inform. 2022.

Abstract

Background: Population-based state cancer registries are an authoritative source for cancer statistics in the United States. They routinely collect a variety of data, including patient demographics, primary tumor site, stage at diagnosis, first course of treatment, and survival, on every cancer case that is reported across all U.S. states and territories. The goal of our project is to enrich NCI's Surveillance, Epidemiology, and End Results (SEER) registry data with high-quality population-based biospecimen data in the form of digital pathology, machine-learning-based classifications, and quantitative histopathology imaging feature sets (referred to here as Pathomics features).

Materials and methods: As part of the project, the underlying informatics infrastructure was designed, tested, and implemented through close collaboration with several participating SEER registries to ensure consistency with registry processes, computational scalability, and ability to support creation of population cohorts that span multiple sites. Utilizing computational imaging algorithms and methods to both generate indices and search for matches makes it possible to reduce inter- and intra-observer inconsistencies and to improve the objectivity with which large image repositories are interrogated.

Results: Our team has created and continues to expand a well-curated repository of high-quality digitized pathology images corresponding to subjects whose data are routinely collected by the collaborating registries. We have systematically deployed and tested key visual analytic methods to facilitate automated creation of population cohorts for epidemiological studies, along with tools to support visualization of feature clusters and evaluation of whole-slide images. As part of these efforts, we are developing and optimizing advanced search and matching algorithms to facilitate automated, content-based retrieval of digitized specimens based on their underlying image features and staining characteristics.

Conclusion: To meet the challenges of this project, we established the analytic pipelines, methods, and workflows to support the expansion and management of a growing repository of high-quality digitized pathology and information-rich population cohorts containing objective imaging and clinical attributes. These cohorts facilitate studies that seek to discriminate among different subtypes of disease, stratify patient populations, and perform comparisons of tumor characteristics within and across patient cohorts. We have also successfully developed a suite of tools based on a deep-learning method to perform quantitative characterizations of tumor regions, assess infiltrating lymphocyte distributions, and generate objective nuclear feature measurements. As part of these efforts, our team has implemented reliable methods that enable investigators to systematically search through large repositories to automatically retrieve digitized pathology specimens and correlated clinical data based on their computational signatures.

Keywords: Cancer registries; computational imaging; deep-learning; digital pathology.

Conflict of interest statement

There are no conflicts of interest.

Figures

**Fig. 1**
Workflow for assembling linked image/data cohorts.

**Fig. 2**
Clinical Research Data Warehouse workflow. The research data warehouse aggregates information from multiple data sources such as electronic health records, tumor registries, and radiology and pathology archives. It facilitates review of imaging data and linked clinical data on a single patient or cohort basis.

**Fig. 3**
Tumor-infiltrating lymphocyte (TIL) and tumor analysis results displayed as heatmaps on the whole-slide tissue image, with TIL analysis results on the left and tumor segmentation results on the right. Red indicates a higher probability of a patch being TIL-positive (or tumor-positive), and blue indicates a lower probability.

**Fig. 4**
Segmented nuclei overlaid as blue polygons on the whole-slide image (WSI). Each polygon represents the boundary of a segmented nucleus.

**Fig. 5**
The iterative workflow starts with a set of patches that are extracted from whole-slide tissue images and labeled for initial model training. Predictions from the trained model are reviewed as feature maps and heatmaps. The heatmaps are annotated to generate additional labeled patches, which are added to the training dataset. The deep-learning network is then retrained with the updated training dataset to refine the model.

**Fig. 6**
A feature map representation of TIL and tumor analysis results generated from a WSI in the Cancer Genome Atlas repository. The low-resolution version of the input WSI is displayed in the upper left corner. The upper right corner is the tumor segmentation map. The TIL map is displayed in the lower left corner. The lower right corner is the combined and thresholded TIL and tumor maps.

**Fig. 7**
Pathology image workflow. WSIs are de-identified and analyzed by deep-learning analysis pipelines deployed in containers. Image data are linked to the SEER Registry database to enhance it with quantitative imaging features (such as TIL distributions and tumor segmentations) extracted by deep-learning models. De-identified images and imaging features can then be used for data mining and research purposes.
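
The combined and thresholded TIL and tumor map described for Fig. 6 can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 thresholds, the label strings, and the toy probability values are all assumptions made for illustration. It assumes only that each analysis yields a per-patch probability map on the same grid.

```python
from typing import List

Grid = List[List[float]]  # per-patch probabilities in [0, 1], row-major


def combine_maps(til: Grid, tumor: Grid,
                 til_thr: float = 0.5, tumor_thr: float = 0.5) -> List[List[str]]:
    """Threshold two per-patch probability maps and merge them into one label map.

    The thresholds and label names are illustrative assumptions.
    """
    if len(til) != len(tumor) or any(len(a) != len(b) for a, b in zip(til, tumor)):
        raise ValueError("TIL and tumor maps must have the same shape")
    labels = []
    for til_row, tumor_row in zip(til, tumor):
        row = []
        for p_til, p_tumor in zip(til_row, tumor_row):
            is_til = p_til >= til_thr
            is_tumor = p_tumor >= tumor_thr
            if is_til and is_tumor:
                row.append("til_in_tumor")
            elif is_til:
                row.append("til")
            elif is_tumor:
                row.append("tumor")
            else:
                row.append("background")
        labels.append(row)
    return labels


def til_fraction_in_tumor(labels: List[List[str]]) -> float:
    """Fraction of tumor-positive patches that are also TIL-positive."""
    cells = [cell for row in labels for cell in row]
    tumor_patches = sum(cell in ("tumor", "til_in_tumor") for cell in cells)
    til_patches = sum(cell == "til_in_tumor" for cell in cells)
    return til_patches / tumor_patches if tumor_patches else 0.0


# Toy 2x2 maps (made-up values, not data from the paper).
til_map = [[0.9, 0.2], [0.7, 0.1]]
tumor_map = [[0.8, 0.6], [0.3, 0.1]]
combined = combine_maps(til_map, tumor_map)
```

A summary such as `til_fraction_in_tumor(combined)` is one example of a quantitative imaging feature that could, in principle, be attached to a registry record; which features the project actually stores is not specified here.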

Cited by

An Intelligent Search & Retrieval System (IRIS) and Clinical and Research Repository for Decision Support Based on Machine Learning and Joint Kernel-based Supervised Hashing.
Foran DJ, Chen W, Kurc T, Gupta R, Kaczmarzyk JR, Torre-Healy LA, Bremer E, Ajjarapu S, Do N, Harris G, Stroup A, Durbin E, Saltz JH. Cancer Inform. 2024 Feb 4;23:11769351231223806. doi: 10.1177/11769351231223806. eCollection 2024. PMID: 38322427. Free PMC article.

Biobanking in the digital pathology era.
Bonizzi G, Zattoni L, Fusco N. Oncol Res. 2022 Aug 31;29(4):229-233. doi: 10.32604/or.2022.024892. eCollection 2021. PMID: 37303941. Free PMC article.
