JCO Clin Cancer Inform. 2020 Aug:4:711-723.

doi: 10.1200/CCI.19.00152.

Web Application for the Automated Extraction of Diagnosis and Site From Pathology Reports for Keratinocyte Cancers

Bridie S Thompson¹, Sam Hardy², Nirmala Pandeya¹,³, Jean Claude Dusingize¹, Adele C Green¹,⁴, Athon Millane³, Daniel Bourke⁵, Ronald Grande⁵, Cameron D Bean⁵, Catherine M Olsen¹,⁶, David C Whiteman¹,⁶

Affiliations

¹ Department of Population Health, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia.
² Otso, Brisbane, Queensland, Australia.
³ School of Public Health, University of Queensland, Brisbane, Queensland, Australia.
⁴ Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom.
⁵ Max Kelsen, Brisbane, Queensland, Australia.
⁶ Faculty of Medicine, University of Queensland, Brisbane, Queensland, Australia.

PMID: 32755460
PMCID: PMC7469600
DOI: 10.1200/CCI.19.00152

Bridie S Thompson et al. JCO Clin Cancer Inform. 2020 Aug.

Abstract

Purpose: Keratinocyte cancers are exceedingly common in high-risk populations, but accurate measures of incidence are seldom derived because the burden of manually reviewing pathology reports to extract relevant diagnostic information is excessive. Thus, we sought to develop supervised learning algorithms for classifying basal and squamous cell carcinomas and other diagnoses, as well as disease site, and incorporate these into a Web application capable of processing large numbers of pathology reports.

Methods: Participants in the QSkin study were recruited in 2011 and comprised men and women age 40-69 years at baseline (N = 43,794) who were randomly selected from a population register in Queensland, Australia. Histologic data were manually extracted from free-text pathology reports for participants with histologically confirmed keratinocyte cancers for whom a pathology report was available (n = 25,786 reports). This provided a training data set for the development of algorithms capable of deriving diagnosis and site from free-text pathology reports. We calculated agreement statistics between algorithm-derived classifications and 3 independent validation data sets of manually abstracted pathology reports.
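As an illustration of the general idea of learning a diagnosis label from manually labelled free-text reports, the following is a minimal sketch only. The abstract does not state which supervised learning algorithms were used; the multinomial naive Bayes classifier, the class name `NaiveBayesTextClassifier`, the labels, and the four toy report snippets below are all assumptions invented for this example and are not taken from the QSkin data or the authors' application.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical bag-of-words naive Bayes classifier (illustration only)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)      # reports per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                # Laplace (add-one) smoothed log likelihood of each token
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data: (report snippet, manually assigned label).
train_texts = [
    "Nodular basal cell carcinoma, left cheek",
    "Superficial basal cell carcinoma, upper back",
    "Invasive squamous cell carcinoma, right forearm",
    "Well differentiated squamous cell carcinoma, scalp",
]
train_labels = ["BCC", "BCC", "SCC", "SCC"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("Basal cell carcinoma, nose"))
print(model.predict("Squamous cell carcinoma of the ear"))
```

A real system would need far more training reports, a held-out validation set, and a separate model (or output) for anatomic site; this sketch only shows the train-then-classify pattern described in the Methods.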

Results: The agreement for classifications of basal cell carcinoma (κ = 0.97 and κ = 0.96) and squamous cell carcinoma (κ = 0.93 for both) was almost perfect in 2 validation data sets but was slightly lower for a third (κ = 0.82 and κ = 0.90, respectively). Agreement for total counts of specific diagnoses was also high (κ > 0.8). Similar levels of agreement between algorithm-derived and manually extracted data were observed for classifications of keratoacanthoma and intraepidermal carcinoma.
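The κ values above are agreement statistics that correct raw agreement for the agreement expected by chance. A minimal sketch of Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), follows, assuming one manual label and one algorithm-derived label per report. The function name `cohen_kappa` and the ten example labels are invented for illustration; the abstract does not describe the authors' exact computation.

```python
from collections import Counter


def cohen_kappa(manual, predicted):
    """Cohen's kappa between two equal-length sequences of labels."""
    if len(manual) != len(predicted) or not manual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(manual)
    # Observed agreement: proportion of reports given the same label.
    observed = sum(m == p for m, p in zip(manual, predicted)) / n
    # Chance agreement: from the marginal label frequencies of each rater.
    manual_counts = Counter(manual)
    predicted_counts = Counter(predicted)
    expected = sum(
        manual_counts[c] * predicted_counts[c] for c in manual_counts
    ) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented example labels (not study data).
manual = ["BCC", "BCC", "SCC", "SCC", "other",
          "BCC", "SCC", "other", "BCC", "other"]
predicted = ["BCC", "BCC", "SCC", "BCC", "other",
             "BCC", "SCC", "other", "BCC", "SCC"]

kappa = cohen_kappa(manual, predicted)
print(f"kappa = {kappa:.2f}")
```

In this toy example, observed agreement is 0.80 and chance agreement is 0.35, giving κ of roughly 0.69. By the commonly used Landis and Koch benchmarks, values above 0.80 are described as almost perfect agreement, which matches the wording used in the Results.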

Conclusion: Supervised learning methods were used to develop a Web application capable of accurately and rapidly classifying large numbers of pathology reports for keratinocyte cancers and related diagnoses. Such tools may provide the means to accurately measure subtype-specific skin cancer incidence.

Conflict of interest statement

Sam Hardy

Employment: Max Kelsen

Athon Millane

Other Relationship: Max Kelsen

Daniel Bourke

Employment: Max Kelsen

Ronald Grande

Employment: Max Kelsen

Cameron D. Bean

Speakers' Bureau: London Speakers Bureau (I)

David C. Whiteman

Employment: Fullerton Health Care (I)

No other potential conflicts of interest were reported.

Figures

**FIG A1.**
Test results for agreement (F1 score) and discordance of diagnoses between the predicted labels (algorithm-derived classification) and true labels (actual diagnosis). Histologic names for labels are detailed in Table A1.

**FIG A2.**
Test results for agreement (F1 score) and discordance of site between the predicted labels (algorithm-predicted site) and true labels (actual site). Anatomic site names for labels are detailed in Table A2.
