Automatic information extraction from childhood cancer pathology reports
- PMID: 35721398
- PMCID: PMC9202570
- DOI: 10.1093/jamiaopen/ooac049
Abstract
Objectives: The International Classification of Childhood Cancer (ICCC) facilitates the effective classification of the heterogeneous group of cancers that occur in the pediatric population. However, no machine learning models have yet been developed for ICCC classification. We previously developed deep learning-based models that extract information from cancer pathology reports according to the ICD-O-3 coding standard. In this article, we describe extending these models to perform ICCC classification.
Materials and methods: We developed 2 models, ICD-O-3 classification followed by ICCC recoding (Model 1) and direct ICCC classification (Model 2), and designed 4 scenarios that vary the training sample size. We evaluated the models on a corpus of 29 206 pathology reports from 6 state cancer registries, covering patients aged 0 to 19 years at diagnosis.
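The two designs differ in where the ICCC group comes from. Model 1 chains an ICD-O-3 classifier with a deterministic recode to ICCC, while Model 2 predicts the ICCC group from the report text in one step. The minimal sketch below assumes hypothetical classifier callables (`icdo3_classifier`, `iccc_classifier`) and hypothetical helper names; its lookup table is a tiny illustrative subset mapping a few ICD-O-3 histology codes to ICCC-3 main groups. The real ICCC-3 recode combines site, histology, and behavior and covers far more codes, so this table is not suitable for actual coding.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# ILLUSTRATIVE SUBSET ONLY (not the official ICCC-3 recode table):
# a few ICD-O-3 histology codes mapped to ICCC-3 main diagnostic groups.
ILLUSTRATIVE_HISTOLOGY_TO_ICCC = {
    "9500": "IV",  # neuroblastoma -> group IV (neuroblastoma and other peripheral nervous cell tumors)
    "9510": "V",   # retinoblastoma -> group V (retinoblastoma)
    "8960": "VI",  # nephroblastoma -> group VI (renal tumors)
}


@dataclass
class IcdO3Prediction:
    site: str       # topography code, e.g. "C64.9"
    histology: str  # 4-digit morphology code, e.g. "8960"
    behavior: str   # behavior digit, "3" = malignant


def recode_to_iccc(pred: IcdO3Prediction) -> Optional[str]:
    """Hypothetical stage 2 of Model 1: deterministic ICD-O-3 -> ICCC lookup.

    Simplification: only malignant (/3) codes are recoded, and site is ignored;
    the real recode has additional rules.
    """
    if pred.behavior != "3":
        return None
    return ILLUSTRATIVE_HISTOLOGY_TO_ICCC.get(pred.histology)


def model1(report: str, icdo3_classifier: Callable[[str], IcdO3Prediction]) -> Optional[str]:
    # Stage 1 predicts ICD-O-3 fields; stage 2 recodes them to an ICCC group.
    return recode_to_iccc(icdo3_classifier(report))


def model2(report: str, iccc_classifier: Callable[[str], str]) -> str:
    # Direct classification: a single model maps the report text to an ICCC group.
    return iccc_classifier(report)
```

In this sketch, any stage-1 error in the predicted histology carries straight through the deterministic lookup into the ICCC group, whereas Model 2 is trained against the ICCC labels themselves.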
Results: Our findings suggest that direct ICCC classification (Model 2) performs substantially better than reusing the ICD-O-3 classification model (Model 1). When an uncertainty quantification mechanism was applied to assess the algorithm's confidence in assigning a code, the model achieved a micro-F1 score of 0.987 while abstaining (i.e., being insufficiently confident to assign a code) on only 14.8% of ambiguous pathology reports.
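The abstract does not specify which uncertainty quantification mechanism was used, so the sketch below swaps in a simple, commonly used stand-in: abstain whenever the top softmax probability falls below a threshold, then score only the reports the model did not abstain on. The function names and the threshold are hypothetical. For single-label multiclass prediction, pooling true positives, false positives, and false negatives across classes makes micro-F1 equal to accuracy on the retained reports.

```python
import numpy as np

ABSTAIN = -1  # hypothetical marker for "not confident enough to assign a code"


def abstain_by_confidence(probs: np.ndarray, threshold: float) -> np.ndarray:
    """Predict the argmax class per report, or ABSTAIN if its probability < threshold.

    Stand-in technique (max-softmax thresholding), not necessarily the paper's method.
    """
    preds = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    return np.where(confident, preds, ABSTAIN)


def micro_f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Micro-averaged F1: pool TP, FP, and FN over all classes before computing F1."""
    classes = np.union1d(y_true, y_pred)
    tp = sum(np.sum((y_pred == c) & (y_true == c)) for c in classes)
    fp = sum(np.sum((y_pred == c) & (y_true != c)) for c in classes)
    fn = sum(np.sum((y_pred != c) & (y_true == c)) for c in classes)
    return float(2 * tp / (2 * tp + fp + fn))


def evaluate_with_abstention(probs: np.ndarray, y_true: np.ndarray, threshold: float):
    """Return (micro-F1 on retained reports, fraction of reports abstained on)."""
    y_hat = abstain_by_confidence(probs, threshold)
    kept = y_hat != ABSTAIN
    abstention_rate = float(1.0 - kept.mean())
    score = micro_f1(y_true[kept], y_hat[kept]) if kept.any() else float("nan")
    return score, abstention_rate
```

Raising the threshold typically trades a higher abstention rate for a higher score on the retained reports; the operating point reported above would correspond to one such choice under whatever mechanism the authors used.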
Conclusions: Our experimental results suggest that machine learning-based automatic extraction of ICCC information from childhood cancer pathology reports is a reliable means of supplementing human annotators at state cancer registries, as it can read and abstract the majority of childhood cancer pathology reports accurately.
Keywords: cancer pathology reports; information extraction; machine learning; pediatric cancer.
© The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association.