IEEE EMBS Int Conf Biomed Health Inform. 2019 May:2019:10.1109/bhi.2019.8834586.
doi: 10.1109/bhi.2019.8834586. Epub 2019 Sep 12.

Deep Transfer Learning Across Cancer Registries for Information Extraction from Pathology Reports

Mohammed Alawad et al. IEEE EMBS Int Conf Biomed Health Inform. 2019 May.

Abstract

Automated information extraction from cancer pathology reports is an active area of research in support of national cancer surveillance. A well-known challenge is developing information extraction tools that perform robustly across cancer registries. In this study, we investigated whether transfer learning (TL) with a convolutional neural network (CNN) can facilitate cross-registry knowledge sharing. Specifically, we performed a series of experiments to determine whether a CNN trained on single-registry data can transfer knowledge to another registry, or whether developing a cross-registry knowledge database produces a more effective and generalizable model. Using data from two cancer registries, with primary tumor site and topography as the information extraction task of interest, our study showed that TL improves the classification macro F-score by 6.90% and 17.22% over the baseline single-registry models. Detailed analysis showed that the observed improvement is evident in the low-prevalence classes.
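The abstract reports results as macro F-score and notes that the gains appear in low-prevalence classes. The sketch below is only an illustration of why that metric is sensitive to rare classes; it is not the authors' evaluation code. The function names and the toy labels (`C50`, `C34`) are assumptions made for the example, not data from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[t] += 1  # true label gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def micro_f1(y_true, y_pred):
    """For single-label classification, micro F1 equals accuracy."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels (hypothetical): 8 reports of a common class, 2 of a rare class.
y_true = ["C50"] * 8 + ["C34"] * 2
# A classifier that always predicts the common class.
y_pred = ["C50"] * 10

macro = macro_f1(y_true, y_pred)
micro = micro_f1(y_true, y_pred)
```

In this toy case the classifier is right on 8 of 10 reports, so the micro F-score is 0.8, but the rare class scores an F1 of 0 and pulls the macro F-score down to about 0.44. An improvement confined to rare classes can therefore move the macro F-score substantially while barely changing overall accuracy.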

Keywords: NLP; Transfer learning; convolutional neural network; information extraction; pathology reports.

Figures

Fig. 1.
F-score and number of samples of the most (a) and least (b) represented classes with at least 100 samples. A model is trained on either LA-Train or KY-Train set, and tested on KY-Test or LA-Test, where all transferred parameters are frozen.
Fig. 2.
F-score and number of samples of randomly selected classes from LA (a) and KY (b) datasets with number of samples less than 100.
Fig. 3.
F-score and number of samples of the least represented ten classes from LA dataset with at least 100 samples.
Fig. 4.
F-score and number of samples of the least represented ten classes from KY dataset with at least 100 samples.
Fig. 5.
F-score and number of samples of the most represented ten classes from LA dataset.
Fig. 6.
F-score and number of samples of the most represented ten classes from KY dataset.
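The Fig. 1 caption states that all transferred parameters are frozen when a model trained on one registry is evaluated on the other. The following is a minimal, framework-free sketch of what freezing means during training, under loudly stated assumptions: the layer names (`embedding`, `conv`, `classifier`), the parameter values, the gradients, and the choice of which layers are transferred are all hypothetical, since the abstract does not specify them.

```python
def sgd_step(params, grads, frozen, lr=0.1):
    """One gradient-descent step that leaves frozen parameters untouched.

    params, grads: dicts mapping a layer name to a list of floats.
    frozen: set of layer names whose values must not change.
    """
    return {
        name: list(values) if name in frozen
        else [w - lr * g for w, g in zip(values, grads[name])]
        for name, values in params.items()
    }


# Hypothetical parameters learned on the source registry.
source_params = {
    "embedding": [0.5, -0.2],
    "conv": [0.1, 0.3],
    "classifier": [0.7, -0.4],
}

# Transfer: copy the feature layers, start a fresh classifier for the target.
target_params = {
    "embedding": list(source_params["embedding"]),
    "conv": list(source_params["conv"]),
    "classifier": [0.0, 0.0],
}
frozen = {"embedding", "conv"}  # assumption: only the classifier is trained

# Hypothetical gradients computed on a batch of target-registry reports.
grads = {
    "embedding": [1.0, 1.0],
    "conv": [1.0, 1.0],
    "classifier": [1.0, -1.0],
}

updated = sgd_step(target_params, grads, frozen)
```

After the step, `updated["embedding"]` and `updated["conv"]` still match the source-registry values even though their gradients are nonzero, while `updated["classifier"]` has moved to `[-0.1, 0.1]`. Deep learning frameworks offer the same effect through their own mechanisms, which are not shown here.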
