AMIA Jt Summits Transl Sci Proc. 2018 May 18;2017:16-25. eCollection 2018.

Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry



Abdulrahman K AAlAbdulsalam et al. AMIA Jt Summits Transl Sci Proc.

Abstract

Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Committee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), regional lymph node involvement (N), and distant metastasis (M), known as the TNM staging system. Information related to cancer stage is typically recorded in clinical narrative notes and other informal communications in the Electronic Health Record (EHR). As a result, human chart abstractors (certified tumor registrars) must search through large volumes of text to extract accurate stage information and to resolve discordance between data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high sensitivity (extraction sensitivity: 95.5%-98.4%; classification sensitivity: 83.5%-87%).
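The abstract does not describe how mentions are extracted or classified, so the following is only a minimal rule-based sketch, not the authors' NLP and machine-learning pipeline. It assumes that a stage mention is a complete TNM string (for example `pT2 N1a M0` or `cT3N0MX`) with optional `y`/`r`/`a` and `c`/`p` prefixes. The pattern `TNM_PATTERN`, the helper names, the sample note, and the gold-standard set are all hypothetical. The sketch splits each match into its T, N, and M values and scores extraction with sensitivity, TP / (TP + FN), the metric reported above.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical pattern for a complete TNM expression, e.g. "pT2 N1a M0" or "cT3N0MX".
# Optional prefixes: y/r/a (post-therapy, recurrence, autopsy), then c/p (clinical/pathological).
TNM_PATTERN = re.compile(
    r"\b(?P<prefix>[yra]?[cp]?)"
    r"T(?P<t>[0-4][a-d]?|is|x)\s*"
    r"N(?P<n>[0-3][a-c]?|x)\s*"
    r"M(?P<m>[01][a-c]?|x)\b",
    re.IGNORECASE,
)


def _norm(value: str) -> str:
    """Normalize a category value: 'x' becomes 'X'; other letters are lowercased ('1A' -> '1a')."""
    return "X" if value.lower() == "x" else value.lower()


def extract_tnm(text: str) -> List[Dict[str, str]]:
    """Return every TNM mention in the text with its prefix and its T, N and M values."""
    mentions = []
    for m in TNM_PATTERN.finditer(text):
        mentions.append({
            "text": m.group(0),
            "prefix": m.group("prefix").lower(),
            "T": _norm(m.group("t")),
            "N": _norm(m.group("n")),
            "M": _norm(m.group("m")),
        })
    return mentions


def sensitivity(predicted: Set[Tuple[str, str]], gold: Set[Tuple[str, str]]) -> float:
    """Sensitivity (recall) = TP / (TP + FN), computed over gold-standard (note_id, mention) pairs."""
    if not gold:
        raise ValueError("gold standard is empty")
    true_positives = len(predicted & gold)
    false_negatives = len(gold - predicted)
    return true_positives / (true_positives + false_negatives)


# Invented note and gold standard, for illustration only.
note = "Path: invasive ductal carcinoma, pT2 N1a M0. Prior imaging suggested cT3N0MX."
mentions = extract_tnm(note)
predicted = {("note-1", m["text"]) for m in mentions}
gold = {("note-1", "pT2 N1a M0"), ("note-1", "cT3N0MX"), ("note-1", "T1N0M0")}
print(mentions)
print(sensitivity(predicted, gold))
```

A pattern like this misses partial mentions (a lone `T2`) and comma-separated forms such as `T2, N0, M0`; covering such variants would need additional rules or a learned model.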


Figures

Figure 1: NLP and ML application high-level architecture.
Figure 2: Frequency of TNM stage mentions extracted per patient.
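Figure 2 reports how many TNM stage mentions were extracted per patient. A minimal sketch of that tabulation, assuming the extractor's output is available as hypothetical `(patient_id, mention_text)` pairs, counts the mentions for each patient and then how many patients had exactly k mentions. The patient IDs and mentions are invented, and patients with no extracted mention do not appear in the distribution.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def mentions_per_patient(mentions: Iterable[Tuple[str, str]]) -> Counter:
    """Count extracted TNM mentions per patient; input is (patient_id, mention_text) pairs."""
    return Counter(patient_id for patient_id, _ in mentions)


def frequency_distribution(per_patient: Counter) -> Dict[int, int]:
    """Map k to the number of patients with exactly k extracted mentions (k >= 1)."""
    return dict(sorted(Counter(per_patient.values()).items()))


# Hypothetical extraction output keyed by made-up patient IDs.
extracted = [
    ("p1", "pT2N0M0"), ("p1", "T2N0M0"),
    ("p2", "cT1N0M0"),
    ("p3", "T3N1M0"), ("p3", "pT3N1M0"), ("p3", "ypT2N0M0"),
    ("p4", "T1N0M0"), ("p4", "T1N0M0"),
]
per_patient = mentions_per_patient(extracted)
distribution = frequency_distribution(per_patient)
print(distribution)
```

Repeated identical mentions (patient `p4` above) are counted separately here; whether duplicates should be collapsed first is a design choice the caption does not settle.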

