J Am Med Inform Assoc. 2010 Jul-Aug;17(4):440-5.
doi: 10.1136/jamia.2010.003707.

Symbolic rule-based classification of lung cancer stages from free-text pathology reports

Anthony N Nguyen et al. J Am Med Inform Assoc. 2010 Jul-Aug.

Abstract

Objective: To automatically classify lung cancer tumor-node-metastasis (TNM) stages from free-text pathology reports using symbolic rule-based classification.

Design: By exploiting report substructure and the symbolic manipulation of Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) concepts in reports, statements in free text can be evaluated for relevance against factors relating to the staging guidelines. Post-coordinated SNOMED CT expressions based on templates were defined, populated with concepts found in reports, and tested for subsumption by staging factors. The subsumption results were then combined in logic that follows the staging guidelines to calculate the TNM stage.
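The subsumption test described in the design can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the concept names, the tiny is-a hierarchy, the attribute names, and the two toy staging rules are hypothetical stand-ins, not actual SNOMED CT content, the staging guidelines, or the authors' implementation.

```python
# Toy is-a hierarchy (child -> parents). Hypothetical names, not SNOMED CT codes.
IS_A = {
    "visceral_pleura": {"pleura"},
    "parietal_pleura": {"pleura"},
    "pleura": {"thoracic_structure"},
    "chest_wall": {"thoracic_structure"},
    "thoracic_structure": set(),
    "invasion": {"morphologic_finding"},
    "morphologic_finding": set(),
}


def subsumes(general, specific):
    """Return True if `general` is `specific` or one of its ancestors."""
    if general == specific:
        return True
    return any(subsumes(general, parent) for parent in IS_A.get(specific, ()))


def expression_subsumes(factor, expression):
    """A staging factor subsumes a report expression when every attribute
    of the factor is present in the expression with a subsumed value."""
    return all(
        attr in expression and subsumes(value, expression[attr])
        for attr, value in factor.items()
    )


# Hypothetical staging factors, written as template-style expressions.
FACTOR_CHEST_WALL = {"morphology": "invasion", "site": "chest_wall"}
FACTOR_VISCERAL_PLEURA = {"morphology": "invasion", "site": "visceral_pleura"}


def toy_t_stage(expressions):
    """Illustrative rule logic only; a real system encodes the full guidelines."""
    if any(expression_subsumes(FACTOR_CHEST_WALL, e) for e in expressions):
        return "T3"
    if any(expression_subsumes(FACTOR_VISCERAL_PLEURA, e) for e in expressions):
        return "T2"
    return "T1"


# A template populated with concepts found in a (made-up) report statement.
report_expression = {"morphology": "invasion", "site": "visceral_pleura"}
stage = toy_t_stage([report_expression])
```

The staging logic never inspects report wording directly; it only asks whether a populated expression is subsumed by a staging factor, which keeps the rules separate from the text extraction step.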

Measurements: The accuracy measure and confusion matrices were used to evaluate the TNM stages classified by the symbolic rule-based system. The system was evaluated against a database of multidisciplinary team staging decisions and a machine learning-based text classification system using support vector machines.
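As a rough illustration of the two measurements, the sketch below computes accuracy and a confusion matrix for one staging dimension. The stage labels and the five example cases are invented for demonstration and do not come from the study corpus; the function names are likewise hypothetical.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of cases where the predicted stage equals the reference stage."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def confusion_matrix(gold, predicted, labels):
    """Rows are reference stages, columns are predicted stages."""
    counts = Counter(zip(gold, predicted))
    return [[counts[(g, p)] for p in labels] for g in labels]


# Invented example data: reference staging decisions vs. system output.
labels = ["T1", "T2", "T3"]
gold = ["T1", "T2", "T2", "T3", "T1"]
predicted = ["T1", "T2", "T1", "T3", "T1"]

acc = accuracy(gold, predicted)
matrix = confusion_matrix(gold, predicted, labels)
```

Off-diagonal cells show which stages are confused with which, a detail the single accuracy figure hides.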

Results: On a corpus of pathology reports for 718 lung cancer patients, overall accuracy against a database of pathological TNM staging decisions was 72%, 78%, and 94% for T, N, and M staging, respectively. The system's performance was also comparable to that of support vector machine classification approaches.

Conclusion: A system to classify lung TNM stages from free-text pathology reports was developed, and it was verified that the symbolic rule-based approach using SNOMED CT can be used for the extraction of key lung cancer characteristics from free-text reports. Future work will investigate the applicability of using the proposed methodology for extracting other cancer characteristics and types.

Conflict of interest statement

Competing interests: None.

Figures

Figure 1
Medical text extraction (MEDTEX) pipeline application used to classify cancer stages.
