Symbolic rule-based classification of lung cancer stages from free-text pathology reports
- PMID: 20595312
- PMCID: PMC2995652
- DOI: 10.1136/jamia.2010.003707
Abstract
Objective: To automatically classify lung tumor-node-metastases (TNM) cancer stages from free-text pathology reports using symbolic rule-based classification.
Design: By exploiting report substructure and the symbolic manipulation of Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) concepts in reports, statements in free text can be evaluated for relevance against factors in the staging guidelines. Post-coordinated SNOMED CT expressions based on templates were defined, populated with concepts found in the reports, and tested for subsumption by staging factors. The subsumption results were then combined in logic that follows the staging guidelines to calculate the TNM stage.
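The subsumption-then-rules idea in the Design paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' system: the concept names, the tiny is-a hierarchy, the attribute names (`finding`, `site`), the two staging factors, and the `TX` fallback are all hypothetical stand-ins, and none of them are real SNOMED CT identifiers or the full staging guideline text.

```python
# Toy is-a hierarchy (child -> parents). All names are hypothetical,
# chosen for illustration; they are NOT SNOMED CT concept identifiers.
IS_A = {
    "visceral_pleura": {"pleura"},
    "parietal_pleura": {"pleura"},
    "pleura": {"thoracic_structure"},
    "chest_wall": {"thoracic_structure"},
    "invasion_by_tumor": {"morphologic_finding"},
}


def ancestors(concept):
    """Return the concept itself plus everything reachable via is-a links."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(general, specific):
    """True if `general` is `specific` or one of its ancestors."""
    return general in ancestors(specific)


def expression_subsumes(factor, candidate):
    """A factor subsumes a candidate expression when every attribute of the
    factor appears in the candidate with a value the factor's value subsumes."""
    return all(
        attr in candidate and subsumes(value, candidate[attr])
        for attr, value in factor.items()
    )


# Hypothetical staging factors, checked from most to least severe.
T_FACTORS = [
    ("T3", {"finding": "invasion_by_tumor", "site": "chest_wall"}),
    ("T2", {"finding": "invasion_by_tumor", "site": "visceral_pleura"}),
]


def classify_t(expressions, default="TX"):
    """Return the first stage whose factor subsumes any report expression."""
    for stage, factor in T_FACTORS:
        if any(expression_subsumes(factor, e) for e in expressions):
            return stage
    return default


# A template populated from one (made-up) report statement.
report_expressions = [{"finding": "invasion_by_tumor", "site": "visceral_pleura"}]
t_stage = classify_t(report_expressions)
```

Checking the most severe factor first means a report that satisfies several factors receives the highest applicable stage, which is one simple way to encode the priority implied by staging rules.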
Measurements: The accuracy measure and confusion matrices were used to evaluate the TNM stages classified by the symbolic rule-based system. The system was evaluated against a database of multidisciplinary team staging decisions and a machine learning-based text classification system using support vector machines.
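For readers unfamiliar with the two measures named above, the following sketch computes accuracy and a confusion matrix for one staging dimension. The function names and the four-patient example are assumptions made for illustration; the labels are invented and are not data from the study.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of cases where the predicted stage equals the reference stage."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal in length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def confusion_matrix(gold, predicted, labels):
    """Rows are reference stages, columns are predicted stages."""
    counts = Counter(zip(gold, predicted))
    return [[counts[(g, p)] for p in labels] for g in labels]


# Invented example: reference decisions versus system output for four patients.
gold = ["T1", "T2", "T2", "T3"]
predicted = ["T1", "T2", "T1", "T3"]
labels = ["T1", "T2", "T3"]

acc = accuracy(gold, predicted)
matrix = confusion_matrix(gold, predicted, labels)
```

Off-diagonal cells of the matrix show which stages are confused with which, something a single accuracy figure cannot reveal.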
Results: Overall accuracies on a corpus of pathology reports for 718 lung cancer patients, measured against a database of pathological TNM staging decisions, were 72%, 78%, and 94% for T, N, and M staging, respectively. The system's performance was also comparable to support vector machine classification approaches.
Conclusion: A system to classify lung TNM stages from free-text pathology reports was developed, and it was verified that the symbolic rule-based approach using SNOMED CT can be used to extract key lung cancer characteristics from free-text reports. Future work will investigate the applicability of the proposed methodology to extracting other cancer characteristics and types.