Pattern-based information extraction from pathology reports for cancer registration
- PMID: 20652738
- DOI: 10.1007/s10552-010-9616-4
Abstract
Objective: To evaluate precision and recall rates for the automatic extraction of information from free-text pathology reports. To assess the impact that implementation of pattern-based methods would have on cancer registration completeness.
Method: Over 300,000 electronic pathology reports were scanned to extract Gleason score, Clark level and Breslow depth, using a set of Perl routines that were progressively refined by trial and error. An additional test set of 915 reports potentially containing a Gleason score was used for evaluation.
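The abstract does not reproduce the Perl routines themselves. As a hedged illustration of the kind of pattern matching described, the sketch below extracts the three items with regular expressions. It is written in Python rather than Perl, and the patterns, the function names (`extract_gleason`, `extract_clark`, `extract_breslow`) and the report phrasings they cover are hypothetical assumptions, not the authors' rules.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for illustration only; NOT the study's Perl routines.
# Covers phrasings such as "Gleason score 3+4=7" or "GLEASON 4 + 3".
GLEASON_RE = re.compile(
    r"gleason(?:\s+score)?\s*[:=]?\s*(\d)\s*\+\s*(\d)(?:\s*=\s*(\d{1,2}))?",
    re.IGNORECASE,
)
# Covers "Clark level IV", "Clark's level: III", "Clark level 4".
# "IV" is tried before "I{1,3}" so that "IV" is not cut short at "I".
CLARK_RE = re.compile(
    r"clark'?s?\s+level\s*[:=]?\s*(IV|V|I{1,3}|[1-5])\b",
    re.IGNORECASE,
)
# Covers "Breslow thickness of 1.2 mm", "Breslow depth: 0,8mm".
BRESLOW_RE = re.compile(
    r"breslow(?:\s+(?:depth|thickness))?(?:\s+of)?\s*[:=]?\s*(\d+(?:[.,]\d+)?)\s*mm",
    re.IGNORECASE,
)

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5}


def extract_gleason(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (primary, secondary, total) or None if absent or inconsistent."""
    m = GLEASON_RE.search(text)
    if m is None:
        return None
    primary, secondary = int(m.group(1)), int(m.group(2))
    total = primary + secondary
    # If a stated total contradicts the sum, leave the report for manual review.
    if m.group(3) is not None and int(m.group(3)) != total:
        return None
    return primary, secondary, total


def extract_clark(text: str) -> Optional[int]:
    """Return the Clark level as an integer 1-5, or None."""
    m = CLARK_RE.search(text)
    if m is None:
        return None
    level = m.group(1).upper()
    return int(level) if level.isdigit() else ROMAN[level]


def extract_breslow(text: str) -> Optional[float]:
    """Return the Breslow depth in millimetres, or None."""
    m = BRESLOW_RE.search(text)
    # Accept a decimal comma as well as a decimal point.
    return float(m.group(1).replace(",", ".")) if m else None
```

For example, `extract_gleason("Adenocarcinoma, Gleason score 3+4=7.")` returns `(3, 4, 7)`. Phrasings the patterns miss (such as "Gleason 7 (3+4)") would be found by reviewing unmatched reports and adding variants, in the spirit of the trial-and-error refinement described above.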
Results: Recall and precision values of over 98% and 99%, respectively, were readily achieved. A potential increase in cancer staging completeness of up to 32% was demonstrated.
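The abstract does not state exactly how precision and recall were computed. A minimal sketch, assuming the conventional information-extraction definitions (precision = correct extractions / all extractions; recall = correct extractions / reports whose gold standard contains the item), with one value or `None` per report and the hypothetical function name `precision_recall`:

```python
from typing import Optional, Sequence, Tuple


def precision_recall(
    predicted: Sequence[Optional[object]],
    gold: Sequence[Optional[object]],
) -> Tuple[float, float]:
    """Compare per-report extracted values against manually verified values."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have one entry per report")
    extracted = sum(p is not None for p in predicted)  # items the extractor returned
    relevant = sum(g is not None for g in gold)        # items actually present
    correct = sum(p is not None and p == g for p, g in zip(predicted, gold))
    precision = correct / extracted if extracted else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall
```

Under these definitions a wrong value counts against both measures, while a missed value lowers recall only.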
Conclusions: In cancer registration, simple pattern matching applied to free-text documents can be effectively used to improve completeness and accuracy of pathology information.
