Comparative Study
Cancer Causes Control. 2010 Nov;21(11):1887-94.
doi: 10.1007/s10552-010-9616-4. Epub 2010 Jul 23.

Pattern-based information extraction from pathology reports for cancer registration

Giulio Napolitano et al. Cancer Causes Control. 2010 Nov.

Abstract

Objective: To evaluate precision and recall rates for the automatic extraction of information from free-text pathology reports, and to assess the impact that implementing pattern-based methods would have on cancer registration completeness.

Method: Over 300,000 electronic pathology reports were scanned to extract Gleason score, Clark level and Breslow depth, using a set of Perl routines that were progressively refined by trial and error. An additional test set of 915 reports potentially containing a Gleason score was used for evaluation.
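
As an illustration of the kind of pattern matching described in the Method, the following is a minimal sketch in Python rather than Perl. The abstract does not reproduce the authors' routines, so the regular expressions, the `extract` function and the sample report phrasings are all hypothetical assumptions, not the patterns used in the study.

```python
import re

# Hypothetical patterns; the study's actual Perl routines are not given in the abstract.
GLEASON = re.compile(
    r"gleason(?:'s)?\s*(?:score|grade|sum)?\s*[:=]?\s*"
    r"(?P<primary>[1-5])\s*\+\s*(?P<secondary>[1-5])",
    re.IGNORECASE,
)
CLARK = re.compile(
    r"clark(?:'s)?\s*level\s*[:=]?\s*(?P<level>IV|V|I{1,3}|[1-5])\b",
    re.IGNORECASE,
)
BRESLOW = re.compile(
    r"breslow(?:'s)?\s*(?:depth|thickness)?\s*[:=]?\s*(?:of\s*)?"
    r"(?P<mm>\d+(?:\.\d+)?)\s*mm",
    re.IGNORECASE,
)

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5}


def extract(report: str) -> dict:
    """Return whichever of the three items can be found in a free-text report."""
    found = {}

    m = GLEASON.search(report)
    if m:
        # Gleason score is the sum of the primary and secondary grades.
        found["gleason"] = int(m["primary"]) + int(m["secondary"])

    m = CLARK.search(report)
    if m:
        level = m["level"].upper()
        # Accept either Roman or Arabic numerals for the level.
        found["clark"] = ROMAN.get(level) or int(level)

    m = BRESLOW.search(report)
    if m:
        found["breslow_mm"] = float(m["mm"])

    return found
```

In a trial-and-error workflow like the one the abstract describes, patterns of this kind would be run over a sample of reports, the misses and false hits inspected, and the expressions widened or tightened accordingly.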

Results: Recall and precision values of over 98% and 99%, respectively, were easily reached. A potential increase in cancer staging completeness of up to 32% was demonstrated.
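
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute precision and recall by comparing extracted values against a manually annotated reference. The function name `precision_recall`, the counting convention and the toy data are assumptions made for illustration; the abstract does not state how the authors scored individual reports, and the numbers here are not from the study.

```python
def precision_recall(predicted: dict, gold: dict) -> tuple:
    """Compare extracted values with reference values, keyed by report id.

    A value of None means "nothing extracted" (predicted) or
    "nothing present in the report" (gold).
    Convention assumed here: a wrong value counts as both a
    false positive and a false negative.
    """
    tp = fp = fn = 0
    for report_id, truth in gold.items():
        guess = predicted.get(report_id)
        if guess is not None and guess == truth:
            tp += 1
        else:
            if guess is not None:
                fp += 1  # extracted something that is not correct
            if truth is not None:
                fn += 1  # a true value was missed or got wrong
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data for illustration only.
gold = {"r1": 7, "r2": 6, "r3": None, "r4": 9}
predicted = {"r1": 7, "r2": 6, "r3": 8, "r4": None}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision answers "of the values extracted, how many were right", while recall answers "of the values present in the reports, how many were found".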

Conclusions: In cancer registration, simple pattern matching applied to free-text documents can be effectively used to improve completeness and accuracy of pathology information.
