The feasibility of using natural language processing to extract clinical information from breast pathology reports

Julliette M Buckley¹, Suzanne B Coopey, John Sharko, Fernanda Polubriaginof, Brian Drohan, Ahmet K Belli, Elizabeth M H Kim, Judy E Garber, Barbara L Smith, Michele A Gadd, Michelle C Specht, Constance A Roche, Thomas M Gudewicz, Kevin S Hughes

PMID: 22934236
PMCID: PMC3424662
DOI: 10.4103/2153-3539.97788

Julliette M Buckley et al. J Pathol Inform. 2012:3:23. doi: 10.4103/2153-3539.97788. Epub 2012 Jun 30.

Affiliation

¹ Department of Surgical Oncology, Massachusetts General Hospital, Boston, Massachusetts, USA.

Abstract

Objective: The opportunity to integrate clinical decision support systems into clinical practice is limited due to the lack of structured, machine readable data in the current format of the electronic health record. Natural language processing has been designed to convert free text into machine readable data. The aim of the current study was to ascertain the feasibility of using natural language processing to extract clinical information from >76,000 breast pathology reports.

Approach and Procedure: Breast pathology reports from three institutions were analyzed using natural language processing software (Clearforest, Waltham, MA) to extract information on a variety of pathologic diagnoses of interest. Data tables were created from the extracted information according to date of surgery, side of surgery, and medical record number. The variety of ways in which each diagnosis could be represented was recorded, as a means of demonstrating the complexity of machine interpretation of free text.
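To illustrate why the number of phrasings matters for machine interpretation, here is a minimal Python sketch of pattern-based extraction with naive negation handling. It is not the method used by the Clearforest software, which the abstract does not describe. The pattern lists, the function name `classify_idc`, and the sample sentences are hypothetical and far smaller than the variant counts the study reports.

```python
import re

# Hypothetical, deliberately tiny pattern lists for illustration only.
IDC_PATTERNS = [
    r"invasive\s+ductal\s+carcinoma",
    r"infiltrating\s+ductal\s+carcinoma",
    r"invasive\s+carcinoma\s*,?\s*ductal\s+type",
]
NEGATION_CUES = [
    r"no\s+evidence\s+of",
    r"negative\s+for",
    r"no",
    r"without",
]

IDC_RE = re.compile("|".join(IDC_PATTERNS), re.IGNORECASE)
NEG_RE = re.compile(r"\b(?:" + "|".join(NEGATION_CUES) + r")\b", re.IGNORECASE)


def classify_idc(sentence: str) -> str:
    """Return 'present', 'absent', or 'not mentioned' for one sentence."""
    match = IDC_RE.search(sentence)
    if match is None:
        return "not mentioned"
    # Naive rule (an assumption): any negation cue before the diagnosis
    # phrase in the same sentence negates it.
    if NEG_RE.search(sentence[:match.start()]):
        return "absent"
    return "present"


if __name__ == "__main__":
    for text in [
        "Invasive ductal carcinoma, grade 2.",
        "No evidence of invasive ductal carcinoma.",
        "Fibroadenoma.",
    ]:
        print(text, "->", classify_idc(text))
```

Every phrasing that falls outside the pattern lists is silently missed, which is why an inventory of the ways each diagnosis is written is a useful measure of the task's difficulty.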

Results: There was widespread variation in how pathologists reported common pathologic diagnoses. We report, for example, 124 ways of saying invasive ductal carcinoma and 95 ways of saying invasive lobular carcinoma. There were >4000 ways of saying invasive ductal carcinoma was not present. Natural language processor sensitivity and specificity were 99.1% and 96.5% when compared to expert human coders.
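For readers unfamiliar with the two metrics, sensitivity is the fraction of true positives that are detected, TP / (TP + FN), and specificity is the fraction of true negatives that are correctly rejected, TN / (TN + FP). The Python sketch below computes both from paired labels. The function name and the sample labels are hypothetical and are not the study's data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    reference: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Compare predicted labels against reference (human-coded) labels."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # diagnosis present, detected
    fn = sum(1 for r, p in pairs if r and not p)      # diagnosis present, missed
    tn = sum(1 for r, p in pairs if not r and not p)  # diagnosis absent, not flagged
    fp = sum(1 for r, p in pairs if not r and p)      # diagnosis absent, flagged

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")

    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical labels: True means the diagnosis is present in the report.
    human = [True, True, True, False, False]
    software = [True, True, False, False, True]
    sens, spec = sensitivity_specificity(human, software)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```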

Conclusion: We have demonstrated how a large body of free-text medical information, such as that seen in breast pathology reports, can be converted to a machine-readable format using natural language processing, and we have described the inherent complexities of the task.

Keywords: Breast pathology reports; clinical decision support; natural language processing.

Figures

**Figure 1**
Sample pathology report showing the fields extracted (highlighted in bold type). Each specimen was parsed separately and generated its own “final diagnosis.”

**Figure 2**
Sample datasheet displaying extracted diagnostic information from the sample report shown in Figure 1. As each specimen generated its own “final diagnosis,” a single row was created for each specimen by MRN, date, side, and specimen in the first of three databases created.

**Figure 3**
Sample datasheet showing examples of missed diagnoses by the software. In row 1, “atypical hyperplasia” was not associated with either “ductal” or “lobular” and thus was not a pattern recognized by the software. In rows 2 and 3, the way in which “atypical ductal hyperplasia” was written was not a pattern recognized by the software. In row 3, typographical errors in the spelling of “carcinoma” meant the presence of DCIS was not detected by the processor.
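The typographical-error case in row 3 suggests one possible mitigation: normalizing misspelled terms before pattern matching. The Python sketch below uses the standard library's `difflib.get_close_matches` for this. It is an illustrative assumption, not something the study reports doing; the vocabulary, the 0.85 cutoff, and the function name `normalize_terms` are hypothetical.

```python
import difflib

# Hypothetical vocabulary of diagnosis terms to normalize toward.
VOCAB = [
    "atypical", "ductal", "lobular", "hyperplasia",
    "carcinoma", "invasive", "situ",
]


def normalize_terms(text: str, vocab=VOCAB, cutoff: float = 0.85) -> str:
    """Replace each whitespace-separated token with its closest vocabulary term.

    Tokens with no sufficiently similar vocabulary term are kept unchanged.
    Punctuation handling is out of scope for this sketch: a token such as
    "carcinoma," may be replaced by "carcinoma", dropping the comma.
    """
    normalized = []
    for token in text.lower().split():
        close = difflib.get_close_matches(token, vocab, n=1, cutoff=cutoff)
        normalized.append(close[0] if close else token)
    return " ".join(normalized)


if __name__ == "__main__":
    print(normalize_terms("ductal carcinma in situ"))
```

A lower cutoff catches more misspellings but risks mapping unrelated words onto diagnosis terms, so any threshold would need checking against human-coded reports.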
