Int J Med Inform. 2019 Apr:124:6-12.
doi: 10.1016/j.ijmedinf.2019.01.004. Epub 2019 Jan 7.

Using natural language processing to extract clinically useful information from Chinese electronic medical records

Liang Chen et al. Int J Med Inform. 2019 Apr.

Abstract

Aims: To develop a natural language processing (NLP)-based algorithm for extracting clinically useful information for patients with hepatocellular carcinoma (HCC) from Chinese electronic medical records (EMRs) and use these data for the assessment of HCC staging.

Materials and methods: Clinical documents, including operation notes, radiology reports and pathology reports, of 92 HCC patients were collected from Chinese EMRs. We randomly divided these patients into training (n = 60) and testing (n = 32) datasets. Rule-based and hybrid methods for extracting information were developed using the manually annotated operation notes in the training set. The better-performing method was then used to process the other documents. Performance was assessed by calculating precision, recall and F-score under exact-boundary and partial-boundary matching strategies. The utility of the extracted information for HCC staging was assessed by comparing the resulting staging with that obtained by manual review.
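The two matching strategies can be illustrated with a minimal sketch. It is not the authors' evaluation code: the span representation, the function names `overlaps` and `score`, and the definition of a partial match (same label and at least one shared character) are assumptions made for illustration, since the abstract does not define them.

```python
from typing import List, Tuple

# Hypothetical representation: (start offset, end offset (exclusive), entity label).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """Assumed partial-boundary rule: same label and overlapping character ranges."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def score(gold: List[Span], pred: List[Span], partial: bool = False):
    """Return (precision, recall, F-score) for one matching strategy."""
    match = overlaps if partial else (lambda a, b: a == b)
    # Predicted spans that match some gold span count toward precision.
    matched_pred = sum(any(match(g, p) for g in gold) for p in pred)
    # Gold spans that are matched by some predicted span count toward recall.
    matched_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Made-up annotations, not data from the study.
gold = [(0, 4, "size"), (10, 15, "location")]
pred = [(0, 4, "size"), (11, 15, "location"), (20, 22, "size")]
exact = score(gold, pred)                 # only the first span matches exactly
partial = score(gold, pred, partial=True)  # the second span also counts
```

With these made-up spans, the second prediction misses the gold boundary by one character, so it counts only under partial-boundary matching. This is why partial-boundary scores are never lower than exact-boundary scores for the same output.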

Results: For operation notes, the rule-based and hybrid methods had a precision, recall and F-score ≥80% when the exact-boundary and partial-boundary matching strategies were applied to the testing dataset. The rule-based method, which performed better than the hybrid method, also achieved good performance on three other types of documents. When the extracted clinically useful information was used for HCC staging, the concordance rate with manual review was 75%.
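A concordance rate of this kind can be computed as the share of patients for whom the automatically derived stage equals the manually reviewed one. The sketch below assumes that definition; the function name `concordance_rate` and the example stage values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def concordance_rate(auto: Sequence[int], manual: Sequence[int]) -> float:
    """Fraction of patients whose automatic stage equals the manual stage."""
    if len(auto) != len(manual):
        raise ValueError("both sequences must cover the same patients")
    if not auto:
        return 0.0
    agreements = sum(a == m for a, m in zip(auto, manual))
    return agreements / len(auto)


# Made-up staging scores for four patients, one per position.
auto_stage = [0, 1, 2, 1]
manual_stage = [0, 1, 3, 1]
rate = concordance_rate(auto_stage, manual_stage)  # three of four agree
```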

Conclusion: An NLP system was developed for clinical information extraction and HCC staging based on EMRs, and the results indicate that Chinese NLP has potential utility in clinical research.

Keywords: Cancer of liver Italian program (CLIP); Chinese EMRs; Hybrid method; Regular expression; Rule-based method.
