[A customized method for information extraction from unstructured text data in the electronic medical records]
- PMID: 29643524
Abstract
Objective: Electronic medical records (EMRs) contain a large amount of diagnostic and treatment information and are a concrete record of clinicians' actual diagnosis and treatment. Many EMR sections, such as the chief complaint, history of present illness, past history, differential diagnosis, diagnostic imaging and surgical records, reflect the details of diagnosis and treatment in the clinical process and are written as Chinese natural-language narrative. Extracting useful information from this Chinese narrative text and organizing it into tabular form for medical research, so that clinical data can be put to practical use in the real world, is a difficult problem in Chinese medical data processing.
Methods: Using narrative EMR text from a tertiary hospital in China, we propose a method that learns customized information-extraction rules and then applies them for rule-based information extraction. The method consists of three steps. (1) A random sample of 600 EMR documents (including history of present illness, past history, personal history, family history, etc.) was drawn as the raw corpus. Using a Chinese clinical narrative text annotation platform that we developed, trained clinicians and nurses annotated the tokens and phrases to be extracted, taking history of diabetes as an example. (2) Extraction templates were first summarized and induced from the annotated corpus. These templates were then rewritten as Perl regular expressions to serve as extraction rules. With these rules as the basic knowledge base, we developed Perl extraction packages to extract data from the EMR text. Finally, the extracted data items were organized in tabular format for later use in clinical research or hospital surveillance. (3) The method was evaluated and validated on the National Clinical Service Data Integration Platform, and the extraction results were checked by a combination of manual and automated verification, which demonstrated the effectiveness of the method.
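The rule-based step (templates rewritten as regular expressions, applied to narrative text, output as a table) can be illustrated with a minimal sketch. The authors wrote their rules in Perl; the sketch below uses Python's `re` and `csv` modules instead. The patterns, the negation cue, the field names and the function names are all hypothetical stand-ins for illustration, not the authors' rule base.

```python
import csv
import io
import re

# Hypothetical extraction rules standing in for templates induced from an
# annotated corpus. Each rule captures the duration (in years) of diabetes.
DIABETES_HISTORY_RULES = [
    # e.g. "糖尿病10年" / "糖尿病10余年" ("diabetes for (more than) 10 years")
    re.compile(r"糖尿病(\d+)余?年"),
    # e.g. "糖尿病史5年" / "糖尿病病史5年" ("history of diabetes for 5 years")
    re.compile(r"糖尿病病?史(\d+)余?年"),
]

# Hypothetical negation rule: "否认...糖尿病" ("denies ... diabetes"), limited to
# one clause so that it does not cross a comma or sentence boundary.
NEGATION_RULE = re.compile(r"否认[^。；;，,]*糖尿病")

FIELDS = ["record_id", "diabetes_history", "years"]


def extract_diabetes_history(record_id, text):
    """Apply the rules to one narrative section and return one table row."""
    if NEGATION_RULE.search(text):
        return {"record_id": record_id, "diabetes_history": "no", "years": ""}
    for rule in DIABETES_HISTORY_RULES:
        match = rule.search(text)
        if match:
            return {"record_id": record_id, "diabetes_history": "yes",
                    "years": match.group(1)}
    # No rule fired: the item is left as "unknown".
    return {"record_id": record_id, "diabetes_history": "unknown", "years": ""}


def to_csv(rows):
    """Organize extracted rows as a CSV table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    records = {
        "r1": "患者糖尿病10年，口服二甲双胍治疗。",
        "r2": "既往糖尿病史5年。",
        "r3": "否认高血压、糖尿病病史。",
    }
    rows = [extract_diabetes_history(rid, text) for rid, text in records.items()]
    print(to_csv(rows))
```

Checking the negation rule before the positive templates is one simple design choice; a realistic rule base would need many more templates (other phrasings, onset dates instead of durations) and would have to separate the patient's own history from family-history mentions.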
Results: Among all patients diagnosed with diabetes who were discharged from the Department of Endocrinology of the hospital in 2015 (1,436 patients in total), extraction of history of diabetes from the medical history section achieved a recall of 87.6%, a precision of 99.5% and an F-score of 0.93. For 10% of the patients with diabetes discharged from the same department as of August 2017 (1,223 patients in total), extraction of diabetes history achieved a recall of 89.2%, a precision of 99.2% and an F-score of 0.94.
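The metrics above follow the usual definitions, and a short Python sketch makes them concrete. The abstract does not say how matches were counted, so counting one extracted item per record is an assumption here, and the function names are illustrative. Applying the F1 formula, F1 = 2PR / (P + R), to the reported figures shows that the F-scores agree with it when the second reported rate (99.5% and 99.2%) is read as precision.

```python
def f1_from_rates(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true positive,
    false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_from_rates(precision, recall)


if __name__ == "__main__":
    # Rates reported in the Results (the second rate read as precision).
    print(round(f1_from_rates(0.995, 0.876), 2))  # 0.93 (2015 cohort)
    print(round(f1_from_rates(0.992, 0.892), 2))  # 0.94 (2017 sample)
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two rates, which is why a recall near 88% caps the F-score near 0.93 despite precision above 99%.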
Conclusion: This study combines natural language processing with rule-based information extraction, and designs and implements an algorithm for extracting customized information from unstructured Chinese EMR text. The method achieved better results than existing work.