Beijing Da Xue Xue Bao Yi Xue Ban. 2018 Apr 18;50(2):256-263.

[A customized method for information extraction from unstructured text data in the electronic medical records]

[Article in Chinese]

X Y Bao¹, W J Huang², K Zhang³, M Jin¹, Y Li⁴, C Z Niu⁵

Affiliations

¹ Medical Informatics Center, Peking University, Beijing 100191, China; National Clinical Service Data Center, Beijing 100191, China.
² School of Mathematical Sciences, Peking University, Beijing 100871, China.
³ Peking University School of Basic Medical Science, Beijing 100191, China.
⁴ National Clinical Service Data Center, Beijing 100191, China; Department of Hospital Management, Peking University Health Science Center, Beijing 100191, China.
⁵ Department of Information, the First Affiliated Hospital of Zhengzhou University, Zhengzhou 450052, China.

PMID: 29643524

Free article

X Y Bao et al. Beijing Da Xue Xue Bao Yi Xue Ban. 2018.

Abstract

Objective: Electronic medical records (EMRs) contain a large amount of diagnostic and treatment information that directly reflects clinicians' actual diagnostic and treatment practice. Many EMR sections, such as chief complaints, history of present illness, past history, differential diagnosis, diagnostic imaging, and surgical records, document the details of the clinical process in Chinese natural-language narrative. Extracting useful information from this Chinese narrative text and organizing it into tabular form, both for medical research and for practical real-world use of clinical data, remains a difficult problem in Chinese medical data processing.

Methods: Based on EMR narrative text data from a tertiary hospital in China, we propose a customized method that learns information extraction rules and applies them in rule-based extraction. The method consists of three steps. (1) A random sample of 600 EMR records (including history of present illness, past history, personal history, family history, etc.) was drawn as the raw corpus. Using a Chinese clinical narrative text annotation platform that we developed, trained clinicians and nurses marked the tokens and phrases in the corpus that were to be extracted (taking history of diabetes as an example). (2) Extraction templates were first summarized and induced from the annotated corpus. These templates were then rewritten as regular expressions in the Perl programming language to serve as extraction rules. With these rules as the basic knowledge base, we developed extraction packages in Perl to extract data from the EMR text. Finally, the extracted data items were organized in tabular format for later use in clinical research or hospital surveillance. (3) The method was evaluated and validated on the National Clinical Service Data Integration Platform, where the extraction results were checked by a combination of manual and automated verification, which demonstrated the effectiveness of the method.
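Step 2 of the method can be illustrated with a minimal sketch. The abstract does not give the actual templates, and the authors implemented their rules as Perl regular expressions; the Python version below is a substitute for illustration only. The three patterns, the field names, the function names `extract_diabetes_history` and `rows_to_csv`, and the sample sentences are all hypothetical.

```python
import csv
import io
import re

# Hypothetical rules, not the paper's templates. The negation rule is bounded
# to a single clause so that a cue in one clause cannot cancel a mention in
# the next one.
NEGATION = re.compile(r"(?:否认|无)[^。；;，,]*?糖尿病")   # "denies ... diabetes"
DURATION = re.compile(r"糖尿病(?:病史)?(\d+)余?(年|个月|月)")  # "diabetes (history) N years/months"
MENTION = re.compile(r"糖尿病")                              # bare mention of diabetes

FIELDS = ["record_id", "has_diabetes_history", "duration_value", "duration_unit"]


def extract_diabetes_history(record_id, text):
    """Apply the rules in priority order and return one tabular row as a dict."""
    row = {
        "record_id": record_id,
        "has_diabetes_history": "unknown",
        "duration_value": "",
        "duration_unit": "",
    }
    if NEGATION.search(text):
        row["has_diabetes_history"] = "no"
        return row
    match = DURATION.search(text)
    if match:
        row["has_diabetes_history"] = "yes"
        row["duration_value"] = int(match.group(1))
        row["duration_unit"] = "year" if match.group(2) == "年" else "month"
        return row
    if MENTION.search(text):
        row["has_diabetes_history"] = "yes"
    return row


def rows_to_csv(rows):
    """Serialize extracted rows to CSV text, i.e. the tabular output format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented sample sentences, not taken from any real record.
    samples = [
        ("r1", "既往糖尿病病史10年，口服二甲双胍治疗。"),
        ("r2", "否认高血压、糖尿病病史。"),
        ("r3", "发现糖尿病5个月。"),
    ]
    print(rows_to_csv([extract_diabetes_history(rid, text) for rid, text in samples]))
```

Ordering the rules from most to least specific (negation, then mention with duration, then bare mention) is one simple way to keep a small rule base predictable. A production rule set induced from 600 annotated records would presumably be far larger and would handle many more phrasings.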

Results: Among all patients diagnosed with diabetes in the hospital's Department of Endocrinology, 1 436 were discharged in 2015; extraction of diabetes history from the medical history sections of their records achieved a recall of 87.6%, a precision (reported as the "accuracy rate") of 99.5%, and an F-score of 0.93. For 10% of the patients with diabetes (1 223 patients in total) discharged from the same department by August 2017, extraction of diabetes history achieved a recall of 89.2%, a precision of 99.2%, and an F-score of 0.94.
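The reported F-scores can be checked against the other two figures. The sketch below assumes that the figure reported as the "accuracy rate" is precision and that the F-score is the balanced F1 (the harmonic mean of precision and recall); the abstract states neither explicitly, and the function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Figures from the Results paragraph, expressed as fractions.
f_2015 = f1_score(precision=0.995, recall=0.876)
f_2017 = f1_score(precision=0.992, recall=0.892)

print(round(f_2015, 2))  # 0.93
print(round(f_2017, 2))  # 0.94
```

Under these assumptions both values round to the F-scores given in the abstract, which supports reading the "accuracy rate" as precision.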

Conclusion: This study combines natural language processing with rule-based information extraction, and designs and implements an algorithm for extracting customized information from unstructured Chinese electronic medical record text. The authors report better results than existing work.

Cited by

Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine.
Zhang T, Wang Y, Wang X, Yang Y, Ye Y. BMC Med Inform Decis Mak. 2020 Apr 6;20(1):64. doi: 10.1186/s12911-020-1079-2. PMID: 32252745. Free PMC article.
Developing an Inpatient Electronic Medical Record Phenotype for Hospital-Acquired Pressure Injuries: Case Study Using Natural Language Processing Models.
Nurmambetova E, Pan J, Zhang Z, Wu G, Lee S, Southern DA, Martin EA, Ho C, Xu Y, Eastwood CA. JMIR AI. 2023 Mar 8;2:e41264. doi: 10.2196/41264. PMID: 38875552. Free PMC article.
