Using natural language processing to extract clinically useful information from Chinese electronic medical records
- PMID: 30784428
- DOI: 10.1016/j.ijmedinf.2019.01.004
Abstract
Aims: To develop a natural language processing (NLP)-based algorithm for extracting clinically useful information for patients with hepatocellular carcinoma (HCC) from Chinese electronic medical records (EMRs) and use these data for the assessment of HCC staging.
Materials and methods: Clinical documents, including operation notes and radiology and pathology reports, of 92 HCC patients were collected from Chinese EMRs. We randomly divided these patients into training (n = 60) and testing (n = 32) datasets. Rule-based and hybrid methods for extracting information were developed using the manually annotated operation notes in the training set. The better-performing method was then used to process the other documents. Performance was assessed by calculating precision, recall and F-score under exact-boundary and partial-boundary matching strategies. The utility of the extracted information for HCC staging was assessed by comparing the resulting staging with that obtained by manual review.
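The two matching strategies named in the methods can be illustrated with a minimal sketch. It is not the authors' implementation: the span representation, the `overlaps` and `score` helpers, the entity labels and the toy annotations are all assumptions made for illustration. Exact-boundary matching counts a predicted entity as correct only when its label and both offsets equal a gold annotation; partial-boundary matching, as assumed here, accepts any same-label overlap.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, end offset (exclusive), entity label).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True if two spans share a label and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def score(gold: List[Span], pred: List[Span], exact: bool = True):
    """Return (precision, recall, F-score) for one matching strategy."""
    if exact:
        gold_set, pred_set = set(gold), set(pred)
        hit_pred = sum(1 for p in pred if p in gold_set)
        hit_gold = sum(1 for g in gold if g in pred_set)
    else:
        hit_pred = sum(1 for p in pred if any(overlaps(p, g) for g in gold))
        hit_gold = sum(1 for g in gold if any(overlaps(g, p) for p in pred))
    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy annotations (invented, not taken from the study).
gold = [(0, 4, "tumor_size"), (10, 15, "tumor_number")]
pred = [(0, 4, "tumor_size"), (11, 15, "tumor_number"), (20, 22, "tumor_size")]

print("exact-boundary:  ", score(gold, pred, exact=True))
print("partial-boundary:", score(gold, pred, exact=False))
```

In this toy case the second prediction misses the gold start offset by one character, so it counts as an error under exact-boundary matching but as a hit under partial-boundary matching.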
Results: For operation notes, both the rule-based and hybrid methods achieved a precision, recall and F-score ≥80% under both the exact-boundary and partial-boundary matching strategies on the testing dataset. The rule-based method, which outperformed the hybrid method, also obtained good performance on three other types of documents. When the extracted clinically useful information was used for HCC staging, the concordance rate with manual review was 75%.
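The concordance rate reported above is the proportion of patients for whom the stage derived from extracted information equals the stage assigned by manual review. A minimal sketch follows, assuming one stage value per patient in matching order; the function name `concordance_rate` and the example scores are hypothetical and are not data from the study.

```python
from typing import Sequence


def concordance_rate(auto: Sequence[int], manual: Sequence[int]) -> float:
    """Fraction of patients whose automatic and manual stages agree."""
    if len(auto) != len(manual):
        raise ValueError("both sequences must cover the same patients")
    if not auto:
        raise ValueError("at least one patient is required")
    agreements = sum(1 for a, m in zip(auto, manual) if a == m)
    return agreements / len(auto)


# Invented staging scores for four patients (illustration only).
auto_stage = [0, 2, 3, 1]
manual_stage = [0, 2, 4, 1]
print(concordance_rate(auto_stage, manual_stage))
```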
Conclusion: An NLP system was developed for clinical information extraction and HCC staging based on EMRs, and the results indicate that Chinese NLP has potential utility in clinical research.
Keywords: Cancer of the Liver Italian Program (CLIP); Chinese EMRs; Hybrid method; Regular expression; Rule-based method.
Copyright © 2019 Elsevier B.V. All rights reserved.
Similar articles
- Extracting important information from Chinese Operation Notes with natural language processing methods. J Biomed Inform. 2014 Apr;48:130-6. doi: 10.1016/j.jbi.2013.12.017. Epub 2014 Jan 31. PMID: 24486562
- [A customized method for information extraction from unstructured text data in the electronic medical records]. Beijing Da Xue Xue Bao Yi Xue Ban. 2018 Apr 18;50(2):256-263. PMID: 29643524 Chinese.
- Validation of Case Finding Algorithms for Hepatocellular Cancer From Administrative Data and Electronic Health Records Using Natural Language Processing. Med Care. 2016 Feb;54(2):e9-14. doi: 10.1097/MLR.0b013e3182a30373. PMID: 23929403 Free PMC article.
- Discerning tumor status from unstructured MRI reports--completeness of information in existing reports and utility of automated natural language processing. J Digit Imaging. 2010 Apr;23(2):119-32. doi: 10.1007/s10278-009-9215-7. Epub 2009 May 30. PMID: 19484309 Free PMC article. Review.
- Application of Natural Language Processing in Electronic Health Record Data Extraction for Navigating Prostate Cancer Care: A Narrative Review. J Endourol. 2024 Aug;38(8):852-864. doi: 10.1089/end.2023.0690. Epub 2024 May 13. PMID: 38613805 Review.
Cited by
- Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non-Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study. JMIR Med Inform. 2022 Apr 25;10(4):e35475. doi: 10.2196/35475. PMID: 35468085 Free PMC article.
- Deep phenotyping and whole-exome sequencing improved the diagnostic yield for nuclear pedigrees with neurodevelopmental disorders. Mol Genet Genomic Med. 2022 May;10(5):e1918. doi: 10.1002/mgg3.1918. Epub 2022 Mar 10. PMID: 35266334 Free PMC article.
- Advancing the development of real-world data for healthcare research in China: challenges and opportunities. BMJ Open. 2022 Jul 29;12(7):e063139. doi: 10.1136/bmjopen-2022-063139. PMID: 35906059 Free PMC article. Review.
- Automatic Detection of Distant Metastasis Mentions in Radiology Reports in Spanish. JCO Clin Cancer Inform. 2024 Jan;8:e2300130. doi: 10.1200/CCI.23.00130. PMID: 38194615 Free PMC article.
- A novel deep learning approach to extract Chinese clinical entities for lung cancer screening and staging. BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):214. doi: 10.1186/s12911-021-01575-x. PMID: 34330277 Free PMC article.