A study of transportability of an existing smoking status detection module across institutions
- PMID: 23304330
- PMCID: PMC3540509
Abstract
Electronic Medical Records (EMRs) are valuable resources for clinical observational studies. A patient's smoking status is a key factor for many diseases, but it is often embedded in narrative text. Natural language processing (NLP) systems have been developed for this specific task, such as the smoking status detection module in the clinical Text Analysis and Knowledge Extraction System (cTAKES). This study examined the transportability of the cTAKES smoking module to EMR data from Vanderbilt University Hospital. Our evaluation showed that a modest amount of customization is necessary to achieve desirable performance. We modified the system by filtering notes, annotating new data to train the machine learning classifier, and adding rules to the rule-based classifiers. The customized module achieved significantly higher F-measures at all levels of classification (sentence, document, and patient) than the cTAKES module applied directly to the Vanderbilt data.
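The abstract describes classification at three levels (sentence, document, patient) evaluated by F-measure. The sketch below illustrates only that overall structure, under stated assumptions: the five labels are modeled on the i2b2 smoking-challenge categories, and the keyword rules, the precedence used to roll sentence labels up to documents and patients, and all function names are hypothetical. They are not the cTAKES rules, its machine learning classifier, or the Vanderbilt customizations.

```python
import re

# Label set modeled on the i2b2 smoking-challenge categories (assumption).
# List order doubles as a hypothetical aggregation precedence: earlier wins.
LABELS = ["CURRENT SMOKER", "PAST SMOKER", "SMOKER", "NON-SMOKER", "UNKNOWN"]
PRIORITY = {label: rank for rank, label in enumerate(LABELS)}

# Hypothetical keyword rules standing in for the rule-based and ML classifiers.
TOBACCO = re.compile(r"smok|tobacco|cigar")
NEGATED = re.compile(r"\b(never smoked|denies (smoking|tobacco)|non-?smoker)\b")
PAST = re.compile(r"\b(quit|former smoker|ex-smoker|stopped smoking)\b")
CURRENT = re.compile(r"\b(currently smokes|current smoker|smokes \d+)\b")


def classify_sentence(sentence: str) -> str:
    """Assign one label to a sentence using the toy rules above."""
    s = sentence.lower()
    if not TOBACCO.search(s):
        return "UNKNOWN"  # no smoking-related mention at all
    if NEGATED.search(s):
        return "NON-SMOKER"
    if PAST.search(s):
        return "PAST SMOKER"
    if CURRENT.search(s):
        return "CURRENT SMOKER"
    return "SMOKER"  # smoking mentioned, but timing unclear


def aggregate(labels) -> str:
    """Return the highest-precedence label; UNKNOWN if there are none."""
    return min(labels, key=PRIORITY.__getitem__, default="UNKNOWN")


def classify_document(sentences: list[str]) -> str:
    return aggregate(classify_sentence(s) for s in sentences)


def classify_patient(documents: list[list[str]]) -> str:
    return aggregate(classify_document(d) for d in documents)


def f_measure(gold: list[str], predicted: list[str], label: str) -> float:
    """Per-class F-measure: harmonic mean of precision and recall for `label`."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


notes = [
    ["Patient quit smoking in 2005.", "Denies alcohol use."],
    ["She currently smokes half a pack per day."],
]
print(classify_patient(notes))  # CURRENT SMOKER
```

The flat precedence rule is only for illustration; a real patient-level resolver might instead weigh note dates or note types when documents disagree.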
Similar articles
- Natural language processing and machine learning to enable automatic extraction and classification of patients' smoking status from electronic medical records. Ups J Med Sci. 2020 Nov;125(4):316-324. doi: 10.1080/03009734.2020.1792010. Epub 2020 Jul 22. PMID: 32696698. Free PMC article.
- Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach. BMC Med Inform Decis Mak. 2017 Dec 1;17(1):155. doi: 10.1186/s12911-017-0556-8. PMID: 29191207. Free PMC article.
- [A customized method for information extraction from unstructured text data in the electronic medical records]. Beijing Da Xue Xue Bao Yi Xue Ban. 2018 Apr 18;50(2):256-263. PMID: 29643524. In Chinese.
- Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing. J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7. PMID: 35007754.
- Natural Language Processing in Nephrology. Adv Chronic Kidney Dis. 2022 Sep;29(5):465-471. doi: 10.1053/j.ackd.2022.07.001. PMID: 36253030. Free PMC article. Review.
Cited by
- Defining Phenotypes from Clinical Data to Drive Genomic Research. Annu Rev Biomed Data Sci. 2018 Jul;1:69-92. doi: 10.1146/annurev-biodatasci-080917-013335. Epub 2018 Apr 25. PMID: 34109303. Free PMC article.
- Using Anchors to Estimate Clinical State without Labeled Data. AMIA Annu Symp Proc. 2014 Nov 14;2014:606-15. eCollection 2014. PMID: 25954366. Free PMC article.
- Automated Extraction of Substance Use Information from Clinical Texts. AMIA Annu Symp Proc. 2015 Nov 5;2015:2121-30. eCollection 2015. PMID: 26958312. Free PMC article.
- Validating drug repurposing signals using electronic health records: a case study of metformin associated with reduced cancer mortality. J Am Med Inform Assoc. 2015 Jan;22(1):179-91. doi: 10.1136/amiajnl-2014-002649. Epub 2014 Jul 22. PMID: 25053577. Free PMC article.
- A polymorphism in HLA-G modifies statin benefit in asthma. Pharmacogenomics J. 2015 Jun;15(3):272-7. doi: 10.1038/tpj.2014.55. Epub 2014 Sep 30. PMID: 25266681. Free PMC article.