A high throughput semantic concept frequency based approach for patient identification: a case study using type 2 diabetes mellitus clinical notes
- PMID: 21347100
- PMCID: PMC3041302
Abstract
Current research on high throughput identification of patients with a specific phenotype is in its infancy. There is an urgent need to develop a general automatic approach for patient identification.
Objective: Using Mayo Clinic electronic clinical notes, we propose a novel method that combines natural language processing (NLP), machine learning, and ontology for automatic patient identification. We also investigate the benefit of incorporating existing SNOMED semantic knowledge into a patient identification task.
Methods: The support vector machine (SVM) algorithm was applied to SNOMED concept units extracted from the clinical notes of type 2 diabetes mellitus (T2DM) cases and controls. Precision, recall, and F-score were calculated to evaluate performance.
Results: This approach achieved an F-score above 0.950 for both groups when all identified concept units were used as features. Concept units of the semantic type Disease or Syndrome contain the most important information for patient identification. Our results also imply that coarse-level concepts contain enough information to classify T2DM cases and controls.
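The evaluation measures named in the Methods (precision, recall, and F-score, reported per group) can be illustrated with a minimal sketch. The function name `precision_recall_f1` and the toy labels are assumptions made for illustration; they are not the study's code or data, and the F-score here is the usual balanced F1.

```python
def precision_recall_f1(gold, predicted, positive):
    """Return (precision, recall, F-score) for one class label."""
    pairs = list(zip(gold, predicted))
    # Count true positives, false positives, and false negatives for `positive`.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Balanced F-score: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration only (not data from the study).
gold = ["case", "case", "case", "control", "control", "control"]
pred = ["case", "case", "control", "control", "control", "case"]

for group in ("case", "control"):
    p, r, f = precision_recall_f1(gold, pred, group)
    print(f"{group}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Scoring each group separately, as in this sketch, matches how the Results report an F-score "for both groups" rather than a single pooled figure.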