Use of semantic features to classify patient smoking status
- PMID: 18998969
- PMCID: PMC2655942
Abstract
The recent i2b2 NLP Challenge smoking classification task offers a rare chance to compare different natural language processing techniques on actual clinical data. We compare the performance of a classifier that relies on semantic features generated by an unmodified version of MedLEE, a clinical NLP engine, with that of a classifier using lexical features. We also compare the performance of supervised classifiers with that of rule-based symbolic classifiers. Our baseline supervised classifier with lexical features yields a microaveraged F-measure of 0.81. Our rule-based classifier using MedLEE semantic features outperforms this baseline, with an F-measure of 0.83. Our supervised classifier trained with semantic MedLEE features is competitive with the top-performing smoking classifier in the i2b2 NLP Challenge, with microaveraged precision of 0.90, recall of 0.89, and F-measure of 0.89.
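The microaveraged scores reported above pool decisions across all smoking-status classes before computing the ratios, so frequent classes weigh more than rare ones. The sketch below shows that computation from per-class true-positive, false-positive, and false-negative counts. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall), which matches the reported 0.90/0.89/0.89 figures; the function name `micro_prf` and the class counts are hypothetical and for illustration only, not data or code from the study.

```python
from typing import Dict, Tuple


def micro_prf(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 from per-class (TP, FP, FN) counts.

    Hypothetical helper for illustration; assumes the balanced F-measure (F1).
    """
    # Pool the counts over every class first; this pooling is what makes it "micro".
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of pooled precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical per-class (TP, FP, FN) counts; not results from the study.
example = {
    "class_a": (40, 5, 3),
    "class_b": (30, 2, 6),
    "class_c": (10, 3, 2),
}
p, r, f = micro_prf(example)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

A macroaveraged score would instead compute precision and recall per class and average them, giving each class equal weight regardless of how many records it contains.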