Comparative Study

J Am Med Inform Assoc. 2013 Nov-Dec;20(6):1120-7. doi: 10.1136/amiajnl-2012-001110. Epub 2013 May 5.

Identifying medical terms in patient-authored text: a crowdsourcing-based approach


Diana Lynn MacLean et al. J Am Med Inform Assoc. 2013 Nov-Dec.

Abstract

Background and objective: As people increasingly engage in online health-seeking behavior and contribute to health-oriented websites, the volume of medical text authored by patients and other medical novices grows rapidly. However, we lack an effective method for automatically identifying medical terms in patient-authored text (PAT). We demonstrate that crowdsourcing PAT medical term identification tasks to non-experts is a viable method for creating large, accurately labeled PAT datasets; moreover, such datasets can be used to train classifiers that outperform existing medical term identification tools.

Materials and methods: To evaluate the viability of using non-expert crowds to label PAT, we compare expert (registered nurses) and non-expert (Amazon Mechanical Turk workers; Turkers) responses to a PAT medical term identification task. Next, we build a crowd-labeled dataset comprising 10 000 sentences from MedHelp. We train two models on this dataset and evaluate their performance, as well as that of MetaMap, Open Biomedical Annotator (OBA), and NaCTeM's TerMINE, against two gold standard datasets: one from MedHelp and the other from CureTogether.
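The abstract does not spell out the crowd aggregation step beyond the corroborative voting policy illustrated in figure 2. As a minimal sketch in Python, assuming each Turker submits a set of token indices per sentence and that a word needs at least k corroborating votes (the value of k and the stopword list here are illustrative assumptions, not the paper's), word-level aggregation might look like:

    from collections import Counter

    # A tiny illustrative stopword list; the paper's list is not specified here.
    STOPWORDS = {"of", "the", "a", "an", "and", "in", "on", "to"}

    def corroborative_vote(tokens, turker_selections, k=3):
        """Aggregate per-word 'medical term' votes from several Turkers.

        tokens            -- the sentence, already tokenized
        turker_selections -- list of sets of token indices, one per Turker
        k                 -- minimum number of corroborating votes (assumed)
        Returns the set of token indices the crowd labels as medical.
        """
        votes = Counter()
        for selection in turker_selections:
            for i in selection:
                votes[i] += 1
        return {
            i for i, n in votes.items()
            if n >= k and tokens[i].lower() not in STOPWORDS  # skip stopwords
        }

    # Example: four Turkers label "shortness of breath and fatigue"
    tokens = ["shortness", "of", "breath", "and", "fatigue"]
    selections = [{0, 1, 2, 4}, {0, 2}, {0, 1, 2}, {2, 4}]
    print(corroborative_vote(tokens, selections))
    # -> {0, 2}; with k=2 the vote would also admit index 4 ("fatigue")

Aggregated crowd labels produced this way can then be compared against the expert (registered nurse) labels to compute the agreement scores reported below.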

Results: When aggregated according to a corroborative voting policy, Turker responses predict expert responses with an F1 score of 84%. A conditional random field (CRF) trained on 10 000 crowd-labeled MedHelp sentences achieves an F1 score of 78% against the CureTogether gold standard, widely outperforming OBA (47%), TerMINE (43%), and MetaMap (39%). A failure analysis of the CRF suggests that misclassified terms are likely to be either generic or rare.
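As one concrete reading of the modeling step, a linear-chain CRF over BIO-tagged tokens can be trained with the sklearn-crfsuite library. The feature template below (lowercased word, suffix, capitalization, neighboring words) is our assumption for illustration, not the paper's published feature set, and the toy sentence stands in for the 10 000 crowd-labeled MedHelp sentences:

    import sklearn_crfsuite
    from sklearn_crfsuite import metrics

    def word_features(sent, i):
        """Simple per-token features; the paper's actual features may differ."""
        w = sent[i]
        return {
            "lower": w.lower(),
            "suffix3": w[-3:],
            "is_title": w.istitle(),
            "prev": sent[i - 1].lower() if i > 0 else "<s>",
            "next": sent[i + 1].lower() if i < len(sent) - 1 else "</s>",
        }

    def featurize(sentences):
        return [[word_features(s, i) for i in range(len(s))] for s in sentences]

    # Toy crowd-labeled data: tokens with B/I/O tags marking medical terms.
    train_sents = [["I", "have", "chronic", "fatigue", "syndrome"]]
    train_tags = [["O", "O", "B", "I", "I"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                               max_iterations=100)
    crf.fit(featurize(train_sents), train_tags)

    # Evaluate against a gold-standard set with a flat F1 over B/I labels.
    gold_sents, gold_tags = train_sents, train_tags  # placeholder gold data
    pred = crf.predict(featurize(gold_sents))
    print(metrics.flat_f1_score(gold_tags, pred, average="micro",
                                labels=["B", "I"]))

Because the CRF conditions each tag on neighboring words and tags, it captures the sentence-level context that the dictionary-lookup baselines (OBA, MetaMap) lack.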

Conclusions: Our results show that combining statistical models sensitive to sentence-level context with crowd-labeled data is a scalable and effective technique for automatically identifying medical terms in PAT.

Keywords: crowdsourcing; medical term extraction; online health forums; text mining.


Figures

Figure 1. Patient-authored text (PAT) medical word identification task instructions and interface.
Figure 2. An illustration of our corroborative, word-level voting policy. Stopwords (like ‘of’) are excluded from the vote.
Figure 3. A comparison of terms identified as medically relevant (shown in black) by different models in five sample sentences. OBA and MetaMap are run using the SNOMED CT ontology.
Figure 4. Term classification accuracy plotted against log term frequency in the test corpora. Purple (darker) circles represent terms that are always classified correctly; blue (lighter) circles represent terms that are misclassified at least once. A LOWESS fit line to the entire dataset (black) shows that most terms are always classified correctly. A LOWESS fit line to the misclassified points (blue, or lighter) shows that classification accuracy increases with term frequency.
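A rough sketch of the figure 4 analysis, assuming per-term accuracies and corpus frequencies are available (the arrays below are synthetic stand-ins, not the paper's data), using the LOWESS smoother from statsmodels:

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.nonparametric.smoothers_lowess import lowess

    # Hypothetical per-term data: frequency in the test corpus and the
    # fraction of occurrences classified correctly.
    rng = np.random.default_rng(0)
    freq = rng.integers(1, 500, size=200)
    acc = np.clip(np.log(freq) / np.log(500) + rng.normal(0, 0.1, 200), 0, 1)

    log_freq = np.log(freq)
    misclassified = acc < 1.0

    # LOWESS fit over all terms, and over the misclassified subset only.
    fit_all = lowess(acc, log_freq)
    fit_mis = lowess(acc[misclassified], log_freq[misclassified])

    plt.scatter(log_freq, acc,
                c=np.where(misclassified, "tab:blue", "purple"))
    plt.plot(fit_all[:, 0], fit_all[:, 1], color="black", label="all terms")
    plt.plot(fit_mis[:, 0], fit_mis[:, 1], color="tab:blue",
             label="misclassified terms")
    plt.xlabel("log term frequency")
    plt.ylabel("classification accuracy")
    plt.legend()
    plt.show()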
Figure 5. Top 50 terms, ranked by frequency, derived for MedHelp's Arthritis forum as determined by ADEPT (left) and OBA (right). Terms unique to their respective portion of the list are shown in black. Terms occurring in both lists are linked with a line. The gradient of these lines shows that all but three co-occurring terms are ranked more highly by ADEPT.

