Comparative Study

J Am Med Inform Assoc. 2013 Nov-Dec;20(6):1120-7. doi: 10.1136/amiajnl-2012-001110. Epub 2013 May 5.

Identifying medical terms in patient-authored text: a crowdsourcing-based approach


Diana Lynn MacLean et al. J Am Med Inform Assoc. 2013 Nov-Dec.

Abstract

Background and objective: As people increasingly engage in online health-seeking behavior and contribute to health-oriented websites, the volume of medical text authored by patients and other medical novices grows rapidly. However, we lack an effective method for automatically identifying medical terms in patient-authored text (PAT). We demonstrate that crowdsourcing PAT medical term identification tasks to non-experts is a viable method for creating large, accurately labeled PAT datasets; moreover, such datasets can be used to train classifiers that outperform existing medical term identification tools.

Materials and methods: To evaluate the viability of using non-expert crowds to label PAT, we compare expert (registered nurses) and non-expert (Amazon Mechanical Turk workers; Turkers) responses to a PAT medical term identification task. Next, we build a crowd-labeled dataset comprising 10 000 sentences from MedHelp. We train two models on this dataset and evaluate their performance, as well as that of MetaMap, Open Biomedical Annotator (OBA), and NaCTeM's TerMINE, against two gold standard datasets: one from MedHelp and the other from CureTogether.
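The abstract does not spell out the crowd aggregation step beyond the corroborative voting policy illustrated in figure 2. As a minimal sketch in Python, assuming each Turker submits a set of token indices per sentence and that a word needs at least k corroborating votes (the value of k and the stopword list here are illustrative assumptions, not the paper's), word-level aggregation might look like:

    from collections import Counter

    # A tiny illustrative stopword list; the paper's list is not specified here.
    STOPWORDS = {"of", "the", "a", "an", "and", "in", "on", "to"}

    def corroborative_vote(tokens, turker_selections, k=3):
        """Aggregate per-word 'medical term' votes from several Turkers.

        tokens            -- the sentence, already tokenized
        turker_selections -- list of sets of token indices, one per Turker
        k                 -- minimum number of corroborating votes (assumed)
        Returns the set of token indices the crowd labels as medical.
        """
        votes = Counter()
        for selection in turker_selections:
            for i in selection:
                votes[i] += 1
        return {
            i for i, n in votes.items()
            if n >= k and tokens[i].lower() not in STOPWORDS  # skip stopwords
        }

    # Example: four Turkers label "shortness of breath and fatigue"
    tokens = ["shortness", "of", "breath", "and", "fatigue"]
    selections = [{0, 1, 2, 4}, {0, 2}, {0, 1, 2}, {2, 4}]
    print(corroborative_vote(tokens, selections))
    # -> {0, 2}; with k=2 the vote would also admit index 4 ("fatigue")

Aggregated crowd labels produced this way can then be compared against the expert (registered nurse) labels to compute the agreement scores reported below.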

Results: When aggregated according to a corroborative voting policy, Turker responses predict expert responses with an F1 score of 84%. A conditional random field (CRF) trained on 10 000 crowd-labeled MedHelp sentences achieves an F1 score of 78% against the CureTogether gold standard, widely outperforming OBA (47%), TerMINE (43%), and MetaMap (39%). A failure analysis of the CRF suggests that misclassified terms are likely to be either generic or rare.
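As one concrete reading of the modeling step, a linear-chain CRF over BIO-tagged tokens can be trained with the sklearn-crfsuite library. The feature template below (lowercased word, suffix, capitalization, neighboring words) is our assumption for illustration, not the paper's published feature set, and the toy sentence stands in for the 10 000 crowd-labeled MedHelp sentences:

    import sklearn_crfsuite
    from sklearn_crfsuite import metrics

    def word_features(sent, i):
        """Simple per-token features; the paper's actual features may differ."""
        w = sent[i]
        return {
            "lower": w.lower(),
            "suffix3": w[-3:],
            "is_title": w.istitle(),
            "prev": sent[i - 1].lower() if i > 0 else "<s>",
            "next": sent[i + 1].lower() if i < len(sent) - 1 else "</s>",
        }

    def featurize(sentences):
        return [[word_features(s, i) for i in range(len(s))] for s in sentences]

    # Toy crowd-labeled data: tokens with B/I/O tags marking medical terms.
    train_sents = [["I", "have", "chronic", "fatigue", "syndrome"]]
    train_tags = [["O", "O", "B", "I", "I"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                               max_iterations=100)
    crf.fit(featurize(train_sents), train_tags)

    # Evaluate against a gold-standard set with a flat F1 over B/I labels.
    gold_sents, gold_tags = train_sents, train_tags  # placeholder gold data
    pred = crf.predict(featurize(gold_sents))
    print(metrics.flat_f1_score(gold_tags, pred, average="micro",
                                labels=["B", "I"]))

Because the CRF conditions each tag on neighboring words and tags, it captures the sentence-level context that the dictionary-lookup baselines (OBA, MetaMap) lack.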

Conclusions: Our results show that combining statistical models sensitive to sentence-level context with crowd-labeled data is a scalable and effective technique for automatically identifying medical terms in PAT.

Keywords: crowdsourcing; medical term extraction; online health forums; text mining.


Figures

Figure 1. Patient-authored text (PAT) medical word identification task instructions and interface.
Figure 2. An illustration of our corroborative, word-level voting policy. Stopwords (like ‘of’) are excluded from the vote.
Figure 3. A comparison of terms identified as medically relevant (shown in black) by different models in five sample sentences. OBA and MetaMap are run using the SNOMED CT ontology.
Figure 4. Term classification accuracy plotted against log term frequency in the test corpora. Purple (darker) circles represent terms that are always classified correctly; blue (lighter) circles represent terms that are misclassified at least once. A LOWESS fit line to the entire dataset (black) shows that most terms are always classified correctly. A LOWESS fit line to the misclassified points (blue, or lighter) shows that classification accuracy increases with term frequency.
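A rough sketch of the figure 4 analysis, assuming per-term accuracies and corpus frequencies are available (the arrays below are synthetic stand-ins, not the paper's data), using the LOWESS smoother from statsmodels:

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.nonparametric.smoothers_lowess import lowess

    # Hypothetical per-term data: frequency in the test corpus and the
    # fraction of occurrences classified correctly.
    rng = np.random.default_rng(0)
    freq = rng.integers(1, 500, size=200)
    acc = np.clip(np.log(freq) / np.log(500) + rng.normal(0, 0.1, 200), 0, 1)

    log_freq = np.log(freq)
    misclassified = acc < 1.0

    # LOWESS fit over all terms, and over the misclassified subset only.
    fit_all = lowess(acc, log_freq)
    fit_mis = lowess(acc[misclassified], log_freq[misclassified])

    plt.scatter(log_freq, acc,
                c=np.where(misclassified, "tab:blue", "purple"))
    plt.plot(fit_all[:, 0], fit_all[:, 1], color="black", label="all terms")
    plt.plot(fit_mis[:, 0], fit_mis[:, 1], color="tab:blue",
             label="misclassified terms")
    plt.xlabel("log term frequency")
    plt.ylabel("classification accuracy")
    plt.legend()
    plt.show()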
Figure 5. Top 50 terms, ranked by frequency, derived for MedHelp's Arthritis forum as determined by ADEPT (left) and OBA (right). Terms unique to their respective portion of the list are shown in black. Terms occurring in both lists are linked with a line. The gradient of these lines shows that all but three co-occurring terms are ranked more highly by ADEPT.

