Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements
- PMID: 24001514
- PMCID: PMC3994857
- DOI: 10.1136/amiajnl-2013-001837
Abstract
Objective: To present a series of experiments: (1) to evaluate the impact of pre-annotation on the speed of manual annotation of clinical trial announcements; and (2) to test for potential bias when pre-annotation is used.
Methods: To build the gold standard, 1400 clinical trial announcements from the clinicaltrials.gov website were randomly selected and double annotated for diagnoses, signs, symptoms, Unified Medical Language System (UMLS) Concept Unique Identifiers, and SNOMED CT codes. We used two dictionary-based methods to pre-annotate the text. We evaluated annotation time and potential bias using F-measures and ANOVA tests with Bonferroni correction.
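The abstract does not describe the pre-annotation tooling in detail. The sketch below is a minimal, hypothetical illustration of dictionary-based pre-annotation using case-insensitive, longest-match-first lookup that maps surface forms to example UMLS CUIs and SNOMED CT codes; the dictionary entries and the pre_annotate function are assumptions for illustration, not the authors' implementation.

```python
import re

# Hypothetical dictionary mapping surface forms to (UMLS CUI, SNOMED CT code).
# In the study, terms came from UMLS/SNOMED CT; these entries are illustrative only.
DICTIONARY = {
    "type 2 diabetes mellitus": ("C0011860", "44054006"),
    "hypertension": ("C0020538", "38341003"),
    "chest pain": ("C0008031", "29857009"),
}

def pre_annotate(text):
    """Return (start, end, surface, cui, snomed) spans found by
    case-insensitive, longest-match-first dictionary lookup."""
    spans = []
    # Try longer terms first so "type 2 diabetes mellitus" wins over a shorter substring.
    for term in sorted(DICTIONARY, key=len, reverse=True):
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            # Skip matches that overlap an already-accepted (longer) span.
            if not any(m.start() < e and m.end() > s for s, e, *_ in spans):
                cui, snomed = DICTIONARY[term]
                spans.append((m.start(), m.end(), m.group(), cui, snomed))
    return sorted(spans)

text = "Inclusion: adults with type 2 diabetes mellitus and hypertension."
for span in pre_annotate(text):
    print(span)
```

Annotators would then review, correct, and extend these machine-suggested spans rather than annotating from scratch, which is where the reported time savings come from.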
Results: Time savings ranged from 13.85% to 21.5% per entity. Inter-annotator agreement (IAA) ranged from 93.4% to 95.5%. Pre-annotation produced no statistically significant difference in IAA or annotator performance.
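IAA between two annotators is commonly computed as a pairwise F-measure, treating one annotator's spans as the reference. The sketch below assumes exact-span matching and uses hypothetical spans; the abstract does not state the paper's exact matching criteria.

```python
def f_measure(ann_a, ann_b):
    """Pairwise inter-annotator agreement as balanced F-measure.
    F1 is symmetric in precision and recall, so the choice of
    which annotator serves as the reference does not matter."""
    a, b = set(ann_a), set(ann_b)
    tp = len(a & b)                      # spans both annotators marked identically
    precision = tp / len(b) if b else 0.0
    recall = tp / len(a) if a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: spans as (start, end, label) tuples from two annotators.
annotator_1 = {(10, 22, "diagnosis"), (30, 41, "symptom"), (50, 60, "sign")}
annotator_2 = {(10, 22, "diagnosis"), (30, 41, "symptom")}
print(f"IAA (F-measure): {f_measure(annotator_1, annotator_2):.3f}")  # 0.800
```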
Conclusions: In every experiment pair, the annotator working from pre-annotated text needed less time than the annotator working from unlabeled text, and the time savings were statistically significant. Moreover, pre-annotation did not reduce IAA or annotator performance. Dictionary-based pre-annotation is a feasible and practical method to reduce the cost of annotating clinical named entities in the eligibility sections of clinical trial announcements without introducing bias into the annotation process.
Keywords: Information Extraction; Natural Language Processing; Pre-annotation; Clinical Trial Announcements; Named Entity Recognition; UMLS.