J Am Med Inform Assoc. 2013 Sep-Oct;20(5):915-21.
doi: 10.1136/amiajnl-2012-001487. Epub 2012 Dec 25.

A sequence labeling approach to link medications and their attributes in clinical notes and clinical trial announcements for information extraction

Qi Li et al. J Am Med Inform Assoc. 2013 Sep-Oct.

Abstract

Objective: The goal of this work was to evaluate two machine learning methods, binary classification and sequence labeling, for detecting linkages between medications and their attributes in two clinical corpora.

Data and methods: We double-annotated 3000 clinical trial announcements (CTA) and 1655 clinical notes (CN) for medication named entities and their attributes. We proposed two methods to identify the linkages between medications and their corresponding attributes: a binary support vector machine (SVM) classification method with parsimonious feature sets, and a multi-layered sequence labeling (MLSL) model based on conditional random fields (CRF). We evaluated the systems' performance against the human-generated gold standard.
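Framing linkage detection as binary classification means enumerating candidate (medication, attribute) pairs and letting a classifier such as an SVM decide, for each pair, whether the two are linked. The sketch below shows only that framing, not the authors' system: it assumes entity mentions have already been recognized and are given as token spans, and its features (token gap, attribute position, number of intervening medications, attribute type) are illustrative choices rather than the paper's feature set. The `Entity` type, the type labels, and `candidate_pairs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entity:
    """A recognized mention: token span [start, end) plus a type label."""
    start: int
    end: int
    etype: str  # "MED" for medications; other labels (e.g. "DOSE") are assumed attribute types


def candidate_pairs(entities: List[Entity]) -> List[Tuple[Entity, Entity, Dict[str, float]]]:
    """Pair every medication with every attribute in the same text and
    compute a few simple, hypothetical features for each pair."""
    meds = [e for e in entities if e.etype == "MED"]
    attrs = [e for e in entities if e.etype != "MED"]
    pairs = []
    for m in meds:
        for a in attrs:
            # Number of tokens separating the two spans (0 if adjacent).
            gap = max(a.start - m.end, m.start - a.end, 0)
            # Count other medications lying entirely between the two mentions.
            lo, hi = min(m.end, a.end), max(m.start, a.start)
            between = sum(1 for o in meds if o != m and o.start >= lo and o.end <= hi)
            features = {
                "token_gap": float(gap),
                "attr_after_med": 1.0 if a.start >= m.end else 0.0,
                "meds_between": float(between),
                f"attr_type={a.etype}": 1.0,
            }
            pairs.append((m, a, features))
    return pairs
```

For the token sequence "aspirin 81 mg daily and metformin 500 mg", with two medications and three attributes, this yields six candidate pairs; the pair (aspirin, "500 mg") gets `meds_between = 1`, a cue that a trained classifier could learn to associate with "not linked".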

Results: The experiments showed that both machine learning approaches performed statistically significantly better than the baseline rule-based approach. Binary SVM classification achieved an F-measure of 0.94 with individual tokens as features. The SVM model trained on a parsimonious feature set achieved an F-measure of 0.81 for CN and 0.87 for CTA. The CRF-based MLSL method achieved an F-measure of 0.80 on both corpora.
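The F-measures above are the harmonic mean of precision and recall. The exact matching criterion is defined in the paper's evaluation measures (Figure 5), which are not reproduced here, so the sketch below assumes exact matching of (medication, attribute) link pairs against the gold standard and the balanced F-measure (F1). The `prf` function and the mention identifiers are hypothetical.

```python
from typing import Hashable, Set, Tuple

Link = Tuple[Hashable, Hashable]  # (medication mention id, attribute mention id)


def prf(gold: Set[Link], predicted: Set[Link]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and balanced F-measure of predicted links."""
    tp = len(gold & predicted)  # links present in both gold and prediction
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {("m1", "a1"), ("m1", "a2"), ("m2", "a3"), ("m2", "a4")}
predicted = {("m1", "a1"), ("m1", "a2"), ("m2", "a3"), ("m1", "a4")}
print(prf(gold, predicted))
```

Here three of the four predicted links are correct and three of the four gold links are found, so precision, recall and F-measure are all 0.75.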

Discussion and conclusions: We compared the novel MLSL method with a binary classification method and a rule-based method. The MLSL method performed statistically significantly better than the rule-based method. However, the SVM-based binary classification method was statistically significantly better than the MLSL method on both the CTA and CN corpora. Using parsimonious feature sets, both the SVM-based binary classification and CRF-based MLSL methods achieved high performance in detecting linkages between medication names and their attributes in CTA and CN.
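In a sequence labeling formulation, a tagger such as a CRF assigns one label to every token, so linkages must be encoded as token labels. The abstract does not spell out the MLSL label scheme (the examples in Figure 3 are not reproduced here), so the sketch below assumes one simple encoding: one label layer per medication, in which tokens of attributes linked to that medication are tagged `B-ATTR`/`I-ATTR` and everything else is `O`. It shows only encoding and decoding, not CRF training; `encode_layers`, `decode_layer`, and the label names are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # token span [start, end)


def encode_layers(n_tokens: int, meds: List[Span], links: Dict[Span, List[Span]]) -> List[List[str]]:
    """Build one BIO label sequence per medication: the medication's own tokens
    are tagged MED, tokens of attributes linked to it are B-ATTR/I-ATTR, the rest O."""
    layers = []
    for med in meds:
        labels = ["O"] * n_tokens
        for i in range(*med):
            labels[i] = "MED"
        for start, end in links.get(med, []):
            labels[start] = "B-ATTR"
            for i in range(start + 1, end):
                labels[i] = "I-ATTR"
        layers.append(labels)
    return layers


def decode_layer(labels: List[str]) -> List[Span]:
    """Recover the attribute spans linked to this layer's medication.
    An I-ATTR with no preceding B-ATTR is ignored."""
    spans, start = [], None
    for i, lab in enumerate(labels):
        if lab == "B-ATTR":
            if start is not None:
                spans.append((start, i))
            start = i
        elif lab != "I-ATTR" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans
```

For "aspirin 81 mg daily and metformin 500 mg", the aspirin layer tags "81 mg" and "daily" as its attributes, while the metformin layer tags only "500 mg"; decoding each layer recovers the original links.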

Keywords: attribute linkages; clinical notes; clinical trial announcements; multi-layered sequence labeling; natural language processing.

Figures

Figure 1. Examples of linkages between medications and their attributes.
Figure 2. Features for binary classification-based linkage detection.
Figure 3. Examples of representative medication–attribute linkages using the multi-layered sequence labeling model.
Figure 4. Features for the multi-layered sequence labeling.
Figure 5. Evaluation measures.
