J Am Med Inform Assoc. 2020 Jan 1;27(1):3-12.
doi: 10.1093/jamia/ocz166.

2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records

Sam Henry et al. J Am Med Inform Assoc.

Abstract

Objective: This article summarizes the preparation, organization, evaluation, and results of Track 2 of the 2018 National NLP Clinical Challenges shared task. Track 2 focused on extraction of adverse drug events (ADEs) from clinical records and evaluated 3 tasks: concept extraction, relation classification, and end-to-end systems. We analyze the results to identify the state of the art in these tasks, learn from it, and build on it.

Materials and methods: For all tasks, teams were given the raw text of narrative discharge summaries. In all tasks, participants proposed deep learning-based methods with hand-designed features. In the concept extraction task, participants used sequence labelling models (bidirectional long short-term memory being the most popular), whereas in the relation classification task, they also experimented with instance-based classifiers (namely support vector machines and rules). Ensemble methods were also popular.

Results: A total of 28 teams participated in task 1, with 21 teams in tasks 2 and 3. The best performing systems set a high performance bar with F1 scores of 0.9418 for concept extraction, 0.9630 for relation classification, and 0.8905 for end-to-end. However, the results were much lower for concepts and relations of Reasons and ADEs. These were often missed because local context is insufficient to identify them.

Conclusions: This challenge shows that clinical concept extraction and relation classification systems perform well for many concept types, but significant improvement is still required for ADEs and Reasons. Incorporating larger context or outside knowledge will likely improve the performance of future systems.
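
The F1 scores reported above are described in the figure captions as lenient and micro-averaged. As a rough illustration of what such a score measures, the sketch below pools match counts over all documents and concept types before computing precision, recall, and F1. It is not the official evaluation script: the `Span` tuple layout, the function names, and the rule that a lenient match means "same type and at least one shared character" are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical span representation: (start offset, end offset, concept type).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """Assumed lenient match: same concept type and overlapping character ranges."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def lenient_micro_f1(docs: List[Tuple[List[Span], List[Span]]]) -> float:
    """Micro-averaged F1 over (gold, predicted) span lists, one pair per document."""
    matched_pred = n_pred = matched_gold = n_gold = 0
    for gold, pred in docs:
        # Precision side: predicted spans that overlap some gold span.
        matched_pred += sum(any(overlaps(p, g) for g in gold) for p in pred)
        # Recall side: gold spans that overlap some predicted span.
        matched_gold += sum(any(overlaps(g, p) for p in pred) for g in gold)
        n_pred += len(pred)
        n_gold += len(gold)
    precision = matched_pred / n_pred if n_pred else 0.0
    recall = matched_gold / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): one partial Drug match, one missed ADE.
example_docs = [
    (
        [(0, 9, "Drug"), (20, 24, "ADE")],   # gold
        [(0, 7, "Drug"), (30, 35, "ADE")],   # predicted
    )
]
print(lenient_micro_f1(example_docs))
```

Because counts are pooled before the ratios are taken, frequent concept types dominate a micro-averaged score, which is one reason an overall score can stay high while the rarer ADE and Reason types score much lower.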

Figures

Figure 1.
Lenient micro-averaged F1 scores of each concept type for the top-performing teams. The overall micro F1 score is shown next to the team name. ADE: adverse drug event.
Figure 2.
Percentage of predictions for each concept type (row) that were of each true type (column). ADE: adverse drug event.
Figure 3.
Lenient micro-averaged F1 score of each relation type for the top-performing teams. The overall micro F1 score is shown next to the team name. ADE: adverse drug event; BCH: Boston Children's Hospital/Harvard Medical School/Loyola University; CCH: Cincinnati Children's Hospital Medical Center; IBM: IBM Research; MSC: Medical University of South Carolina; NaCT: NaCTeM at University of Manchester/Toyota Technological Institute/AIST; UFL: University of Florida; UM: University of Michigan; UTH: UTHealth/Dalian; VA: VA Salt Lake City/University of Utah.
Figure 4.
Lenient micro-averaged F1 score of each relation type for the top-performing end-to-end teams. The overall micro F1 score is shown next to the team name. ADE: adverse drug event; BCH: Boston Children's Hospital/Harvard Medical School/Loyola University; CCH: Cincinnati Children's Hospital Medical Center; IBM: IBM Research; MSC: Medical University of South Carolina; NaCT: NaCTeM at University of Manchester/Toyota Technological Institute/AIST; UFL: University of Florida; UM: University of Michigan; UTH: UTHealth/Dalian; VA: VA Salt Lake City/University of Utah.
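
The Figure 2 caption describes a table in which each row is a predicted concept type and each column gives the percentage of those predictions belonging to each true type. A minimal sketch of how such row percentages could be computed from (predicted type, true type) pairs follows. The function name, the input format, and the toy labels are assumptions for illustration only and are not taken from the challenge data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def row_percentages(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """For each predicted type, the percentage of its predictions per true type."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for predicted, true in pairs:
        counts[predicted][true] += 1
    table: Dict[str, Dict[str, float]] = {}
    for predicted, row in counts.items():
        total = sum(row.values())
        # Normalize each row so its entries sum to 100.
        table[predicted] = {true: 100.0 * n / total for true, n in row.items()}
    return table


# Toy (predicted, true) pairs, invented for illustration.
example_pairs = [
    ("ADE", "ADE"),
    ("ADE", "ADE"),
    ("ADE", "ADE"),
    ("ADE", "Reason"),
    ("Drug", "Drug"),
]
for predicted, row in row_percentages(example_pairs).items():
    print(predicted, row)
```

Reading along a row shows which true types a system tends to confuse with the predicted one, for instance how often a predicted ADE was actually annotated as a Reason.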
