Database (Oxford). 2016 Mar 19:2016:baw032.

doi: 10.1093/database/baw032. Print 2016.

Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task

Chih-Hsuan Wei¹, Yifan Peng², Robert Leaman¹, Allan Peter Davis³, Carolyn J Mattingly³, Jiao Li⁴, Thomas C Wiegers³, Zhiyong Lu⁵

Affiliations

¹ National Center for Biotechnology Information, Bethesda, MD 20894, USA.
² National Center for Biotechnology Information, Bethesda, MD 20894, USA; Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA.
³ Department of Biological Sciences and the Center for Human Health and the Environment, North Carolina State University, Raleigh, NC 27695, USA.
⁴ Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100700, China.
⁵ National Center for Biotechnology Information, Bethesda, MD 20894, USA. zhiyong.lu@nih.gov

PMID: 26994911
PMCID: PMC4799720
DOI: 10.1093/database/baw032

Chih-Hsuan Wei et al. Database (Oxford). 2016.

Abstract

Manually curating chemicals, diseases and their relationships is important to biomedical research, but it is hampered by its high cost and the rapid growth of the biomedical literature. In recent years, there has been growing interest in developing computational approaches for automatic chemical-disease relation (CDR) extraction. Despite these attempts, the lack of a comprehensive benchmarking dataset has limited the comparison of different techniques needed to assess and advance the current state of the art. To this end, we organized a challenge task through BioCreative V to automatically extract CDRs from the literature. We designed two challenge tasks: disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. To assist system development and assessment, we created a large annotated text corpus consisting of human annotations of chemicals, diseases and their interactions from 1500 PubMed articles. A total of 34 teams worldwide participated in the CDR task: 16 in DNER and 18 in CID. The best systems achieved an F-score of 86.46% for the DNER task, a result that approaches the human inter-annotator agreement (0.8875), and an F-score of 57.03% for the CID task, the highest results ever reported for such tasks. When team results were combined via machine learning, the ensemble system further improved on the best team results, achieving F-scores of 88.89% and 62.80% for the DNER and CID tasks, respectively. Another novel aspect of our evaluation was testing each participating system's ability to return real-time results: the average response times of the teams' DNER and CID web service systems were 5.6 and 9.3 s, respectively. Most teams submitted hybrid systems based on machine learning.
Given the level of participation and results, we found our task to be successful in engaging the text-mining research community, producing a large annotated corpus and improving the results of automatic disease recognition and CDR extraction. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/.
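The F-scores reported in the abstract are the harmonic mean of precision and recall. The following is a minimal sketch of how such a score could be computed for CID relation extraction, assuming that a predicted relation counts as correct only when its (document, chemical, disease) identifier triple exactly matches a gold annotation. The function name, the triple layout and the toy identifiers are illustrative assumptions and are not taken from the task's official evaluation script.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (document id, chemical id, disease id).
Relation = Tuple[str, str, str]


def precision_recall_f1(
    gold: Iterable[Relation], predicted: Iterable[Relation]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match relation triples."""
    gold_set: Set[Relation] = set(gold)
    pred_set: Set[Relation] = set(predicted)

    # A true positive is a predicted triple that also appears in the gold set.
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0

    # F1 is the harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data with placeholder identifiers (not drawn from the CDR corpus).
gold = [
    ("doc1", "C1", "D1"),
    ("doc1", "C2", "D1"),
    ("doc2", "C3", "D2"),
    ("doc2", "C3", "D3"),
]
predicted = [
    ("doc1", "C1", "D1"),  # correct
    ("doc2", "C3", "D2"),  # correct
    ("doc2", "C4", "D2"),  # spurious
]

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

On this toy input, two of three predictions are correct and two of four gold relations are recovered, so precision is 2/3, recall is 1/2 and F1 is 4/7. The same set-based comparison could be applied to DNER output by swapping the relation triples for (document, disease identifier) pairs, again as an assumption about the matching criterion.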

Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.

Figures

**Figure 1.**
The pipeline of the task workflow. The task organization is shown in purple; corpus development is shown in green; and team participation is shown in red.

**Figure 2.**
DNER results of all teams as well as the baseline (dictionary look up) and DNorm systems.

**Figure 3.**
CID results of all teams as well as two variants of the co-occurrence baseline method (i.e. abstract- and sentence-level).

**Figure 4.**
Average response time of each individual team for DNER and CID tasks.

Cited by

Ontology-driven weak supervision for clinical entity classification in electronic health records.
Fries JA, Steinberg E, Khattar S, Fleming SL, Posada J, Callahan A, Shah NH. Nat Commun. 2021 Apr 1;12(1):2017. doi: 10.1038/s41467-021-22328-4. PMID: 33795682. Free PMC article.
Transformer-based approach to variable typing.
Rey CA, Danguilan JL, Mendoza KP, Remolona MF. Heliyon. 2023 Sep 29;9(10):e20505. doi: 10.1016/j.heliyon.2023.e20505. eCollection 2023 Oct. PMID: 37842594. Free PMC article.
A corpus-driven standardization framework for encoding clinical problems with HL7 FHIR.
Peterson KJ, Jiang G, Liu H. J Biomed Inform. 2020 Oct;110:103541. doi: 10.1016/j.jbi.2020.103541. Epub 2020 Aug 16. PMID: 32814201. Free PMC article.
MADEx: A System for Detecting Medications, Adverse Drug Events, and Their Relations from Clinical Notes.
Yang X, Bian J, Gong Y, Hogan WR, Wu Y. Drug Saf. 2019 Jan;42(1):123-133. doi: 10.1007/s40264-018-0761-0. PMID: 30600484. Free PMC article.
Document-Level Biomedical Relation Extraction Leveraging Pretrained Self-Attention Structure and Entity Replacement: Algorithm and Pretreatment Method Validation Study.
Liu X, Fan J, Dong S. JMIR Med Inform. 2020 May 29;8(5):e17644. doi: 10.2196/17644. PMID: 32469325. Free PMC article.
