Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task
- PMID: 26994911
- PMCID: PMC4799720
- DOI: 10.1093/database/baw032
Abstract
Manually curating chemicals, diseases and their relationships is important to biomedical research, but it is hampered by its high cost and the rapid growth of the biomedical literature. In recent years, there has been growing interest in developing computational approaches for automatic chemical-disease relation (CDR) extraction. Despite these attempts, the lack of a comprehensive benchmarking dataset has limited the comparison of different techniques needed to assess and advance the current state of the art. To this end, we organized a challenge task through BioCreative V to automatically extract CDRs from the literature. We designed two challenge tasks: disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. To assist system development and assessment, we created a large annotated text corpus consisting of human annotations of chemicals, diseases and their interactions from 1500 PubMed articles. A total of 34 teams worldwide participated in the CDR task: 16 in DNER and 18 in CID. The best systems achieved an F-score of 86.46% for the DNER task, a result that approaches the human inter-annotator agreement (0.8875), and an F-score of 57.03% for the CID task, the highest results ever reported for such tasks. When team results were combined via machine learning, the ensemble system further improved over the best team results, achieving F-scores of 88.89% and 62.80% for the DNER and CID tasks, respectively. Another novel aspect of our evaluation was testing each participating system's ability to return real-time results: the average response times of the teams' DNER and CID web service systems were 5.6 and 9.3 s, respectively. Most teams submitted hybrid systems based on machine learning.
Given the level of participation and results, we found our task to be successful in engaging the text-mining research community, producing a large annotated corpus and improving the results of automatic disease recognition and CDR extraction. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/.
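The F-scores reported above combine precision and recall into a single number. As a minimal sketch of how such a score can be computed, assuming predictions and gold annotations are compared as sets of document-level tuples, the following Python snippet may help. The function name `precision_recall_f1`, the tuple layout and all identifiers in the example are hypothetical illustrations, not the task's official evaluation script or data.

```python
from typing import Hashable, Set, Tuple


def precision_recall_f1(
    gold: Set[Hashable], predicted: Set[Hashable]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for two sets of annotations.

    An annotation counts as a true positive only if it appears in both sets.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical relation annotations: (document id, chemical id, disease id).
gold_relations = {
    ("doc1", "chem_A", "dis_X"),
    ("doc1", "chem_B", "dis_Y"),
    ("doc2", "chem_A", "dis_Z"),
    ("doc3", "chem_C", "dis_X"),
}
predicted_relations = {
    ("doc1", "chem_A", "dis_X"),
    ("doc1", "chem_B", "dis_Y"),
    ("doc2", "chem_A", "dis_Z"),
    ("doc2", "chem_B", "dis_Z"),  # spurious prediction
    ("doc3", "chem_C", "dis_Y"),  # wrong disease
}

p, r, f = precision_recall_f1(gold_relations, predicted_relations)
print(f"precision={p:.4f} recall={r:.4f} F1={f:.4f}")
```

Comparing sets of tuples means each relation is counted once per document, regardless of how many times it is mentioned in the text; whether this matches the exact matching criteria used in the challenge is an assumption of this sketch.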
Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.