BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations
- PMID: 27074804
- PMCID: PMC4830473
- DOI: 10.1093/database/baw043
Abstract
Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, rapidly increases. Because numerous articles are published every day, it is almost impossible to manually curate all the variant information from the literature. Many researchers focus on creating an improved automated biomedical natural language processing (BioNLP) method that extracts useful variants and their functional information from the literature. However, there is no gold-standard data set that contains texts annotated with variants and their related functions. To overcome these limitations, we introduce a Biomedical entity Relation ONcology COrpus (BRONCO) that contains more than 400 variants and their relations with genes, diseases, drugs and cell lines in the context of cancer and anti-tumor drug screening research. The variants and their relations were manually extracted from 108 full-text articles. BRONCO can be utilized to evaluate and train new methods used for extracting biomedical entity relations from full-text publications, and thus be a valuable resource to the biomedical text mining research community. Using BRONCO, we quantitatively and qualitatively evaluated the performance of three state-of-the-art BioNLP methods. We also identified their shortcomings, and suggested remedies for each method. We implemented post-processing modules for the three BioNLP methods, which improved their performance.
Database URL: http://infos.korea.ac.kr/bronco
© The Author(s) 2016. Published by Oxford University Press.
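As a rough illustration of how a gold-standard corpus of this kind can be used to score an extraction method, the sketch below computes precision, recall and F1 by exact match between gold and predicted relations. The `Relation` tuple, its field names and the toy records are assumptions made for this example only; they do not reflect BRONCO's actual file format or the evaluation protocol used in the paper.

```python
from typing import NamedTuple, Set, Tuple


class Relation(NamedTuple):
    """Hypothetical record: one variant linked to one related entity."""
    doc_id: str       # article identifier (made up for this example)
    variant: str      # variant mention, e.g. "V600E"
    entity_type: str  # "gene", "disease", "drug" or "cell_line"
    entity: str       # the related entity's name


def precision_recall_f1(
    gold: Set[Relation], predicted: Set[Relation]
) -> Tuple[float, float, float]:
    """Score predictions against gold relations by exact tuple match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy data, invented for illustration; not taken from the corpus.
gold = {
    Relation("doc1", "V600E", "gene", "BRAF"),
    Relation("doc1", "V600E", "disease", "melanoma"),
    Relation("doc1", "V600E", "drug", "vemurafenib"),
}
predicted = {
    Relation("doc1", "V600E", "gene", "BRAF"),
    Relation("doc1", "V600E", "disease", "melanoma"),
    Relation("doc1", "V600E", "drug", "imatinib"),  # a wrong prediction
}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Exact matching is the strictest choice; a real evaluation would likely also need to normalize variant and entity mentions before comparing them.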