2016 Apr 13:2016:baw043. doi: 10.1093/database/baw043. Print 2016.

BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations


Kyubum Lee et al. Database (Oxford).

Abstract

Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, rapidly increases. Because numerous articles are published every day, it is almost impossible to manually curate all the variant information from the literature. Many researchers focus on creating an improved automated biomedical natural language processing (BioNLP) method that extracts useful variants and their functional information from the literature. However, there is no gold-standard data set that contains texts annotated with variants and their related functions. To overcome these limitations, we introduce a Biomedical entity Relation ONcology COrpus (BRONCO) that contains more than 400 variants and their relations with genes, diseases, drugs and cell lines in the context of cancer and anti-tumor drug screening research. The variants and their relations were manually extracted from 108 full-text articles. BRONCO can be utilized to evaluate and train new methods used for extracting biomedical entity relations from full-text publications, and thus be a valuable resource to the biomedical text mining research community. Using BRONCO, we quantitatively and qualitatively evaluated the performance of three state-of-the-art BioNLP methods. We also identified their shortcomings, and suggested remedies for each method. We implemented post-processing modules for the three BioNLP methods, which improved their performance.

Database URL: http://infos.korea.ac.kr/bronco
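As an illustration of what a quantitative evaluation against a gold-standard corpus can look like, the sketch below scores a set of predicted variant mentions against manually curated ones using precision, recall and F1. It is a minimal sketch only: the `(document id, variant)` tuple format, the function name `precision_recall_f1`, the exact-match criterion and the toy data are all assumptions made for illustration, and are not taken from BRONCO or from the evaluation protocol used in the paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical mention format: (document id, normalized variant string).
Mention = Tuple[str, str]


def precision_recall_f1(
    gold: Iterable[Mention], predicted: Iterable[Mention]
) -> Tuple[float, float, float]:
    """Score predicted mentions against gold mentions by exact match."""
    gold_set: Set[Mention] = set(gold)
    pred_set: Set[Mention] = set(predicted)

    # A prediction counts as correct only if the same (doc, variant) pair
    # appears in the gold annotations.
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data, invented for this example (not drawn from the corpus).
gold = [("doc1", "V600E"), ("doc1", "G12D"), ("doc2", "T790M"), ("doc2", "L858R")]
predicted = [("doc1", "V600E"), ("doc2", "T790M"), ("doc2", "E545K")]

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact matching on normalized strings is the simplest possible criterion; a real evaluation would also need to decide how to treat differing surface forms of the same variant and partial matches.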


Figures

Figure 1. Manual curation of BRONCO. (a) Workflow of manual curation. (b) Example of manual curation.
Figure 2. Workflow for assessing the performance of MF, EMU and tmVar in this study.
Figure 3. The post-processing module’s performance on the BRONCO corpus.
Figure 4. Examples of true and false positives identified by the three methods.
