Accelerated variant curation from scientific literature using biomedical text mining
- PMID: 35663412
- PMCID: PMC9160977
- DOI: 10.17912/micropub.biology.000578
Abstract
Biological databases collect and standardize data through biocuration. Although major model organism databases have automated some of their curation methods, a large portion of biocuration is still performed manually. To speed up the extraction of the genomic positions of variants, we have developed a hybrid approach that combines regular expressions, Named Entity Recognition based on BERT (Bidirectional Encoder Representations from Transformers), and bag-of-words to extract variant genomic locations from C. elegans papers for WormBase. On text extracted from 100 papers, our model achieves a precision of 82.59% for gene-mutation matches, and it even recovers some data not discovered during manual curation. The code is available at: https://github.com/WormBase/genomic-info-from-papers.
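To make the hybrid idea concrete, here is a minimal Python sketch of two of the three ingredients named above: regular expressions and a bag-of-words filter. It is not the authors' pipeline (see the repository for that), and it omits the BERT-based Named Entity Recognition step entirely. The patterns `GENE_RE` and `MUTATION_RE`, the keyword set `KEYWORDS`, and the function `extract_pairs` are all hypothetical names and assumptions chosen for illustration.

```python
import re

# Assumption: C. elegans gene names look like "unc-13" or "daf-2".
GENE_RE = re.compile(r"\b[a-z]{3,4}-\d+\b")

# Assumption: one-letter amino acid substitutions such as "G123A".
AMINO = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"\b[{AMINO}]\d+[{AMINO}]\b")

# Hypothetical bag of words that signals a sentence discusses a variant.
KEYWORDS = {"mutation", "allele", "substitution", "missense", "variant"}


def extract_pairs(text):
    """Return (gene, mutation) pairs found together in keyword-bearing sentences."""
    pairs = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Bag-of-words filter: skip sentences with no variant-related keyword.
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if not words & KEYWORDS:
            continue
        genes = GENE_RE.findall(sentence)
        mutations = MUTATION_RE.findall(sentence)
        # Pair every gene with every mutation mentioned in the same sentence.
        pairs.extend((g, m) for g in genes for m in mutations)
    return pairs


if __name__ == "__main__":
    sample = (
        "The unc-13 allele carries a G123A substitution. "
        "We also imaged daf-2 animals."
    )
    print(extract_pairs(sample))
```

The second sentence of the sample is skipped because it contains no keyword, which illustrates how a bag-of-words filter can cut down spurious gene-mutation matches before any heavier model is applied.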
Copyright: © 2022 by the authors.