Database (Oxford). 2018 Jan 1;2018:bay104.
doi: 10.1093/database/bay104.

Overview of the BioCreative VI text-mining services for Kinome Curation Track

Julien Gobeill et al. Database (Oxford).

Abstract

The Text-mining Services for Kinome Curation track, part of BioCreative VI, was a competition designed to assess how effectively text mining can perform literature triage. The track used an unpublished curated data set from the neXtProt database, containing comprehensive annotations for 300 human protein kinases. For a given protein and a given curation axis [diseases or gene ontology (GO) biological processes], participants' systems had to identify and rank relevant articles in a collection of 5.2 million MEDLINE citations (task 1) or 530 000 full-text articles (task 2). The strategies explored included named-entity recognition and machine-learning frameworks. For the latter approach, participants developed methods to derive a set of negative instances, because databases typically do not store articles that curators judged irrelevant. The supervised approaches proposed by the participating groups achieved significant improvements over both the baseline established in a previous study and a basic PubMed search.
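
The negative-instance problem mentioned above can be illustrated with a minimal sketch. It assumes one simple heuristic, random sampling from the articles that curators never annotated for the protein and axis; this is not necessarily the method used by any participating team. The function name `sample_negatives`, the `ratio` parameter and the PMID values are hypothetical.

```python
import random


def sample_negatives(collection_ids, positive_ids, ratio=1, seed=0):
    """Draw pseudo-negative article IDs from the uncurated part of a collection.

    Assumption: an article absent from the curated set is treated as
    irrelevant, which is only approximately true in practice.
    """
    positives = set(positive_ids)
    # Candidates are articles with no curated annotation for this protein and axis.
    candidates = sorted(set(collection_ids) - positives)
    # Never request more negatives than there are candidates.
    k = min(len(candidates), ratio * len(positives))
    return random.Random(seed).sample(candidates, k)


# Hypothetical toy collection of 20 citations, 3 of them curated as relevant.
collection = [f"PMID{i}" for i in range(1, 21)]
positives = ["PMID3", "PMID7", "PMID11"]
negatives = sample_negatives(collection, positives, ratio=2)

# Labeled training set: 1 = curated as relevant, 0 = sampled pseudo-negative.
training_set = [(pid, 1) for pid in positives] + [(pid, 0) for pid in negatives]
```

Because some sampled articles may in fact be relevant but simply uncurated, a set built this way is noisy; the `ratio` parameter only controls the class balance and does not address that noise.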

Figures

Figure 1
Overview of literature triage for the Kinome Track. The axis is either diseases or biological processes. The collection depends on the task: abstracts or full texts.
Figure 2
Results for the abstracts triage task, disease axis. The best three runs submitted by each team are presented, along with the official baselines. Conditional formatting highlights the best participant results for each metric in red. The neXtA5 baseline (in bold) is included in the highlighting, whereas the PubMed baseline (in italics) is not.
Figure 3
Results for the abstracts triage task, biological process axis. The best three runs submitted by each team are presented, along with the official baselines. Conditional formatting highlights the best participant results for each metric in red. The neXtA5 baseline (in bold) is included in the highlighting, whereas the PubMed baseline (in italics) is not.
Figure 4
Results for the full-text triage task, disease axis. Only one team submitted runs. Conditional formatting highlights the best participant results for each metric.
Figure 5
Results for the full-text triage task, biological process axis. Only one team submitted runs. Conditional formatting highlights the best participant results for each metric.

