BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S4.
doi: 10.1186/1471-2105-12-S8-S4.

BioCreative III interactive task: an overview

Cecilia N Arighi et al. BMC Bioinformatics.

Abstract

Background: The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end user group for text mining tools. Earlier BioCreative challenges involved many text mining teams in developing basic capabilities relevant to biological curation, but they did not address the issues of system usage, insertion into the workflow and adoption by curators. Thus in BioCreative III (BC-III), the InterActive Task (IAT) was introduced to address the utility and usability of text mining tools for real-life biocuration tasks. To support the aims of the IAT in BC-III, involvement of both developers and end users was solicited, and the development of a user interface to address the tasks interactively was requested.

Results: A User Advisory Group (UAG) actively participated in the IAT design and assessment. The task focused on gene normalization (identifying gene mentions in the article and linking these genes to standard database identifiers), gene ranking based on the overall importance of each gene mentioned in the article, and gene-oriented document retrieval (identifying full text papers relevant to a selected gene). Six systems participated, and all processed and displayed the same set of articles. The articles were selected based on content known to be problematic for curation, such as ambiguity of gene names, coverage of multiple genes and species, or introduction of a new gene name. Members of the UAG curated three articles for training and assessment purposes, and each member was assigned a system to review. A questionnaire related to the interface usability and task performance (as measured by precision and recall) was answered after systems were used to curate articles. Although the limited number of articles analyzed and users involved in the IAT experiment precluded rigorous quantitative analysis of the results, a qualitative analysis provided valuable insight into some of the problems encountered by users when using the systems. The overall assessment indicates that the system usability features appealed to most users, but the system performance was suboptimal (mainly due to low accuracy in gene normalization). Some of the issues included failure of species identification and gene name ambiguity in the gene normalization task, leading to an extensive list of gene identifiers to review, which, in some cases, did not contain the relevant genes. Document retrieval suffered from the same shortfalls. The UAG favored achieving high performance (measured by precision and recall), but strongly recommended the addition of features that facilitate the identification of the correct gene and its identifier, such as contextual information to assist in disambiguation.
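
As an illustration of the performance measure named above, the following is a minimal sketch of how precision and recall could be computed for the gene normalization task, by comparing the set of identifiers a system proposes for an article against a curator's gold-standard set. The function name and the identifiers are hypothetical and are not taken from the IAT evaluation itself.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of gene identifiers."""
    predicted, gold = set(predicted), set(gold)
    # Identifiers proposed by the system that the curator also assigned.
    true_positives = len(predicted & gold)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical identifiers for a single article (illustrative only).
gold_ids = {"GENE:1", "GENE:2", "GENE:3", "GENE:4"}
system_ids = {"GENE:1", "GENE:2", "GENE:9"}

precision, recall = precision_recall(system_ids, gold_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.67 recall=0.50"
```

In this sketch, a system that returns a long list of candidate identifiers without the relevant genes would score low on both measures, which matches the kind of shortfall the users reported.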

Discussion: The IAT was an informative exercise that advanced the dialog between curators and developers and increased the appreciation of challenges faced by each group. A major conclusion was that the intended users should be actively involved in every phase of software development, and this will be strongly encouraged in future tasks. The IAT provides the first steps toward the definition of metrics and functional requirements that are necessary for designing a formal evaluation of interactive curation systems in the BioCreative IV challenge.

Figures

Figure 1
Usability and performance assessment survey results. Note that only selected questions are shown in graph format. Results are shown as the number of UAG members who selected a particular response.
Figure 2
ODIN interface. The ODIN interface is organized in 3 panels: the inspector panel (left) is used to edit single annotations, the document panel (center) contains the document being inspected, and the annotation panel (right) contains grid views (in different tabs) of the terms, concepts and interactions identified by the system in the target document. The term tab contains columns showing the textual form of a term occurrence, its possible concept identifiers and main semantic types together with an ambiguity count. In the concept tab (called "Genes/Proteins" for this task) there is a row for each concept identifier with a relevance score, a frequency count, the most prominent text zone where the concept appears (title, abstract, text), its semantic type, and a link to allow exploration of the concept in the web interface of the ontology from which it stems.
Figure 3
GeneView interface. The main panel shows the article and the recognized entities. Detected gene names are highlighted in green and entity-specific information, as shown for gene ALIX (PDCD6IP), is displayed. The left panel provides an overview of all entities found in the article sorted by overall count. This ranking can be manually modified. By default, all genes are highlighted in the text, but GeneView allows the highlighting to be limited to the species of interest.
Figure 4
IAT interface from University of Iowa. The left panel displays the full text of the article selected by the user for the purpose of gene normalization. The right panel shows a ranked list of gene and species names along with their normalized identifiers. In this figure, all instances of the user-selected gene POSH are highlighted.
Figure 5
GeneIR interface from University of Wisconsin. Screenshot showing the two search boxes. Results are presented as a table. Links are provided to view the genes highlighted in the article, add or delete a gene, and download the gene list. The list of genes can be sorted by centrality (default), presence in title and abstract, or the frequency with which they appear in the article.
Figure 6
GNSuite interface. A screenshot for PMC2680910 with the “gene summary table” and “full text” tabs activated. On the left are links to the system documentation, and on the right is detailed information about the most recently clicked gene name. At the top of the screen, directly under the PMC and PubMed identifier information, are tabs for the different input sub-systems for gene and species information, in addition to the summary tabs and a “hide gene tables” tab. The gene table can be saved locally by clicking the provided button. At the bottom of the screen are three tabs for viewing the abstract/MEDIE, full text/GNSuite, or Web-search results, respectively. The selected gene and species names from the top tables are highlighted in the texts at the bottom.
Figure 7
MyMiner interface. MyMiner Entity tagging and Entity linking user interfaces for the abstract of article PMC2680910. Entity tagging (A) and Entity linking (B) have been manually edited; some tags have been added or removed depending on the biocurator's choices.

