Genome Biol. 2008;9 Suppl 2(Suppl 2):S5.
doi: 10.1186/gb-2008-9-s2-s5. Epub 2008 Sep 1.

MINT and IntAct contribute to the Second BioCreative challenge: serving the text-mining community with high quality molecular interaction data

Andrew Chatr-aryamontri et al. Genome Biol. 2008.

Abstract

Background: In the absence of consolidated pipelines to archive biological data electronically, information dispersed in the literature must be captured by manual annotation. Unfortunately, manual annotation is time-consuming, and the coverage of published interaction data is therefore far from complete. Text-mining tools that identify relevant publications and assist in the initial information extraction could make curation more efficient and, as a consequence, increase database coverage of the data available in the literature. The 2006 BioCreative competition aimed to evaluate text-mining procedures against manual annotation of protein-protein interactions.

Results: To support the BioCreative protein-protein interaction task, IntAct and MINT (Molecular INTeraction) provided both the training and the test datasets. Data from the two databases are comparable because they were curated according to the same standards. During manual curation, the major cause of data loss when extracting information from the articles was ambiguity in mapping gene names to stable UniProtKB database identifiers. Most of the information about interactions was also found only in the full text of the publication; hence, text mining of protein-protein interaction data will require analysis of the full text of articles and cannot be restricted to the abstract.
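
The identifier-mapping problem described above can be illustrated with a minimal Python sketch. It assumes a hypothetical synonym dictionary in which each gene or protein name maps to a set of candidate UniProtKB-style accessions; the names and accessions are placeholders, not real database entries, and the function `map_names` is illustrative only. Names that resolve to exactly one accession are mapped automatically, while names with several candidates or none are set aside, which is where information is lost unless a curator resolves them.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical synonym dictionary: name -> candidate accessions.
# All names and accessions here are placeholders, not real UniProtKB entries.
SYNONYMS: Dict[str, Set[str]] = {
    "geneA": {"ACC001"},
    "geneB": {"ACC002", "ACC003"},  # one name shared by two different proteins
    "protC": {"ACC004"},
}


def map_names(
    names: List[str], synonyms: Dict[str, Set[str]]
) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Split names into unambiguous mappings, ambiguous names and unknown names."""
    mapped: Dict[str, str] = {}
    ambiguous: List[str] = []
    unknown: List[str] = []
    for name in names:
        candidates = synonyms.get(name, set())
        if len(candidates) == 1:
            # Exactly one candidate: safe to map automatically.
            mapped[name] = next(iter(candidates))
        elif candidates:
            # Several candidates: needs manual disambiguation (e.g. by organism).
            ambiguous.append(name)
        else:
            # No candidate at all: the name is not in the dictionary.
            unknown.append(name)
    return mapped, ambiguous, unknown


mapped, ambiguous, unknown = map_names(["geneA", "geneB", "geneX"], SYNONYMS)
print(mapped)     # {'geneA': 'ACC001'}
print(ambiguous)  # ['geneB']
print(unknown)    # ['geneX']
```

Keeping ambiguous and unknown names in separate lists, rather than silently dropping them, makes it easy to measure how much data the mapping step would lose.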

Conclusion: The development of text-mining tools to extract protein-protein interaction information may increase the literature coverage achieved by manual curation. To support the text-mining community, the databases will highlight the sentences within the articles that describe the interactions. The highlighted sentences will give data-miners a high-quality dataset for algorithm development. Furthermore, the dictionary of terms created by the BioCreative competitors could enrich the synonym list of the PSI-MI (Proteomics Standards Initiative-Molecular Interactions) controlled vocabulary, which both databases use to annotate their data content.
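
As an illustration of how candidate interaction sentences might be pre-selected before a curator highlights them, here is a minimal co-occurrence filter in Python. This is an assumption for illustration, not the procedure used by MINT or IntAct: a sentence is kept when it names at least two proteins from a supplied list and contains one word from a hypothetical list of interaction verbs (`INTERACTION_WORDS`). The regex sentence splitter and single-token name matching are deliberate simplifications; multi-word protein names would not be matched.

```python
import re
from typing import Iterable, List

# Hypothetical trigger words; a real pipeline would use a curated vocabulary.
INTERACTION_WORDS = ("interacts", "binds", "associates", "phosphorylates")


def candidate_sentences(text: str, proteins: Iterable[str]) -> List[str]:
    """Return sentences naming at least two proteins and one interaction word."""
    names = [p.lower() for p in proteins]
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        # Lower-cased word tokens (letters, digits, underscore, hyphen).
        words = set(re.findall(r"[\w-]+", sentence.lower()))
        named = {n for n in names if n in words}
        if len(named) >= 2 and any(w in words for w in INTERACTION_WORDS):
            hits.append(sentence)
    return hits


text = "Cells were lysed. ProtA binds ProtB in vitro. ProtC was not detected."
print(candidate_sentences(text, ["ProtA", "ProtB", "ProtC"]))
# ['ProtA binds ProtB in vitro.']
```

A filter of this kind favours recall over precision; it only narrows down the sentences a curator has to read and does not decide whether an interaction is actually reported.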


Figures

Figure 1
An overview of the PSI-MI CV in OLS. CV, controlled vocabulary; MI, Molecular Interactions; OLS, Ontology Lookup Service; PSI, Proteomics Standards Initiative.
Figure 2
Interaction type in PSI-MI. MI, Molecular Interactions; PSI, Proteomics Standards Initiative.

