Mining literature for protein-protein interactions
- PMID: 11301305
- DOI: 10.1093/bioinformatics/17.4.359
Abstract
Motivation: A central problem in bioinformatics is how to capture information from the vast current scientific literature in a form suitable for analysis by computer. We address the special case of information on protein-protein interactions, and show that the frequencies of words in Medline abstracts can be used to determine whether or not a given paper discusses protein-protein interactions. For those papers determined to discuss this topic, the relevant information can be captured for the Database of Interacting Proteins. Furthermore, suitable gene annotations can also be captured.
Results: Our Bayesian approach scores Medline abstracts for probability of discussing the topic of interest according to the frequencies of discriminating words found in the abstract. More than 80 discriminating words (e.g. complex, interaction, two-hybrid) were determined from a training set of 260 Medline abstracts corresponding to previously validated entries in the Database of Interacting Proteins. Using these words and a log likelihood scoring function, approximately 2000 Medline abstracts were identified as describing interactions between yeast proteins. This approach now forms the basis for the rapid expansion of the Database of Interacting Proteins.
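The scoring idea described in the Results can be illustrated with a minimal sketch. The abstract does not give the exact scoring function, so everything below is an assumption made for illustration: the function names (`tokenize`, `train_log_odds`, `score`), the add-one smoothing, the `min_log_odds` cutoff, and the use of word presence per abstract rather than raw word counts. It is a generic naive-Bayes-style log-odds scorer, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep hyphenated terms such as "two-hybrid" intact.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def train_log_odds(positive_docs, background_docs, min_log_odds=0.5):
    """Return {word: log-odds weight} for words enriched in positive_docs.

    Hypothetical helper: the cutoff and smoothing are illustrative choices.
    """
    # Count the number of documents each word appears in (presence, not counts).
    pos = Counter(w for d in positive_docs for w in set(tokenize(d)))
    bg = Counter(w for d in background_docs for w in set(tokenize(d)))
    n_pos, n_bg = len(positive_docs), len(background_docs)
    weights = {}
    for word in pos:
        # Add-one smoothing keeps the ratio finite for words unseen in one set.
        p_pos = (pos[word] + 1) / (n_pos + 2)
        p_bg = (bg[word] + 1) / (n_bg + 2)
        log_odds = math.log(p_pos / p_bg)
        if log_odds >= min_log_odds:
            weights[word] = log_odds
    return weights


def score(abstract, weights):
    """Sum the weights of the discriminating words present in the abstract."""
    return sum(weights.get(w, 0.0) for w in set(tokenize(abstract)))
```

In use, `train_log_odds` would be given a set of abstracts known to describe interactions and a background set, and `score` would then rank unseen abstracts, with higher scores indicating a higher likelihood of discussing protein-protein interactions under these assumptions.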