PIE: an online prediction system for protein-protein interactions from text

Sun Kim¹, Soo-Yong Shin, In-Hee Lee, Soo-Jin Kim, Ram Sriram, Byoung-Tak Zhang

PMID: 18508809
PMCID: PMC2447724
DOI: 10.1093/nar/gkn281

Sun Kim et al. Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W411-5. doi: 10.1093/nar/gkn281. Epub 2008 May 28.

Affiliation

¹ Biointelligence Laboratory, School of Computer Science and Engineering, Seoul National University, Seoul 151-744, Korea.

Abstract

Protein-protein interaction (PPI) extraction has been an important research topic in the bio-text mining area, since PPI information is critical for understanding biological processes. However, very few open systems are available on the Web, and most of them focus on keyword searching based on predefined PPIs. PIE (Protein Interaction information Extraction system) is a configurable Web service that extracts PPIs from the literature, including user-provided papers as well as PubMed articles. After users provide abstracts or papers, the prediction results are displayed in an easily readable form with essential yet compact features. The PIE interface also supports features such as PDF file extraction, a PubMed search tool and network communication, which are useful for biologists and bio-system developers. PIE uses natural language processing techniques and machine learning methods to predict PPI sentences, resulting in high-precision predictions for Web users. PIE is freely available at http://bi.snu.ac.kr/pie/.

Figures

**Figure 1.**
Overview of PIE. The PIE system consists of several modules. ‘Article Filter’ and ‘Sentence Filter’ decide whether given articles or sentences contain PPI information. ‘Search Engine’ retrieves stored information such as learning data (Article DB) and protein names (Protein DB). ‘Interaction DB’ is the database of interaction-related words. ‘XML–RPC Module’ handles RPC communication with other PPI services. ‘Web Interface Module’ manages the entire PPI prediction process for requests from Web users. Prediction results contain links to the iHOP service for further protein information. For PubMed search, PIE retrieves PubMed articles using the NCBI E-Utilities.
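The caption states only that PIE retrieves PubMed articles through the NCBI E-Utilities. As a minimal sketch of what such retrieval involves, the Python below builds an ESearch URL (which returns matching PMIDs) and an EFetch URL (which returns plain-text abstracts). The helper names `build_esearch_url` and `build_efetch_url` are hypothetical. Mapping the search-narrowing options from Figure 2(B) onto the `retmax`, `mindate`/`maxdate` and journal (`[ta]`) filters is an assumption, not PIE's documented behavior.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoints.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_esearch_url(query, max_results=20, min_year=None, max_year=None, journal=None):
    """Build an ESearch URL returning PubMed IDs (PMIDs) that match `query`.

    Hypothetical helper: the filters loosely mirror the narrowing options
    described for PIE's PubMed search (result count, years, journal).
    """
    term = query
    if journal:
        # [ta] restricts the search to a journal title abbreviation.
        term = f'({term}) AND "{journal}"[ta]'
    params = {"db": "pubmed", "term": term, "retmax": max_results}
    if min_year is not None and max_year is not None:
        # mindate and maxdate must be given together; datetype=pdat
        # filters on the publication date.
        params.update({"datetype": "pdat", "mindate": min_year, "maxdate": max_year})
    return f"{ESEARCH_URL}?{urlencode(params)}"


def build_efetch_url(pmids):
    """Build an EFetch URL returning the plain-text abstracts of `pmids`."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"
```

Both functions only build URLs. Fetching them (for example with `urllib.request.urlopen`) requires network access and should respect NCBI's usage guidelines.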

**Figure 2.**
An example of PIE prediction results. PIE provides a user-friendly and intuitive interface. (A) Input. Web users can upload papers as a file or copy and paste text. A PubMed tool is provided for PubMed article searches. PIE accepts multiple PubMed articles for PPI prediction, selected either manually or automatically. (B) PubMed search. General article search using the PubMed service is available. The search results can be narrowed by options such as the number of results, publication years and journals. The ‘I'm feeling lucky’ button performs automatic article selection, similar to what common PPI extraction tools do. (C) Output. Prediction results are listed in the center box, with PPI sentences highlighted according to their probabilities: ‘Red’ for high probability and ‘Green’ for moderate probability. Based on the protein DB and the interaction DB, protein names are shown in bold and interaction-related words in italics. In particular, protein names are linked to the iHOP service for further information. Users can leave feedback to improve PIE's performance by selecting the ‘No Feedback,’ ‘Agree,’ ‘Partly Disagree’ or ‘Disagree’ button.
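The output rules in panel (C) can be sketched as a small rendering step. This is a hypothetical Python illustration, not PIE's code. The 0.8 and 0.5 probability cut-offs are assumptions because the caption does not give them. The word sets stand in for the Protein DB and Interaction DB, matching covers single tokens only, and the iHOP links are omitted.

```python
import html
import re

# Hypothetical cut-offs: the source says only that red marks high and green
# marks moderate probability, not where the boundaries lie.
HIGH_THRESHOLD = 0.8
MODERATE_THRESHOLD = 0.5


def sentence_color(probability):
    """Map a predicted PPI probability to a highlight color, or None."""
    if probability >= HIGH_THRESHOLD:
        return "red"
    if probability >= MODERATE_THRESHOLD:
        return "green"
    return None


def render_sentence(sentence, probability, protein_names, interaction_words):
    """Return HTML with proteins in bold and interaction words in italics."""
    parts = []
    # Split into word and non-word runs so spacing and punctuation survive.
    for token in re.findall(r"\w+|\W+", sentence):
        escaped = html.escape(token)
        if token in protein_names:
            escaped = f"<b>{escaped}</b>"
        elif token.lower() in interaction_words:
            escaped = f"<i>{escaped}</i>"
        parts.append(escaped)
    body = "".join(parts)
    color = sentence_color(probability)
    if color is None:
        return body  # low-probability sentences are left unhighlighted
    return f'<span style="color:{color}">{body}</span>'
```

For example, `render_sentence("p53 binds MDM2.", 0.9, {"p53", "MDM2"}, {"binds"})` returns `<span style="color:red"><b>p53</b> <i>binds</i> <b>MDM2</b>.</span>`. Multi-word or hyphenated protein names would need a phrase-level dictionary matcher instead of per-token lookup.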

**Figure 3.**
ROC curves for test data. The performance of PIE was measured on independent test sets, with PIE's options set to use simplified tags and the protein dictionary. In all cases, the true positive rate (TPR) rises rapidly at low false positive rate (FPR), implying that the system makes high-precision predictions for high-probability sentences.
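An ROC curve like those in Figure 3 is obtained by sweeping a decision threshold over the predicted sentence probabilities and recording the false and true positive rates at each step. The generic Python sketch below does this and computes the area under the curve (AUC) by the trapezoidal rule. It is not PIE's evaluation code, and the function names are illustrative.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) for scores and binary labels (1 = PPI sentence).

    The threshold is swept from the highest score downward; tied scores are
    processed together so that each distinct threshold yields one point.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Count every example whose score equals the current threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )
```

With scores `[0.9, 0.8, 0.7, 0.4, 0.3]` and labels `[1, 1, 0, 1, 0]`, `auc(roc_points(...))` is 5/6 (about 0.833), because five of the six positive-negative pairs are ranked correctly. A curve that climbs steeply near FPR = 0, as the caption describes, means that the highest-scoring sentences are almost all true positives.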
