2008 Jul 1;36(Web Server issue):W411-5.
doi: 10.1093/nar/gkn281. Epub 2008 May 28.

PIE: an online prediction system for protein-protein interactions from text

Sun Kim et al. Nucleic Acids Res.

Abstract

Protein-protein interaction (PPI) extraction has been an important research topic in the bio-text mining area, since PPI information is critical for understanding biological processes. However, very few open systems are available on the Web, and most systems focus on keyword searching based on predefined PPIs. PIE (Protein Interaction information Extraction system) is a configurable Web service for extracting PPIs from the literature, including user-provided papers as well as PubMed articles. Once users provide abstracts or papers, the prediction results are displayed in an easily readable form with essential yet compact features. The PIE interface also supports features such as PDF file extraction, a PubMed search tool and network communication, which are useful for biologists and bio-system developers. The PIE system uses natural language processing techniques and machine learning methods to predict PPI sentences, which yields high-precision results for Web users. PIE is freely available at http://bi.snu.ac.kr/pie/.
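The abstract states only that PIE combines natural language processing with machine learning to score sentences. As a rough illustration of that idea, and not PIE's actual model, the Python sketch below tags protein names and interaction words using small hypothetical dictionaries (PROTEIN_DICT, INTERACTION_WORDS), replaces them with simplified tags, and converts two tag-based features into a probability with a logistic function whose weights (WEIGHTS) are invented for illustration.

```python
import math
import re

# Hypothetical dictionaries; PIE's real Protein DB and Interaction DB are not reproduced here.
PROTEIN_DICT = {"p53", "MDM2", "BRCA1", "RAD51"}
INTERACTION_WORDS = {"binds", "interacts", "phosphorylates", "inhibits"}

# Hypothetical feature weights; a trained system would learn these from annotated sentences.
WEIGHTS = {"two_or_more_proteins": 2.0, "has_interaction_word": 1.5, "bias": -2.0}


def tag_sentence(sentence):
    """Tokenize, then replace dictionary hits with simplified tags (PROT, INTER)."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    tagged = []
    for tok in tokens:
        if tok in PROTEIN_DICT:
            tagged.append("PROT")
        elif tok.lower() in INTERACTION_WORDS:
            tagged.append("INTER")
        else:
            tagged.append(tok.lower())
    return tagged


def ppi_probability(sentence):
    """Logistic score over two tag-based indicator features (illustrative only)."""
    tags = tag_sentence(sentence)
    z = WEIGHTS["bias"]
    if tags.count("PROT") >= 2:
        z += WEIGHTS["two_or_more_proteins"]
    if "INTER" in tags:
        z += WEIGHTS["has_interaction_word"]
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    for s in ["MDM2 binds p53 in vivo.", "The samples were stored at -80 C."]:
        print(f"{ppi_probability(s):.3f}  {s}")
```

A real sentence classifier would use a much richer feature set than these two indicators; the point here is only how dictionary tagging feeds a probabilistic score.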

Figures

Figure 1.
Overview of PIE. The PIE system consists of several modules. ‘Article Filter’ and ‘Sentence Filter’ decide whether given articles or sentences contain PPI information. ‘Search Engine’ retrieves stored information such as learning data (Article DB) and protein names (Protein DB). ‘Interaction DB’ is the database of interaction-related words. ‘XML–RPC Module’ is responsible for RPC communication with other PPI services. ‘Web Interface Module’ manages the whole PPI prediction process for Web users. Prediction results contain links to the iHOP service to provide further protein information. For PubMed search, PIE retrieves PubMed articles using the NCBI E-Utilities.
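The caption says only that an XML–RPC module handles communication with other PPI services; the remote methods PIE exposes are not listed here. The sketch below uses Python's standard xmlrpc.client to show how a client might encode and send a request to such a service. The method name pie.predict and its single-string argument are hypothetical placeholders, not PIE's documented interface.

```python
import xmlrpc.client


def build_request(abstract_text):
    """Encode an XML-RPC method call as XML.

    "pie.predict" is a hypothetical method name used only for illustration.
    """
    return xmlrpc.client.dumps((abstract_text,), methodname="pie.predict")


def call_service(url, abstract_text):
    """Send the call over HTTP (performs a real network request).

    Both the endpoint URL and the remote method are assumptions.
    """
    proxy = xmlrpc.client.ServerProxy(url)
    return proxy.pie.predict(abstract_text)


if __name__ == "__main__":
    # Print the XML payload that would be POSTed to the service.
    print(build_request("MDM2 binds p53."))
```

Encoding the request separately from sending it makes the wire format easy to inspect offline; ServerProxy resolves the dotted name pie.predict into a single remote method call.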
Figure 2.
An example of PIE prediction results. PIE provides a user-friendly and intuitive interface. (A) Input. Web users can upload papers as a file or copy and paste text. A PubMed tool is provided for searching PubMed articles, and multiple PubMed articles can be submitted for PPI prediction in two ways: manual selection and automatic selection. (B) PubMed search. General-purpose article searches are available through the PubMed service, and the results can be narrowed by options such as the number of results, publication years and journals. The ‘I'm feeling lucky’ button performs automatic article selection, similar to what conventional PPI extraction tools do. (C) Output. Prediction results are listed in the center box, with PPI sentences highlighted according to their probabilities: ‘Red’ for high probability and ‘Green’ for moderate probability. Protein names and interaction-related words, identified using the protein DB and the interaction DB, are shown in bold and italic fonts, respectively. In particular, protein names are linked to the iHOP service for further information. Users can help improve PIE's performance by leaving feedback with the ‘No Feedback,’ ‘Agree,’ ‘Partly Disagree’ or ‘Disagree’ button.
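Searches of this kind map naturally onto the NCBI E-Utilities esearch endpoint mentioned in Figure 1. The sketch below assumes PIE-like options for the number of results, a publication-year range and a journal list, and builds an esearch URL with the real db, term, retmax, datetype, mindate and maxdate parameters, restricting journals with PubMed's [ta] (journal title abbreviation) field tag. The function and argument names are hypothetical; how PIE itself composes its queries is not described on this page.

```python
from typing import Optional, Sequence, Tuple
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query: str,
                      max_results: int = 20,
                      years: Optional[Tuple[int, int]] = None,
                      journals: Optional[Sequence[str]] = None) -> str:
    """Build a PubMed esearch URL (hypothetical helper, not PIE's code)."""
    term = query
    if journals:
        # Restrict to the given journals via the [ta] field tag.
        journal_clause = " OR ".join(f'"{j}"[ta]' for j in journals)
        term = f"({term}) AND ({journal_clause})"
    params = {"db": "pubmed", "term": term, "retmax": max_results}
    if years:
        start, end = years
        # mindate and maxdate must be supplied together; pdat = publication date.
        params.update({"datetype": "pdat", "mindate": str(start), "maxdate": str(end)})
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url("p53 MDM2 interaction", 10, (2005, 2007), ["Nucleic Acids Res"]))
```

Fetching the URL (for example with urllib.request.urlopen) returns XML listing matching PMIDs. NCBI asks clients to identify themselves with the tool and email parameters and to stay within its request-rate limits.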
Figure 3.
ROC curves for test data. PIE's performance was measured on independent test sets, with PIE configured to use simplified tags and the protein dictionary. In all cases, the TPR rises rapidly at low FPR, indicating that the system makes high-precision predictions for high-probability sentences.
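Each point on an ROC curve pairs TPR = TP/(TP+FN) with FPR = FP/(FP+TN) at one score threshold, so a curve that climbs steeply near FPR = 0 means the highest-scoring sentences are mostly true PPI sentences. The Python sketch below computes such points from sentence scores and binary labels and integrates them with the trapezoidal rule. It is a generic illustration, not the evaluation code behind Figure 3, and the example data are made up.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping a threshold down through the scores.

    labels are 1 (PPI sentence) or 0; both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Add all sentences tied at this score together, giving one point per threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(pts, auc(pts))
```

Grouping tied scores matters: splitting ties arbitrarily would produce points that depend on input order rather than on the classifier.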
