BMC Bioinformatics. 2016 Jul 25;17 Suppl 7(Suppl 7):246.

doi: 10.1186/s12859-016-1100-z.

Protein-protein interaction extraction with feature selection by evaluating contribution levels of groups consisting of related features

Thi Thanh Thuy Phan¹, Takenao Ohkawa²

Affiliations

¹ Department of Information Science, Graduate School of System Informatics, Kobe University, 1-1, Rokkodai, Nada, Kobe, 657-8501, Japan. thuy@iip.ist.i.kyoto-u.ac.jp.
² Department of Information Science, Graduate School of System Informatics, Kobe University, 1-1, Rokkodai, Nada, Kobe, 657-8501, Japan.

PMID: 27454611
PMCID: PMC4965725
DOI: 10.1186/s12859-016-1100-z

Thi Thanh Thuy Phan et al. BMC Bioinformatics. 2016.

Abstract

Background: Protein-protein interaction (PPI) extraction from published scientific articles is a key issue in biological research because it is important for understanding biological processes. Despite considerable recent advances in automatic PPI extraction from articles, there remains demand for better performance from existing methods.

Results: Our feature-based method extracts PPIs automatically from articles by combining the strengths of diverse features: lexical and word-context features derived from sentences, syntactic features derived from parse trees, and features based on existing patterns. We assemble related features into four groups and define a contribution level (CL) for each group. Our method consists of two steps. First, we divide the training set into subsets based on sentence structure and the presence of significant keywords (SKs), and apply predefined sentence patterns to each subset. Second, we automatically perform feature selection based on the CL values of the four groups and the k-nearest neighbor algorithm (k-NN) through three approaches: (1) focusing on the group with the best contribution level (BEST1G); (2) an unoptimized combination of the three groups with the best contribution levels (U3G); (3) an optimized combination of the two groups with the best contribution levels (O2G).
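The group-based selection step can be illustrated with a minimal sketch. The abstract does not define how a CL is computed, so the code below assumes, purely for illustration, that a group's CL is the cross-validated F-score of a k-NN classifier restricted to that group's features. The function names, the toy data, and the column-to-group mapping are all hypothetical. Only BEST1G and U3G are sketched; O2G is omitted because its optimization procedure is not described here.

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def f_score(y_true, y_pred):
    """Balanced F-score of the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def contribution_level(X, y, cols, s_folds=3, k=3):
    """ASSUMED definition: cross-validated k-NN F-score using only the group's columns."""
    Xg = [[row[c] for c in cols] for row in X]
    y_pred = [None] * len(y)
    for fold in range(s_folds):
        train = [i for i in range(len(y)) if i % s_folds != fold]
        for i in range(fold, len(y), s_folds):  # held-out rows of this fold
            y_pred[i] = knn_predict([Xg[j] for j in train],
                                    [y[j] for j in train], Xg[i], k)
    return f_score(y, y_pred)

def rank_groups(X, y, groups):
    """Return group names sorted by descending CL, plus the CL of each group."""
    cls = {name: contribution_level(X, y, cols) for name, cols in groups.items()}
    ranked = sorted(cls, key=cls.get, reverse=True)
    return ranked, cls

def best1g(ranked, groups):
    """BEST1G: keep only the features of the single best group."""
    return sorted(groups[ranked[0]])

def u3g(ranked, groups):
    """U3G: plain (unoptimized) union of the features of the three best groups."""
    return sorted(c for name in ranked[:3] for c in groups[name])

# Hypothetical toy data: column 0 copies the label, the other columns are noise.
y = [1, 0] * 6
noisy = [0.9, 0.1, 0.8, 0.2, 0.2, 0.9, 0.7, 0.3, 0.9, 0.1, 0.8, 0.6]
X = [[float(label), noisy[i], 0.5, float(i % 3)] for i, label in enumerate(y)]
groups = {"G1": [0], "G2": [1], "G3": [2], "G4": [3]}

ranked, cls = rank_groups(X, y, groups)
print("ranking:", ranked)
print("BEST1G columns:", best1g(ranked, groups))
print("U3G columns:", u3g(ranked, groups))
```

In this toy setup the group whose feature copies the label ranks first, so BEST1G keeps that column alone while U3G adds the columns of the next two groups.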

Conclusions: Our method outperforms other state-of-the-art PPI extraction systems in terms of F-score on the HPRD50 corpus and achieves promising results comparable with those systems on other corpora. Furthermore, on every corpus our method obtains a higher F-score than k-NN alone without exploiting the CLs of the groups of related features.
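For reference, the F-score is conventionally the harmonic mean of precision $P$ and recall $R$. The abstract does not state a weighting, so the balanced form is assumed here, with $TP$, $FP$, and $FN$ denoting true-positive, false-positive, and false-negative PPI predictions:

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F = \frac{2PR}{P + R}
$$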

Keywords: Biomedical text mining; Information extraction; Protein protein interaction; k-nearest neighbors.

Figures

**Fig. 1**
Example of a constituent parse tree. Constituent parse tree for the sentence *“Oxytocin stimulates IP3 production in dose-dependent fashion as well,”* taken from sentence IEPA.d0.s0 of the IEPA corpus (the first protein, P1, is Oxytocin and the second protein, P2, is IP3)

**Fig. 2**
Framework of our PPI extraction system. Our system consists of two phases. First, the training set is divided into subsets based on the presence of *significant keywords* and the feature *position of keyword*. Second, after cross-validation is performed on the training data to assess the contribution levels of the four groups of related features, feature selection is performed automatically through our three approaches (BEST1G, U3G, O2G). Finally, the k-NN classifier is used to classify the candidate PPI pairs of the test data

**Fig. 3**
Outline of PPI prediction based on division of the training set. The training set was divided into three subsets, A, B, and C, based on the existence of a *significant keyword* and the feature *position of keyword*. One classifier was generated from each subset, giving three classifiers. Similarly, each unlabeled instance was assigned to one of three subsets, A’, B’, and C’, and the corresponding classifier was used to identify whether a PPI exists in that instance

**Fig. 4**
S-fold cross-validation (SFCV) performed on the original training data. The original training data $Train_{all}$ was divided into $S$ equal-sized partitions $P_i$ ($i = 0, \dots, S-1$) to perform SFCV on it, to estimate the contribution levels of the four groups $G_1$, $G_2$, $G_3$, and $G_4$, and to perform feature selection
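
The partitioning described in the Fig. 4 caption can be sketched as follows. This is only an illustration of generic S-fold splitting, not the authors' implementation. The function names are hypothetical, and contiguous, unshuffled partitions are an assumption. When the number of instances is not divisible by S, the first partitions receive one extra instance.

```python
def s_fold_partitions(n_instances, s):
    """Split indices 0..n-1 into S contiguous partitions P_0..P_{S-1} of near-equal size."""
    base, extra = divmod(n_instances, s)
    partitions, start = [], 0
    for i in range(s):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first partitions
        partitions.append(list(range(start, start + size)))
        start += size
    return partitions

def sfcv_splits(n_instances, s):
    """Yield (train_indices, held_out_indices), holding out each partition P_i once."""
    partitions = s_fold_partitions(n_instances, s)
    for i, held_out in enumerate(partitions):
        train = [idx for j, part in enumerate(partitions) if j != i for idx in part]
        yield train, held_out

for train, held_out in sfcv_splits(10, 5):
    print("held out:", held_out, "train size:", len(train))
```

Each instance is held out exactly once, so a score computed over the held-out predictions uses every training instance without ever testing on data the classifier was fitted to.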

Cited by

HMNPPID-human malignant neoplasm protein-protein interaction database.
Li Q, Yang Z, Zhao Z, Luo L, Li Z, Wang L, Zhang Y, Lin H, Wang J, Zhang Y. Hum Genomics. 2019 Oct 22;13(Suppl 1):44. doi: 10.1186/s40246-019-0223-5. PMID: 31639057.
Using a Large Margin Context-Aware Convolutional Neural Network to Automatically Extract Disease-Disease Association from Literature: Comparative Analytic Study.
Lai PT, Lu WL, Kuo TR, Chung CR, Han JC, Tsai RT, Horng JT. JMIR Med Inform. 2019 Nov 26;7(4):e14502. doi: 10.2196/14502. PMID: 31769759.
Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature.
Murugesan G, Abdulkadhar S, Natarajan J. PLoS One. 2017 Nov 3;12(11):e0187379. doi: 10.1371/journal.pone.0187379. eCollection 2017. PMID: 29099838.
Leveraging prior knowledge for protein-protein interaction extraction with memory network.
Zhou H, Liu Z, Ning S, Yang Y, Lang C, Lin Y, Ma K. Database (Oxford). 2018 Jan 1;2018:bay071. doi: 10.1093/database/bay071. PMID: 30010731.
Identification of conclusive association entities in biomedical articles.
Liu RL. J Biomed Semantics. 2019 Jan 7;10(1):1. doi: 10.1186/s13326-018-0194-9. PMID: 30616688.
