BMC Bioinformatics. 2013 Jan 16;14:12. doi: 10.1186/1471-2105-14-12.

A detailed error analysis of 13 kernel methods for protein-protein interaction extraction


Domonkos Tikk et al. BMC Bioinformatics. 2013.

Abstract

Background: Kernel-based classification is the current state of the art for extracting pairs of interacting proteins (PPIs) from free text. Various proposals have been put forward; they diverge especially in the specific kernel function, the type of input representation, and the feature sets. These proposals are regularly compared with each other in terms of their overall performance on different gold-standard corpora, but little is known about their respective performance at the instance level.

Results: We report on a detailed analysis of the shared characteristics and the differences among 13 current methods using five PPI corpora. We identified a large number of rather difficult (misclassified by most methods) and easy (correctly classified by most methods) PPIs. We show that kernels using the same input representation perform similarly on these pairs and that building ensembles of dissimilar kernels leads to significant performance gains. However, our analysis also reveals that the characteristics shared between difficult pairs are few, which lowers the hope that new methods, if built along the same lines as current ones, will deliver breakthroughs in extraction performance.
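For illustration, the sketch below combines per-kernel predictions by simple majority voting. This is only one possible ensembling scheme, and the kernel names and predictions are hypothetical; the abstract does not specify the ensembling method the authors used.

# Minimal sketch of combining dissimilar kernel classifiers by majority vote.
# Assumption: each kernel yields a binary label per candidate pair
# (1 = interacting, 0 = not interacting); names and data are made up.

def majority_vote(predictions):
    """Combine per-kernel binary predictions by simple majority."""
    kernels = list(predictions)
    n_pairs = len(predictions[kernels[0]])
    ensemble = []
    for i in range(n_pairs):
        votes = sum(predictions[k][i] for k in kernels)
        ensemble.append(1 if votes > len(kernels) / 2 else 0)
    return ensemble

preds = {
    "kernel_A": [1, 0, 1],
    "kernel_B": [1, 1, 0],
    "kernel_C": [0, 1, 1],
}
print(majority_vote(preds))  # -> [1, 1, 1]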

Conclusions: Our experiments show that current methods do not capture the shared characteristics of positive PPI pairs very well, which must also be attributed to the heterogeneity of the (still very few) available corpora. Our analysis suggests that performance improvements should be sought in novel feature sets rather than in novel kernel functions.


Figures

Figure 1
The distribution of pairs according to classification success level in the cross-validation setting. The distribution of pairs (total, positive, and negative) in terms of the number of kernels that classify them correctly (success level), aggregated across the five corpora in the cross-validation setting. Detailed data for each corpus can be found in Table 1. All 13 kernels are taken into consideration.
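To make the "success level" statistic concrete, the sketch below counts, for each pair, how many kernels classify it correctly and tabulates the resulting distribution; the gold labels and predictions here are hypothetical.

# Sketch of computing classification success levels (number of kernels that
# classify a pair correctly), as plotted in Figure 1. Data are hypothetical.

from collections import Counter

def success_levels(gold, per_kernel_preds):
    """gold: true label per pair; per_kernel_preds: one prediction list per kernel."""
    levels = []
    for i, truth in enumerate(gold):
        correct = sum(1 for preds in per_kernel_preds if preds[i] == truth)
        levels.append(correct)
    return Counter(levels)  # maps success level -> number of pairs

gold = [1, 0, 1, 0]
preds = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
print(success_levels(gold, preds))  # e.g. Counter({2: 2, 1: 1, 3: 1})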
Figure 2
The distribution of pairs according to classification success level in the cross-learning setting. The distribution of pairs (total, positive, and negative) in terms of the number of kernels that classify them correctly (success level), aggregated across the five corpora in the cross-learning setting. Detailed data for each corpus can be found in Table 2. All kernels except the very slow PT kernel are taken into consideration.
Figure 3
Heatmap of success level correlation in CV and CL evaluations. Correlation ranges from 2 (cyan) through 63 (white) to 1266 (magenta) pairs. Hues are on a logarithmic scale.
Figure 4
Characteristics of pairs by difficulty class. Characteristics of pairs by difficulty class (average sentence length in words, average word distance between entities, and average shortest-path length in the dependency graph (DG) and syntax tree (ST)). ND – negative difficult, NN – negative neutral, NE – negative easy, PD – positive difficult, PN – positive neutral, PE – positive easy.
Figure 5
The number of positive and negative pairs vs. the length of the sentence containing the pair.
Figure 6
The positive ground truth rate vs. the length of the sentence containing the pair.
Figure 7
Class distribution of pairs depending on the number of proteins in the sentence.
Figure 8
Similarity of kernels as a dendrogram and heat map. Colors below the dendrogram indicate the parsing information used by a kernel. Similarity of kernel outputs ranges from full agreement (red) to 33% disagreement (yellow) on the five benchmark corpora. Clustering is performed with R’s hclust (http://stat.ethz.ch/R-manual/R-devel/library/stats/html/hclust.html).
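As a rough analogue of the clustering behind Figure 8, the sketch below clusters kernels by their pairwise disagreement rate. The paper uses R’s hclust; this sketch uses SciPy’s equivalent, and the prediction matrix is hypothetical.

# Sketch: hierarchical clustering of kernels by output disagreement, in the
# spirit of Figure 8. The paper uses R's hclust; this uses SciPy instead.
# The prediction matrix below is hypothetical.

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# rows = kernels, columns = candidate pairs, entries = binary predictions
preds = np.array([
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1],
])

n = preds.shape[0]
dist = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        # distance = fraction of candidate pairs on which two kernels disagree
        dist[i, j] = np.mean(preds[i] != preds[j])

tree = linkage(squareform(dist), method="average")
print(tree)  # linkage matrix; pass to scipy.cluster.hierarchy.dendrogram to plot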
Figure 9
Comparison of some non-kernel-based and kernel-based classifiers in terms of F-score (CV evaluation). The first nine are non-kernel-based classifiers; the last four are kernel-based classifiers.


