PLoS One. 2014 Nov 6;9(11):e112034.
doi: 10.1371/journal.pone.0112034. eCollection 2014.

Prediction of interactions between viral and host proteins using supervised machine learning methods


Ranjan Kumar Barman et al. PLoS One.

Abstract

Background: Viral-host protein-protein interactions (PPIs) play a vital role in pathogenesis, since they define viral infection of the host and regulation of host proteins. Identifying key viral-host PPIs has important implications for therapeutics.

Methods: In this study, a systematic attempt was made to predict viral-host PPIs by integrating different features, including domain-domain association, network topology and sequence information, using viral-host PPIs from VirusMINT. Three well-known supervised machine learning methods commonly used in PPI prediction (SVM, Naïve Bayes and Random Forest) were employed, and their performance was evaluated by five-fold cross-validation.
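
As an illustration of the five-fold cross-validation mentioned in the Methods, the following is a minimal sketch of how a dataset of protein pairs could be partitioned into five train/test splits. It is not the authors' code: the function name `five_fold_splits`, the fixed seed and the sample count of 23 are assumptions made for this example, and only the Python standard library is used.

```python
import random

def five_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical dataset of 23 protein pairs (illustrative size only).
splits = list(five_fold_splits(23))
for train, test in splits:
    print(len(train), len(test))
```

Each pair is held out exactly once, so a classifier trained on `train` and scored on `test` in every round gives five performance estimates that can then be averaged.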

Results: Out of 44 descriptors, the best features were found to be domain-domain association and the methionine, serine and valine amino acid composition of viral proteins. The SVM-based method achieved a higher sensitivity (67%) than Naïve Bayes (37.49%) and Random Forest (55.66%). However, the specificity of Naïve Bayes was the highest (99.52%), compared with SVM (74%) and Random Forest (89.08%). Overall, SVM and Random Forest achieved accuracies of 71% and 72.41%, respectively. The proposed SVM-based method was evaluated on a blind dataset and attained a sensitivity of 64%, a specificity of 83% and an accuracy of 74%. In addition, unknown potential targets in hepatitis B virus-human and hepatitis E virus-human PPIs were predicted with the proposed SVM model and validated by gene ontology enrichment analysis. The proposed model shows that the hepatitis B virus "C protein" binds to membrane docking protein, while the "X protein" and "P protein" interact with cell-killing and metabolic process proteins, respectively.
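
For readers less familiar with the three measures reported in the Results, the sketch below shows how sensitivity, specificity and accuracy follow from the counts of a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical: the counts were chosen only so that sensitivity and specificity match the SVM percentages quoted above, and the assumed class sizes (100 interacting and 100 non-interacting pairs) are not taken from the paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true interactions recovered
    specificity = tn / (tn + fp)   # fraction of non-interactions rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts, NOT the paper's data.
sens, spec, acc = binary_metrics(tp=67, fn=33, tn=74, fp=26)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

The trade-off in the Results is visible in these definitions: a classifier that rarely predicts an interaction, as Naïve Bayes appears to do here, will score high on specificity and low on sensitivity.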

Conclusion: The proposed method can predict large-scale interspecies viral-human PPIs. Using the optimised SVM model, the nature and function of unknown viral proteins (HBV and HEV) and their interacting host protein partners were identified.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Hierarchical clustering of highly predicted SVM score of HBV-human protein pairs.
Hierarchical clustering analysis was performed in TIBCO Spotfire using the complete linkage clustering method, cosine correlation distance measure, average value ordering weight, normalization by scaling between 0 and 1, and replacement of empty values by 0, for both the row and column dendrograms. High, average and low SVM predicted scores are marked in red, white and blue, respectively.
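
The clustering settings named in this caption can be illustrated with a small standard-library sketch: empty values are replaced by 0, each column is scaled to the range 0 to 1, and rows are merged by complete linkage on cosine distance. This is not a reproduction of Spotfire's implementation. The function names, the per-column scaling, the handling of all-zero rows and the score matrix `raw` are all assumptions made for the example.

```python
import math

def scale_01(values):
    """Min-max scale a list to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def cosine_distance(a, b):
    """1 - cosine similarity; all-zero vectors get distance 1 (a convention chosen here)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def complete_linkage(rows):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(i,) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(cosine_distance(rows[p], rows[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical score matrix (rows and columns are illustrative); None marks an empty value.
raw = [[0.9, 0.8, None], [0.85, 0.75, 0.1], [0.1, None, 0.95]]
filled = [[0.0 if v is None else v for v in row] for row in raw]
scaled_cols = [scale_01(list(col)) for col in zip(*filled)]
scaled = [list(row) for row in zip(*scaled_cols)]
merges = complete_linkage(scaled)
print(merges)
```

In this toy matrix the first two rows have similar score profiles and are merged first; the third row joins last at the maximum distance. Applying the same procedure to the transposed matrix would give the column dendrogram.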
Figure 2. A network of HBX-human protein interactions predicted by our proposed method.
The network was visualized with Cytoscape 3.0.2. The HBX protein is represented by a cyan node. Human proteins with significant gene ontology enrichment are represented by salmon nodes, whereas other human proteins are represented by slate grey nodes.
