PLoS One. 2014 May 8;9(5):e96984.
doi: 10.1371/journal.pone.0096984. eCollection 2014.

Understanding the underlying mechanism of HA-subtyping in the level of physic-chemical characteristics of protein

Mansour Ebrahimi et al. PLoS One. 2014.

Erratum in

  • PLoS One. 2014;9(6):e99921

Abstract

The evolution of the influenza A virus to increase its host range is a major concern worldwide. The molecular mechanisms by which host range increases are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physico-chemical attributes that govern HA subtyping, we performed a large-scale functional analysis of over 7000 sequences of 16 different HA subtypes. A large number (896) of physico-chemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine learning models that can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on the calculated protein characteristics dataset as well as on 10 trimmed datasets generated by the attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features associated with HA subtyping. The Random Forest tree induction algorithm and the RBF kernel function of SVM (scaled by grid search) showed a high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption.
Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represents a new avenue for understanding and predicting the possible future structure of influenza pandemics.
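
To make the kind of attribute named in the abstract concrete (count and frequency of residues such as Gln, Tyr, Cys and Glu), the following is a minimal sketch of how per-residue counts and frequencies could be computed from a protein sequence. It is not the authors' pipeline: the function name `composition`, the toy sequence, and the definition of frequency as count divided by sequence length are assumptions made for illustration.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return (counts, frequencies) for the 20 standard residues.

    Frequency is assumed here to be count / number of standard residues;
    the paper may define its attributes differently.
    """
    seq = sequence.upper()
    raw = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(raw.values())
    counts = {aa: raw.get(aa, 0) for aa in AMINO_ACIDS}
    freqs = {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
    return counts, freqs


# Toy sequence for illustration only; it is not a real HA sequence.
toy_sequence = "MKQYCQEWQY"
counts, freqs = composition(toy_sequence)
print("Gln count:", counts["Q"], "Tyr count:", counts["Y"])
print("Gln frequency:", round(freqs["Q"], 2))
```

A feature table built this way (one row per sequence, one column per attribute) is the sort of input that attribute weighting and the classifiers named in the abstract would consume.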

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Decision Tree from the Decision Stump model run with the Gini Index criterion.
As may be inferred from the figure, the count of Tyr was the most important and the sole protein attribute in distinguishing various HA subtypes of influenza A virus. When the value for this feature was equal to 26, 27, 28, 29; if the value was equal to 18, 19, 20, 21 or 22, the virus belonged to the H3 class. When the count of Tyr was equal to 17, the subtype of the virus was H4, but when the value was equal to 23 or 24, the virus was associated with H5. The H6 virus subtype was identified when the count of Tyr was equal to 26. Finally, when the value was equal to 13, 14 or 15, the virus fell into the H7 class. Underneath, the host species for each virus class is depicted.
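
The single-attribute rule described in this caption can be written as a lookup on the Tyr count. The sketch below encodes only the value-to-subtype pairs that the caption states explicitly; the caption gives no class for the 26 to 29 range as a whole, so only the explicit value 26 (H6) is included. The names `TYR_COUNT_TO_SUBTYPE` and `stump_predict` are hypothetical, and returning `None` for unlisted counts is an assumption.

```python
# Tyr count -> HA subtype, taken only from the values stated in the caption.
TYR_COUNT_TO_SUBTYPE = {
    13: "H7", 14: "H7", 15: "H7",
    17: "H4",
    18: "H3", 19: "H3", 20: "H3", 21: "H3", 22: "H3",
    23: "H5", 24: "H5",
    26: "H6",
}


def stump_predict(sequence):
    """Predict a subtype from the number of Tyr (Y) residues alone.

    Returns None when the caption lists no subtype for that count.
    """
    tyr_count = sequence.upper().count("Y")
    return TYR_COUNT_TO_SUBTYPE.get(tyr_count)


# Toy inputs for illustration only; these are not real HA sequences.
print(stump_predict("Y" * 17 + "A" * 10))  # 17 Tyr residues
print(stump_predict("Y" * 16))             # count not listed in the caption
```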
Figure 2. Decision Tree from the Random Tree model run with the Gini Index criterion.
As may be inferred from the figure, the frequency of Pro-Gly was the most important protein attribute for building the tree, and the counts or frequencies of other dipeptides were used to generate the tree branches and to distinguish various HA subtypes of influenza A virus. With the defined values for the count of Phe-Met, the count of Asn-Met and the frequency of Trp-Leu, the virus subtypes were either H3 or H5. With different values for the count of Asn-Met, various virus subtypes were distinguished. All virus subclasses (except H6, H8, H10, H11 and H14) were classified by this model. Underneath each subtype, its common host is depicted.
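
The dipeptide attributes this tree splits on (for example the frequency of Pro-Gly or the count of Phe-Met) could be derived roughly as follows. This is a sketch under stated assumptions, not the authors' code: dipeptides are taken as overlapping adjacent residue pairs, frequency is assumed to be the pair count divided by the total number of pairs, and `dipeptide_features` is a hypothetical name.

```python
from collections import Counter


def dipeptide_features(sequence):
    """Return (counts, frequencies) of overlapping adjacent residue pairs.

    Keys are two-letter strings in one-letter code, e.g. "PG" for Pro-Gly.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts.values())
    freqs = {pair: n / total for pair, n in counts.items()}
    return counts, freqs


# Toy sequence for illustration only; it is not a real HA sequence.
counts, freqs = dipeptide_features("PGAPGFM")
print("Pro-Gly count:", counts["PG"])
print("Phe-Met count:", counts["FM"])
print("Asn-Met count:", counts["NM"])  # absent pairs count as 0
print("Pro-Gly frequency:", round(freqs["PG"], 3))
```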
