Influenza Other Respir Viruses. 2024 Dec;18(12):e70044.
doi: 10.1111/irv.70044.

Examining the Influenza A Virus Sialic Acid Binding Preference Predictions of a Sequence-Based Convolutional Neural Network


Laura K Borkenhagen et al. Influenza Other Respir Viruses. 2024 Dec.

Abstract

Background: Though receptor binding specificity is well established as a contributor to the host tropism and spillover potential of influenza A viruses, determining the receptor binding preference of a specific virus still requires expensive and time-consuming laboratory analyses. In this study, we pilot a machine learning approach to predicting binding preference.

Methods: We trained a convolutional neural network to predict the α2,6-linked sialic acid preference of influenza A viruses from the hemagglutinin amino acid sequence. The model was evaluated on an independent test dataset to assess standard performance metrics, the impact of missing data in the test sequences, and prediction performance on novel subtypes. Further, features found to be important for generating predictions were tested via targeted mutagenesis of H9 and H16 proteins expressed on pseudoviruses.
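The Methods paragraph describes a convolutional neural network that maps a hemagglutinin amino acid sequence to an α2,6-linked sialic acid preference. The pure-Python sketch below illustrates only the general mechanics of such a model: one-hot encoding, a single 1D convolution filter, global max pooling, and a sigmoid output. It is not the authors' architecture. The alphabet, the zero-vector encoding for unknown residues, the function names, and all weights are hypothetical choices made for illustration.

```python
# Hypothetical sketch: NOT the study's architecture or trained weights.
# One-hot encode an HA amino acid sequence, apply one 1D convolution
# filter with ReLU, global max pool, then a dense unit with a sigmoid.
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids (assumed alphabet)


def one_hot(seq):
    """Return one 20-element vector per residue.
    Unknown or missing residues (e.g. 'X') become all-zero vectors (an assumption)."""
    encoded = []
    for aa in seq.upper():
        vec = [0.0] * len(ALPHABET)
        idx = ALPHABET.find(aa)
        if idx >= 0:
            vec[idx] = 1.0
        encoded.append(vec)
    return encoded


def conv1d(x, kernel, bias=0.0):
    """Valid 1D cross-correlation of an L x C input with a K x C kernel,
    followed by ReLU. Returns L - K + 1 activations."""
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        s = bias
        for offset in range(k):
            for c in range(len(kernel[offset])):
                s += x[start + offset][c] * kernel[offset][c]
        out.append(max(0.0, s))  # ReLU
    return out


def predict_alpha26(seq, kernel, bias=0.0, out_weight=1.0, out_bias=0.0):
    """Toy forward pass: conv -> global max pool -> dense -> sigmoid probability."""
    activations = conv1d(one_hot(seq), kernel, bias)
    pooled = max(activations) if activations else 0.0
    logit = out_weight * pooled + out_bias
    return 1.0 / (1.0 + math.exp(-logit))
```

In a trained network the kernel weights would be learned and many filters stacked; here a hand-set width-2 kernel that scores an "A" followed by a "C" makes the arithmetic easy to follow. Because unknown residues encode as zero vectors, a masked stretch contributes nothing to any window that covers it, which is one simple way to represent missing data.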

Results: The final model developed in this study achieved 94% accuracy on the test dataset and an area under the receiver operating characteristic curve (AUC) of 0.93. The model tolerated about 10% missing data in the test sequences without compromising prediction performance. Predictions on novel subtypes showed that the model can extrapolate feature relationships between subtypes when generating binding predictions. Finally, evaluating the features important for model predictions helped identify positions that, in practice, alter the sialic acid conformation preference of hemagglutinin proteins.
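The Results report accuracy and the area under the receiver operating characteristic curve (AUC). As a reminder of what those two numbers measure, the sketch below computes both from true labels and predicted probabilities. The label coding (1 = α2,6 preference, 0 = otherwise), the 0.5 decision threshold, and the function names are assumptions. The AUC uses the rank (Mann-Whitney) formulation, which gives the same value as the trapezoidal area under the ROC curve.

```python
# Hypothetical sketch of the two test metrics; not the study's evaluation code.
# Labels: 1 = alpha2,6 preference, 0 = otherwise (assumed coding).


def accuracy(labels, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    correct = sum(1 for y, yhat in zip(labels, preds) if y == yhat)
    return correct / len(labels)


def roc_auc(labels, probs):
    """Probability that a random positive outscores a random negative;
    ties count as one half (Mann-Whitney U / (n_pos * n_neg))."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With toy labels [1, 1, 0, 0] and toy probabilities [0.9, 0.4, 0.6, 0.1], accuracy is 0.5 and AUC is 0.75. The pairwise loop is easy to read and fine for small test sets; a sort-based rank computation scales better.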

Conclusions: Ultimately, our results support this in silico approach to predicting hemagglutinin receptor binding preference. This work emphasizes the need for ongoing research to produce tools that may aid future pandemic risk assessment.

Keywords: hemagglutinin; influenza; machine learning; receptor binding.


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

FIGURE 1
Simplified convolutional neural network architecture.
FIGURE 2
Probabilities predicted from an independent test of a convolutional neural network trained to classify hemagglutinin sequences by α2,6‐linked sialic acid binding preference.
FIGURE 3
Saliency map of SHAP values rendered on a hemagglutinin protein. The values rendered are the absolute means of the SHAP values of a convolutional neural network trained to differentiate hemagglutinin sequences by binding preference to α2,6‐linked sialic acid receptors. The absolute means were computed for each amino acid position from SHAP values generated on an independent test dataset. The dotted-line box marks the region shown in the zoomed-in view of the binding pocket (right). Important structures and residues are annotated in black. The depictions used PDB 6TZB and were generated in ChimeraX [15].
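The per-position values in this figure summarize many per-sample SHAP values. A minimal sketch of that aggregation step follows, assuming SHAP values have already been computed and summed over the one-hot channels into a samples × positions matrix, and reading "absolute mean" as the mean of absolute values (the usual SHAP global-importance summary; an assumption here). The function names are hypothetical.

```python
# Hypothetical sketch: aggregate a samples x positions SHAP matrix into one
# importance score per amino acid position. Assumes SHAP values were already
# computed and summed over one-hot channels; "absolute mean" is read as
# mean(|SHAP|) per position (an assumption).


def mean_abs_shap(shap_matrix):
    """Mean of absolute SHAP values for each position (column)."""
    n_samples = len(shap_matrix)
    n_positions = len(shap_matrix[0])
    return [
        sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j in range(n_positions)
    ]


def top_positions(scores, k):
    """Return the k positions (1-based) with the largest scores."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return [j + 1 for j in order[:k]]
```

Taking absolute values before averaging keeps positions that push some sequences toward α2,6 preference and others away from it from cancelling out, which suits a map of overall importance.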
FIGURE 4
Convolutional neural network test predictions on sequences with missing data. A stretch of missing data of a given length was coded at random into each test sequence prior to model prediction. One hundred bootstrap replicates were performed for each length of missing amino acids. The mean and standard deviation of the area under the receiver operating characteristic curve (AUC) and of accuracy were recorded.
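The random-masking experiment can be sketched as below. Several details are assumptions for illustration: missing residues are written as "X", each replicate masks one contiguous stretch at a uniformly random start in every test sequence, and `predict_fn` and `metric_fn` are hypothetical callables (a model's probability function and, for example, an AUC or accuracy function taking labels and probabilities). Each replicate here re-draws the masks on the full test set; whether the study also resampled sequences with replacement is not stated in the caption.

```python
# Hypothetical sketch of the Figure 4 experiment; not the study's code.
# Assumptions: 'X' marks missing residues, one contiguous stretch is masked
# per sequence, and predict_fn / metric_fn are user-supplied callables.
import random
import statistics

MISSING = "X"


def mask_stretch(seq, start, length):
    """Replace seq[start:start+length] with MISSING, clipped at the sequence end."""
    end = min(len(seq), start + length)
    return seq[:start] + MISSING * (end - start) + seq[end:]


def missing_data_bootstrap(seqs, labels, predict_fn, metric_fn,
                           length, n_boot=100, seed=0):
    """Return (mean, standard deviation) of metric_fn over n_boot replicates
    (n_boot must be at least 2 for the standard deviation)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    scores = []
    for _ in range(n_boot):
        masked = []
        for s in seqs:
            # Random start so the stretch fits inside the sequence when possible.
            start = rng.randrange(0, max(1, len(s) - length + 1))
            masked.append(mask_stretch(s, start, length))
        probs = [predict_fn(s) for s in masked]
        scores.append(metric_fn(labels, probs))
    return statistics.mean(scores), statistics.stdev(scores)
```

Running this for a range of `length` values and plotting the mean with a standard deviation band gives a curve of the kind the figure describes.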
FIGURE 5
Convolutional neural network predictions on sequences with missing data by position. A stretch of missing data of fixed length was coded at each possible start position of the test sequences prior to model prediction. The area under the receiver operating characteristic curve (AUC) and accuracy were recorded. A simple illustration of hemagglutinin below the plot gives context for the missing data start position. Chains A and B are separated by the protease cleavage site. The receptor binding site (RBS) includes the 130‐Loop, 150‐Loop, 190‐Helix, and 220‐Loop.
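The positional sweep differs from the random-masking experiment only in where the stretch is placed: every start position is tried in turn. A minimal sketch, assuming aligned test sequences of equal length, "X" as the missing-residue symbol, and hypothetical `predict_fn` and `metric_fn` callables:

```python
# Hypothetical sketch of the Figure 5 sweep; not the study's code.
# Assumes aligned, equal-length sequences and 'X' as the missing symbol.


def positional_sweep(seqs, labels, predict_fn, metric_fn, length, missing="X"):
    """Mask a fixed-length stretch at every start position and return a list
    of (1-based start position, metric value) pairs."""
    seq_len = len(seqs[0])
    if any(len(s) != seq_len for s in seqs):
        raise ValueError("sketch assumes aligned, equal-length sequences")
    results = []
    for start in range(seq_len - length + 1):
        masked = [s[:start] + missing * length + s[start + length:] for s in seqs]
        probs = [predict_fn(s) for s in masked]
        results.append((start + 1, metric_fn(labels, probs)))
    return results
```

Start positions where the metric drops mark regions the model depends on; in the figure these can be read against the illustrated chain A/B boundary and the receptor binding site elements.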
FIGURE 6
Probabilities predicted from an independent test of a convolutional neural network trained without samples of subtype H16 to classify hemagglutinin sequences by α2,6‐linked sialic acid binding preference. Arrows indicate sample predictions that changed class after H16 was omitted from the training data. The test metrics exclude predictions on H17, H18, and H19.
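The held-out-subtype comparison involves three bookkeeping steps: removing one subtype from the training records, comparing the predicted classes of the full and reduced models, and excluding some subtypes from the reported metrics. The sketch below covers only that bookkeeping; retraining is left out. The record layout (dicts with a "subtype" key), the 0.5 threshold, and the function names are assumptions.

```python
# Hypothetical sketch of the Figure 6 bookkeeping; not the study's code.
# Records are assumed to be dicts with a "subtype" key; threshold 0.5 assumed.


def split_out_subtype(records, subtype):
    """Split records into (kept for training, held out) by subtype."""
    kept = [r for r in records if r["subtype"] != subtype]
    held = [r for r in records if r["subtype"] == subtype]
    return kept, held


def class_flips(ids, probs_full, probs_reduced, threshold=0.5):
    """IDs whose predicted class differs between the full and reduced models."""
    flips = []
    for sample_id, p_full, p_red in zip(ids, probs_full, probs_reduced):
        if (p_full >= threshold) != (p_red >= threshold):
            flips.append(sample_id)
    return flips


def exclude_subtypes(records, subtypes):
    """Drop records whose subtype is in `subtypes` (e.g. before computing metrics)."""
    return [r for r in records if r["subtype"] not in subtypes]
```

For example, `split_out_subtype(train_records, "H16")` would yield the reduced training set, and `exclude_subtypes(test_records, {"H17", "H18", "H19"})` would restrict the metric computation as the caption describes (both variable names are hypothetical).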


References

1. de Graaf M. and Fouchier R. A. M., “Role of Receptor Binding Specificity in Influenza A Virus Transmission and Pathogenesis,” EMBO Journal 33, no. 8 (2014): 823–841, 10.1002/embj.201387442.
2. Suttie A., Deng Y.‐M., Greenhill A. R., Dussart P., Horwood P. F., and Karlsson E. A., “Inventory of Molecular Markers Affecting Biological Characteristics of Avian Influenza A Viruses,” Virus Genes 55, no. 6 (2019): 739–768, 10.1007/s11262-019-01700-z.
3. Raman R., Venkataraman M., Ramakrishnan S., Lang W., Raguram S., and Sasisekharan R., “Advancing Glycomics: Implementation Strategies at the Consortium for Functional Glycomics,” Glycobiology 16, no. 5 (2006): 82R–90R.
4. Mögling R., Richard M. J., van der Vliet S., et al., “Neuraminidase‐Mediated Haemagglutination of Recent Human Influenza A (H3N2) Viruses Is Determined by Arginine 150 Flanking the Neuraminidase Catalytic Site,” Journal of General Virology 98, no. 6 (2017): 1274–1281.
5. Zhang Y., Aevermann B. D., Anderson T. K., et al., “Influenza Research Database: An Integrated Bioinformatics Resource for Influenza Virus Research,” Nucleic Acids Research 45, no. D1 (2017): D466–D474, 10.1093/nar/gkw857.
