Viruses. 2022 Feb 25;14(3):469.
doi: 10.3390/v14030469.

Sequence Matching between Hemagglutinin and Neuraminidase through Sequence Analysis Using Machine Learning

He Wang et al. Viruses. 2022.

Abstract

Many experiments have shown that the functional balance between hemagglutinin (HA) and neuraminidase (NA) plays a crucial role in viral mobility, production, and transmission. However, whether and how HA and NA maintain balance at the sequence level needs further investigation. Here, we applied principal component analysis and hierarchical clustering analysis to thousands of HA and NA sequences of A/H1N1 and A/H3N2. We discovered significant coevolution between HA and NA at the sequence level, which is closely related to the host species and the epidemic year of the virus. Furthermore, we propose a sequence-to-sequence transformer model (S2STM), which mainly consists of an encoder and a decoder that adopt a multi-head attention mechanism to establish the mapping between HA and NA sequences. The training results show that the S2STM can effectively perform the "translation" from HA to NA or vice versa, thereby building a relationship network between them. Our work combines unsupervised and supervised machine learning methods to identify the sequence matching between HA and NA, which will advance our understanding of the evolution of influenza A viruses (IAVs) and also provides a novel idea for sequence analysis methods.
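The multi-head attention step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the embedding size, the number of heads, the random weights, and the names `multi_head_attention`, `ha_emb`, and `na_emb` are all assumptions chosen for illustration. It shows only how decoder-side positions (here standing in for NA tokens) could attend to encoder-side positions (HA tokens).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query, key_value, weights, n_heads):
    """Scaled dot-product attention with several heads.

    query:     (Lq, d) array, e.g. decoder-side embeddings
    key_value: (Lk, d) array, e.g. encoder-side embeddings
    weights:   dict with "Wq", "Wk", "Wv", "Wo", each of shape (d, d)
    """
    Lq, d = query.shape
    Lk = key_value.shape[0]
    if d % n_heads != 0:
        raise ValueError("embedding size must be divisible by n_heads")
    dh = d // n_heads

    def split(x, length):
        # (length, d) -> (heads, length, dh)
        return x.reshape(length, n_heads, dh).transpose(1, 0, 2)

    q = split(query @ weights["Wq"], Lq)
    k = split(key_value @ weights["Wk"], Lk)
    v = split(key_value @ weights["Wv"], Lk)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, Lq, Lk)
    attn = softmax(scores, axis=-1)                   # each row sums to 1
    merged = (attn @ v).transpose(1, 0, 2).reshape(Lq, d)
    return merged @ weights["Wo"], attn

# Toy usage with random embeddings (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
d_model, heads = 8, 2
weights = {name: rng.normal(size=(d_model, d_model)) for name in ("Wq", "Wk", "Wv", "Wo")}
ha_emb = rng.normal(size=(5, d_model))   # 5 hypothetical HA tokens
na_emb = rng.normal(size=(4, d_model))   # 4 hypothetical NA tokens
out, attn = multi_head_attention(na_emb, ha_emb, weights, heads)
```

A full encoder-decoder model would add positional information, feed-forward layers, residual connections, and masking in the decoder; those parts are omitted here.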

Keywords: hemagglutinin; influenza A viruses; machine learning; neuraminidase; sequence analysis; viral evolution.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Process of the sequence-to-sequence transformer model (S2STM). The hemagglutinin (HA) and neuraminidase (NA) sequences are discretized into “sentences” composed of triplets, and each triplet is then numbered; these numbers are used as indices into an embedding. A start sign and an end sign are added at the two ends of the sentence. S2STM is mainly composed of an encoder, a decoder, and a final linear layer, where a multi-head attention mechanism is applied.
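The preprocessing described in this caption can be sketched as follows. Several details are assumptions, because the caption does not specify them: the triplets are taken as non-overlapping, a trailing remainder shorter than three characters is kept as its own token, and the marker names `<sos>` and `<eos>` are placeholders for the start and end signs.

```python
from typing import Dict, Iterable, List

START, END = "<sos>", "<eos>"  # placeholder marker names (assumption)

def to_triplets(seq: str) -> List[str]:
    """Split a sequence into non-overlapping triplets (assumed scheme)."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def build_vocab(sentences: Iterable[List[str]]) -> Dict[str, int]:
    """Number each distinct triplet in order of first appearance."""
    vocab = {START: 0, END: 1}
    for sentence in sentences:
        for token in sentence:
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(seq: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a sequence into embedding indices framed by start/end markers."""
    return [vocab[START]] + [vocab[t] for t in to_triplets(seq)] + [vocab[END]]

# Toy example; the string is made up and is not a real HA or NA sequence.
toy = "ATGAAGATG"
vocab = build_vocab([to_triplets(toy)])
indices = encode(toy, vocab)
```

The resulting integer list is what an embedding layer would consume; an overlapping (sliding-window) triplet scheme would only change `to_triplets`.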
Figure 2
Hierarchical clustering analysis results. (a) A/H1N1-HA; (b) A/H1N1-NA; (c) A/H3N2-HA; (d) A/H3N2-NA. The number of virus strains in each initial cluster is indicated in brackets. Clusters are merged and reordered manually (as indicated at the bottom).
Figure 3
Hierarchical clustering analysis based on principal component analysis (PCA) matrices with dimensionality reduction. (a) A/H1N1-HA; (b) A/H1N1-NA; (c) A/H3N2-HA; (d) A/H3N2-NA. Each subset is divided into 14 clusters (indicated in Figure 2), which are grouped into different evolution branches (i, ii, iii, and iv) starting from the “Avian” cluster. The X-axis, Y-axis, and Z-axis represent the projections to the first three principal components (PCs): PC1, PC2, and PC3, respectively.
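The projection onto the first three principal components can be sketched with NumPy alone. The one-hot encoding over a four-letter alphabet and the toy sequences below are assumptions; the caption does not state how sequences were turned into a numeric matrix, and the clustering step itself is not shown.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed alphabet; the source does not state the encoding

def one_hot(seqs):
    """Encode equal-length sequences as a (n_sequences, length * 4) matrix."""
    index = {ch: i for i, ch in enumerate(ALPHABET)}
    width = len(ALPHABET)
    X = np.zeros((len(seqs), len(seqs[0]) * width))
    for row, seq in enumerate(seqs):
        for pos, ch in enumerate(seq):
            X[row, pos * width + index[ch]] = 1.0
    return X

def pca_project(X, n_components=3):
    """Project rows of X onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # columns are PC1, PC2, PC3

# Made-up aligned sequences, used only to exercise the functions.
toy = ["ACGTAC", "ACGTAA", "TCGTAC", "TTGTCC", "ACGGAC"]
coords = pca_project(one_hot(toy))
```

Each row of `coords` gives the PC1, PC2, and PC3 coordinates of one sequence, which is the kind of three-dimensional view the figure describes.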
Figure 4
Cluster matching between HA and NA clusters. (a) Correlation between HA and NA clusters in A/H1N1. The color of each circle represents the host species and its size represents the number of strains it contains. The more points lie on the diagonal, the stronger the linear correlation; this is measured using the Pearson coefficient, with R = 0.991 in A/H1N1. (b) Violin plot showing the time evolution of each HA cluster of A/H1N1. (c) Time evolution of NA clusters of A/H1N1. (d) Correlation between HA and NA clusters in A/H3N2, with R = 0.986. (e) Time evolution of HA clusters of A/H3N2. (f) Time evolution of NA clusters of A/H3N2. The HA-cluster and NA-cluster numbers are indicated in Figure 2. The abbreviations “A” (“Avian”), “C” (“Canine”), “S” (“Swine”), and “H” (“Human”) represent the dominant host species in each cluster. “H/S” (“Human/Swine”) indicates that both host species account for a large proportion.
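The Pearson coefficient used in panels (a) and (d) can be sketched as below. It is assumed here that the coefficient is computed over per-strain pairs of (HA-cluster number, NA-cluster number); the caption does not say exactly which values enter the calculation, and the cluster labels in the example are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical cluster numbers for six strains (not data from the paper).
ha_cluster = [1, 1, 2, 3, 3, 4]
na_cluster = [1, 1, 2, 3, 4, 4]
r = pearson(ha_cluster, na_cluster)
```

When most strains fall in matching HA and NA clusters, the pairs lie near the diagonal and `r` approaches 1.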
Figure 5
Distribution of translation accuracies for selected strains in the testing dataset. (a) A/H1N1, from HA to NA (“HA-to-NA”); (b) A/H1N1, from NA to HA (“NA-to-HA”); (c) A/H3N2, from HA to NA; (d) A/H3N2, from NA to HA. The X-axis represents the translation accuracy; the Y-axis indicates, from top to bottom, the proportion of accuracies, the HA-cluster number, and the NA-cluster number. The proportions of strains with translation accuracies greater than 0.95 (or less than 0.95) are indicated in the first row of images. The accuracy distribution of each HA and NA cluster is shown in the second and third rows of images: the darker the color, the more strains.
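One way to compute the quantities summarized in this figure is sketched below. The definition of accuracy is an assumption: it is taken as the fraction of positions where the predicted token equals the target token, divided by the longer of the two lengths. The caption does not give the exact definition, and the token lists are invented.

```python
def translation_accuracy(predicted, target):
    """Position-wise match rate between two token lists (assumed metric)."""
    if not target:
        raise ValueError("target must not be empty")
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / max(len(predicted), len(target))

def share_above(accuracies, threshold=0.95):
    """Proportion of strains whose accuracy exceeds the threshold."""
    return sum(a > threshold for a in accuracies) / len(accuracies)

# Hypothetical tokens for one strain: one of four triplets is wrong.
predicted = ["ATG", "AAC", "GGA", "TTT"]
target = ["ATG", "AAG", "GGA", "TTT"]
acc = translation_accuracy(predicted, target)

# Hypothetical per-strain accuracies for a small test set.
proportion = share_above([1.0, 0.96, 0.9, 0.5])
```

Grouping the per-strain accuracies by HA-cluster or NA-cluster number would give the per-cluster distributions shown in the lower rows.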
