Sequence Matching between Hemagglutinin and Neuraminidase through Sequence Analysis Using Machine Learning
- PMID: 35336876
- PMCID: PMC8950662
- DOI: 10.3390/v14030469
Abstract
To date, many experiments have revealed that the functional balance between hemagglutinin (HA) and neuraminidase (NA) plays a crucial role in viral mobility, production, and transmission. However, whether and how HA and NA maintain this balance at the sequence level needs further investigation. Here, we applied principal component analysis and hierarchical clustering analysis to thousands of HA and NA sequences of A/H1N1 and A/H3N2. We discovered significant coevolution between HA and NA at the sequence level, which is closely related to the host species and the epidemic years of the virus. Furthermore, we propose a sequence-to-sequence transformer model (S2STM), which consists mainly of an encoder and a decoder and adopts a multi-head attention mechanism to establish the mapping between HA and NA sequences. The training results show that the S2STM can effectively "translate" HA into NA and vice versa, thereby building a relationship network between them. Our work combines unsupervised and supervised machine learning methods to identify the sequence matching between HA and NA, which will advance our understanding of the evolution of influenza A viruses (IAVs) and offers a new approach to sequence analysis.
Keywords: hemagglutinin; influenza A viruses; machine learning; neuraminidase; sequence analysis; viral evolution.
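As a rough illustration of the unsupervised step named in the abstract, the following minimal sketch groups aligned protein fragments by hierarchical (agglomerative, average-linkage) clustering on Hamming distance. It is not the authors' pipeline: the distance measure, the linkage rule, the function names `hamming` and `average_linkage`, and the four toy sequences with their labels are all assumptions made for illustration only.

```python
from itertools import combinations
from typing import Dict, List, Set


def hamming(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(seqs: Dict[str, str], n_clusters: int) -> List[Set[str]]:
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [{name} for name in seqs]

    def dist(pair):
        # Average pairwise distance between members of two clusters.
        i, j = pair
        ds = [hamming(seqs[p], seqs[q]) for p in clusters[i] for q in clusters[j]]
        return sum(ds) / len(ds)

    while len(clusters) > n_clusters:
        # combinations() yields i < j, so deleting index j keeps index i valid.
        i, j = min(combinations(range(len(clusters)), 2), key=dist)
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical, made-up 10-residue fragments; labels are illustrative only.
toy = {
    "human_2009": "MKAILVVLLY",
    "human_2010": "MKAILVVMLY",
    "swine_1998": "MKTIIALSYI",
    "swine_1999": "MKTIIALSYF",
}

clusters = average_linkage(toy, n_clusters=2)
print(clusters)
```

In this toy input the two "human" fragments differ at one position, as do the two "swine" fragments, so the procedure groups sequences by label. Real HA and NA data would first need a multiple sequence alignment and a more careful choice of distance.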
Conflict of interest statement
The authors declare no conflict of interest.