Review
Int J Mol Sci. 2024 Dec 21;25(24):13674.
doi: 10.3390/ijms252413674.

Empirical Comparison and Analysis of Artificial Intelligence-Based Methods for Identifying Phosphorylation Sites of SARS-CoV-2 Infection

Hongyan Lai et al. Int J Mol Sci.

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a highly infectious and pathogenic member of the large coronavirus family, is the primary pathogen responsible for the global coronavirus disease 2019 (COVID-19) pandemic. Phosphorylation is a major type of protein post-translational modification and plays an essential role in SARS-CoV-2-host interactions. Precisely identifying phosphorylation sites in SARS-CoV-2-infected host cells is therefore important for investigating potential antiviral responses and mechanisms and for discovering novel therapeutic targets. Numerous computational tools have been developed from phosphoproteomic data generated by mass spectrometry-based experiments, enabling phosphorylation sites to be identified accurately across entire SARS-CoV-2-infected proteomes. In this work, we comprehensively review the major aspects of how these predictors are constructed and made available, including benchmark dataset preparation, feature extraction and refinement methods, machine learning algorithms and deep learning architectures, model evaluation approaches and metrics, and publicly available web servers and packages. We compare the prediction performance of each tool on independent serine/threonine (S/T) and tyrosine (Y) phosphorylation datasets and discuss the overall limitations of existing predictors. In summary, this review aims to provide insights that guide the development of new, more powerful phosphorylation site identification tools, help identify suitable target molecules for experimental verification, and contribute to the development of antiviral therapies.
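Many predictors of this kind begin by turning each candidate S/T or Y residue into a fixed-length peptide window and encoding that window numerically. The Python sketch below illustrates one widely used scheme, one-hot encoding. The 16-residue half-width (a 33-residue window), the `X` padding symbol, the 0-based positions, and the function names `extract_window` and `one_hot` are illustrative assumptions, not the encoding of any specific tool covered by the review.

```python
from typing import List

# The 20 standard amino acids; any other symbol (e.g. the padding "X") encodes as all zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def extract_window(sequence: str, position: int, half_width: int = 16, pad: str = "X") -> str:
    """Return the (2 * half_width + 1)-residue window centred on a 0-based S/T/Y position.

    Positions beyond either terminus are filled with `pad` (assumed to be "X").
    """
    if not 0 <= position < len(sequence):
        raise IndexError(f"position {position} is outside the sequence")
    if sequence[position] not in "STY":
        raise ValueError(f"residue {sequence[position]!r} at {position} is not S, T or Y")
    left = sequence[max(0, position - half_width):position]
    right = sequence[position + 1:position + 1 + half_width]
    return (pad * (half_width - len(left)) + left + sequence[position]
            + right + pad * (half_width - len(right)))


def one_hot(window: str) -> List[List[int]]:
    """Encode each residue as a 20-dimensional binary vector (rows follow window order)."""
    matrix = []
    for aa in window:
        row = [0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    peptide = extract_window("MKSPEYTLR", position=2, half_width=3)
    print(peptide)  # XMKSPEY
    print(len(one_hot(peptide)), len(one_hot(peptide)[0]))  # 7 20
```

The resulting window-by-20 matrix can be flattened for a conventional classifier or fed directly to a convolutional or recurrent network; other encodings (for example substitution-matrix or physicochemical features) would replace only `one_hot`.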

Keywords: SARS-CoV-2; computation tool; deep learning; machine learning; phosphorylation site.

Conflict of interest statement

The authors declare no potential conflicts of interest.

Figures

Figure 1
The workflow of existing computational tools for predicting phosphorylation sites in host cells infected with SARS-CoV-2. These tools are built on conventional machine learning models or end-to-end deep learning networks and are developed mainly through the following steps: benchmark sequence data preparation, feature encoding and selection, classification model design, and prediction assessment.
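For the prediction assessment step, tools in this area commonly report threshold-dependent metrics such as sensitivity (Sn), specificity (Sp), accuracy (ACC), and the Matthews correlation coefficient (MCC), together with the threshold-independent area under the ROC curve (AUC). The following is a minimal Python sketch of these standard formulas, assuming binary labels (1 = phosphorylated site, 0 = non-site); the helper names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, TN, FP, FN for binary labels where 1 = phosphorylated site."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sn, Sp, ACC and MCC; ratios with a zero denominator are reported as 0.0."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels: TP = 2, FN = 1, FP = 1, TN = 4, so Sn = 2/3, Sp = 0.8, ACC = 0.75, MCC = 7/15.
    print(threshold_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1, 0]))
    # 5 of the 6 positive-negative pairs are ranked correctly, so AUC = 5/6.
    print(auc([1, 1, 1, 0, 0], [0.9, 0.8, 0.3, 0.7, 0.2]))
```

The pairwise AUC is quadratic in the number of samples and is meant only to make the definition explicit; for full benchmark sets, a rank-based computation or an established library implementation would be the practical choice.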
Figure 2
Widely used benchmark datasets of phosphorylation sites in host cells infected with SARS-CoV-2. (A) Origins of the benchmark datasets and the common procedures used to collect and preprocess them. (B) Details of the three benchmark datasets: A549 (human), Vero E6 (African green monkey), and combined.
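Benchmark construction commonly labels experimentally identified sites as positives and the remaining candidate residues (S/T, or Y for a tyrosine dataset) of the same proteins as negatives. The following is a simplified Python sketch under stated assumptions: site positions are 0-based, the `proteins` and `phosphosites` dictionaries and their accession keys are hypothetical inputs, and only identical windows are merged, with any window observed under both labels discarded as ambiguous. Real pipelines usually also apply homology reduction (for example with CD-HIT) and may rebalance negatives; this sketch omits both.

```python
from typing import Dict, List, Set, Tuple


def window(seq: str, pos: int, half_width: int, pad: str = "X") -> str:
    """Peptide of length 2 * half_width + 1 centred on 0-based `pos`, padded with `pad`."""
    left = seq[max(0, pos - half_width):pos]
    right = seq[pos + 1:pos + 1 + half_width]
    return pad * (half_width - len(left)) + left + seq[pos] + right + pad * (half_width - len(right))


def build_dataset(
    proteins: Dict[str, str],
    phosphosites: Dict[str, Set[int]],
    residues: str = "ST",
    half_width: int = 16,
) -> List[Tuple[str, int]]:
    """Return sorted (window, label) pairs: 1 = annotated phosphosite, 0 = other candidate residue.

    `proteins` maps a (hypothetical) accession to its sequence; `phosphosites` maps the same
    accession to 0-based positions of experimentally identified sites.
    """
    seen: Dict[str, Set[int]] = {}
    for accession, seq in proteins.items():
        sites = phosphosites.get(accession, set())
        for pos, aa in enumerate(seq):
            if aa in residues:  # only S/T (or Y) residues are candidate sites
                label = 1 if pos in sites else 0
                seen.setdefault(window(seq, pos, half_width), set()).add(label)
    # Merge identical windows; drop windows observed with both labels as ambiguous.
    return sorted((w, next(iter(labels))) for w, labels in seen.items() if len(labels) == 1)


if __name__ == "__main__":
    data = build_dataset({"P1": "MSKTAYS"}, {"P1": {1}}, residues="ST", half_width=2)
    print(data)  # [('AYSXX', 0), ('SKTAY', 0), ('XMSKT', 1)]
```

Calling `build_dataset` once with `residues="ST"` and once with `residues="Y"` mirrors the separate S/T and Y datasets on which the tools are compared.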
