Nat Commun. 2023 Jun 13;14(1):3478.
doi: 10.1038/s41467-023-39199-6.

Predicting the antigenic evolution of SARS-COV-2 with deep learning


Wenkai Han et al. Nat Commun. 2023.

Abstract

The relentless evolution of SARS-CoV-2 poses a significant threat to public health, as it adapts to immune pressure from vaccines and natural infections. Gaining insights into potential antigenic changes is critical but challenging due to the vast sequence space. Here, we introduce the Machine Learning-guided Antigenic Evolution Prediction (MLAEP), which combines structure modeling, multi-task learning, and genetic algorithms to predict the viral fitness landscape and explore antigenic evolution via in silico directed evolution. By analyzing existing SARS-CoV-2 variants, MLAEP accurately infers variant order along antigenic evolutionary trajectories, which correlates with the corresponding sampling time. Our approach identified novel mutations in immunocompromised COVID-19 patients and emerging variants like XBB.1.5. Additionally, MLAEP predictions were validated through in vitro neutralizing antibody binding assays, demonstrating that the predicted variants exhibited enhanced immune evasion. By profiling existing variants and predicting potential antigenic changes, MLAEP aids in vaccine development and enhances preparedness against future SARS-CoV-2 variants.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of the MLAEP framework.
a The multi-task learning model. We collected and cleaned the RBD variant sequences and their corresponding binding specificity to ACE2 and eight antibodies. Then, the sequences and the structures of their binding partners were fed into the deep learning model with the multi-task learning objective. b The genetic algorithm. In silico directed evolution was performed to navigate the virtual fitness landscape defined by the nine scores from the multi-task model. The generation loop was repeated multiple times until the desired functionality was reached. c The generated sequences were then subjected to validation experiments to evaluate their functional attributes.
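The generation loop in panel b can be illustrated with a minimal sketch of a genetic algorithm over protein sequences. This is not the authors' implementation: `toy_fitness` is a hypothetical stand-in for the nine model scores, and the population size, mutation rate, elitism, and single-point crossover are assumptions chosen only to keep the example small.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_fitness(seq, target):
    """Hypothetical stand-in for the model scores: fraction of sites matching a target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)

def mutate(seq, rng, rate=0.05):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)

def crossover(a, b, rng):
    """Single-point crossover between two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(seed_seq, fitness, generations=50, pop_size=40, elite=10, rng=None):
    """Run the selection / crossover / mutation loop and return the best sequence."""
    rng = rng or random.Random(0)
    # Keep the unmutated seed in the first population so the best score never drops below it.
    population = [seed_seq] + [mutate(seed_seq, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:elite]  # elitism: top sequences survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=fitness)
```

In a setting like the one the caption describes, the fitness callable would presumably wrap the trained multi-task model rather than a fixed target sequence.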
Fig. 2. Performance evaluation and in vitro pVNT experimental data validation.
a Model performance comparison for the classification of ACE2 and antibody binding specificity across different algorithms, including our model, the augmented Potts model, the eUniRep model, the gUniRep model, CNN, RNN, LSTM, linear regression, SVM, and random forest. The details of model implementation and the equations used to calculate the performance metrics are given in the Methods. b Validation of the predicted immune escape potential using the class 4 monoclonal antibody-based pVNT assay data (Antibody 10–40). The x axis indicates the model-predicted variant escape potential, while the y axis is the log fold change of the VOCs compared with the wild type.
Fig. 3. Multi-task model captures the antigenic evolutionary potential.
a The landscape of SARS-CoV-2 RBD variant sequences (obtained from GISAID), represented as a KNN-similarity graph (darker blue regions represent less recent dates, e.g., 2019, and yellow represents more recent dates, e.g., 2022). The gray lines indicate graph edges, while the colored points are sequences with known sampling times. The streamlines among the points show a visual correlation between model-predicted scores and the known sampling time. b The same landscape visualized using the average score of our model. The landscape is colored by the model prediction score, with darker colors representing lower scores and lighter colors representing higher scores. c Spearman correlation over time for the model predictions, including the ACE2-binding score, immune escape potential, and the weighted average of the two, in a time window of the previous three months for each sampled date (from February 2020 to February 2022). d Principal component analyses of the sequence representations from our model, colored by the escaping/binding ability towards COV2-2832, COV2-2165 (class 1 antibody), COV2-2479, COV2-2500 (class 2 antibody), COV2-2096, COV2-2499 (class 3 antibody), COV2-2677, COV2-2094 (class 4 antibody) and ACE2.
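The windowed correlation in panel c can be sketched as follows. This is an illustrative reading, not the authors' code: the 90-day window, the use of integer day offsets for sampling dates, and the function names are all assumptions.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def windowed_spearman(days, scores, end_day, window=90):
    """Correlate sampling day with score for samples in (end_day - window, end_day]."""
    pairs = [(d, s) for d, s in zip(days, scores) if end_day - window < d <= end_day]
    if len(pairs) < 3:
        return None  # too few samples for a meaningful correlation
    d, s = zip(*pairs)
    return spearman(list(d), list(s))
```

Calling `windowed_spearman` once per sampled date would give a curve of correlation over time of the kind the caption describes.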
Fig. 4. Overview of the synthetic sequences.
a Distance-preserving multidimensional scaling plot illustrating the diversity of the synthetic sequences compared to existing variants and deep mutagenesis sequences. A scale bar of three mutations is shown. b, c The differences between the initial sequences and the synthetic sequences. b The surface of the RBD protein, colored by the KL divergence between the initial sequences and the synthetic sequences. Colored outlines indicate the epitope structural footprint. c The top 50 sites with the highest KL divergence value are selected for visualizing the difference between the generated sequences and the existing sequences. Enriched amino acids are located on the positive side of the y axis and depleted amino acids are located on the negative side.
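A per-site KL divergence of the kind shown in panels b and c could be computed roughly as below. The direction of the divergence (synthetic relative to initial), the pseudocount used to avoid division by zero, and the function names are assumptions; the caption does not specify them.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def site_frequencies(seqs, pos, pseudocount=1.0):
    """Smoothed amino acid frequencies at one alignment position."""
    counts = Counter(s[pos] for s in seqs if s[pos] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}

def per_site_kl(synthetic, initial, pseudocount=1.0):
    """KL(synthetic || initial) at every position of equal-length aligned sequences."""
    length = len(initial[0])
    divergences = []
    for pos in range(length):
        p = site_frequencies(synthetic, pos, pseudocount)
        q = site_frequencies(initial, pos, pseudocount)
        divergences.append(sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS))
    return divergences

def top_sites(divergences, k=50):
    """Indices of the k positions with the highest divergence, highest first."""
    return sorted(range(len(divergences)), key=lambda i: divergences[i], reverse=True)[:k]
```

Sites where the synthetic and initial sets share the same residue distribution score zero, so only positions that the generated sequences actually changed rise to the top.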
Fig. 5. Epitope mutations confer RBD resistance to the binding of neutralizing mAbs.
HTRF-based binding assay of wild-type and mutant RBD proteins against two representative anti-RBD monoclonal antibodies from each of four classes, including COV2-2832 and COV2-2165 (class 1 antibody), COV2-2479 and COV2-2050 (class 2 antibody), COV2-2096 and COV2-2499 (class 3 antibody), as well as COV2-2094 and COV2-2677 (class 4 antibody). ∆F% values were calculated from raw data and fitted to dose–response curves, and the IC50 values are listed side by side. Data are presented as mean values ± standard deviation (n = 3 independent experiments). Source data are provided as a Source Data file.

