Sci Rep. 2021 Mar 4;11(1):5261.
doi: 10.1038/s41598-021-84637-4.

Potential neutralizing antibodies discovered for novel corona virus using machine learning

Rishikesh Magar et al. Sci Rep. 2021.

Abstract

Fast, untraceable viral mutations can claim thousands of lives before the immune system produces inhibitory antibodies. The recent COVID-19 outbreak has infected and killed thousands of people worldwide. Rapid methods for finding peptide or antibody sequences that can inhibit the viral epitopes of SARS-CoV-2 will save thousands of lives. In this paper, we use several machine learning (ML) models to predict, in a high-throughput manner, synthetic antibodies that may neutralize SARS-CoV-2. We collected 1933 virus-antibody sequences and their clinical patient neutralization responses and trained an ML model to predict the antibody response. Using graph featurization with a variety of ML methods, including XGBoost, Random Forest, Multilayer Perceptron, Support Vector Machine, and Logistic Regression, we screened thousands of hypothetical antibody sequences and found nine stable antibodies that potentially inhibit SARS-CoV-2. We combined bioinformatics, structural biology, and molecular dynamics (MD) simulations to verify the stability of these candidate inhibitory antibodies.
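
The abstract does not detail the graph featurization. As a rough illustration of the general idea only, the sketch below treats each sequence as a chain graph (residues as nodes, sequence neighbours as edges), runs one round of neighbour averaging, and mean-pools the result into a fixed-length vector. This is an assumption-laden, residue-level simplification, not the atom-level featurization the authors used; the function names and the 40-dimensional layout are hypothetical.

```python
# Hypothetical, simplified residue-level graph featurization for illustration only;
# it is NOT the atom-level featurization used in the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def sequence_to_graph(seq):
    """Chain graph: one node per residue, one edge per pair of sequence neighbours."""
    nodes = list(seq.upper())
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return nodes, edges


def one_hot(residue):
    """20-dimensional one-hot vector; str.index raises ValueError for non-standard letters."""
    vec = [0.0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(residue)] = 1.0
    return vec


def graph_features(seq):
    """Fixed-length (40-dim) vector: mean-pooled node one-hots plus
    mean-pooled neighbour averages from one message-passing step."""
    nodes, edges = sequence_to_graph(seq)
    n = len(nodes)
    if n == 0:
        raise ValueError("empty sequence")
    x = [one_hot(r) for r in nodes]
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dim = len(AMINO_ACIDS)
    agg = []
    for i in range(n):
        if neighbours[i]:
            # Message passing: average the one-hot vectors of this node's neighbours.
            agg.append([sum(x[j][k] for j in neighbours[i]) / len(neighbours[i]) for k in range(dim)])
        else:
            agg.append([0.0] * dim)  # a single-residue sequence has no neighbours
    pooled_x = [sum(col) / n for col in zip(*x)]
    pooled_agg = [sum(col) / n for col in zip(*agg)]
    return pooled_x + pooled_agg


def pair_features(antibody, antigen):
    """Concatenate antibody and antigen (epitope) graph features into one 80-dim input."""
    return graph_features(antibody) + graph_features(antigen)
```

The pooling step is what makes the representation usable by shallow models such as XGBoost or logistic regression, which need inputs of the same size regardless of sequence length.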

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Designing antibodies or peptide sequences that can inhibit the SARS-CoV-2 virus requires high-throughput experimentation across a vast number of mutated candidate inhibitor sequences. Screening thousands of available antibody strains is prohibitively expensive and infeasible due to the lack of available structures. However, machine learning models enable rapid, inexpensive in silico exploration of a vast sequence space in a fraction of a second. We collected 1933 virus-antibody sequences with clinical patient IC50 data. Graph featurization of antibody-antigen sequences creates a unique molecular representation. Using this graph representation, we benchmarked a variety of shallow and deep learning models and selected XGBoost for its superior performance and interpretability. We trained our model on a dataset of 1933 diverse virus epitopes and their antibodies. To generate the hypothetical antibody library, we mutated the 2006 SARS scaffold antibody (PDB: 2GHW), producing thousands of candidates. Using the ML model, we classified these sequences and selected the top 18 sequences predicted to neutralize SARS-CoV-2 with high confidence. We then used MD simulations to assess the stability of these 18 sequences and ranked them by stability.
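
The caption does not specify the mutation scheme used on the scaffold. Assuming single-residue substitutions, a candidate library could be enumerated as follows, with each position yielding 19 variants. The scaffold string is a made-up placeholder, not the 2GHW antibody sequence, and `single_point_mutants` and its `positions` argument (for example, to restrict mutations to chosen regions) are hypothetical.

```python
# Hypothetical sketch: enumerate every single-residue substitution of a scaffold sequence.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_point_mutants(scaffold, positions=None):
    """Yield (label, sequence) pairs such as ("A1C", "CCD"):
    wild-type residue, 1-based position, substituted residue."""
    if positions is None:
        positions = range(len(scaffold))
    for i in positions:
        wild_type = scaffold[i]
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # skip the identity "mutation"
            yield f"{wild_type}{i + 1}{aa}", scaffold[:i] + aa + scaffold[i + 1:]


if __name__ == "__main__":
    placeholder = "GYTFTSY"  # placeholder fragment, NOT the 2GHW sequence
    library = list(single_point_mutants(placeholder))
    print(len(library))  # 19 substitutions x 7 positions = 133
```

A generator keeps memory flat when the library grows to thousands of sequences, since each candidate can be featurized and scored as it is produced.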
Figure 2
(a) t-Distributed Stochastic Neighbor Embedding (t-SNE) of all virus epitopes in the training dataset, revealing the biological similarity and diversity of the sequences. (b) t-SNE of all therapeutic antibody sequences in the training set, covering a variety of virus types. Most broadly neutralizing antibodies, such as 2F5, cluster at the center of this plot. (c) Patient clinical IC50 data obtained from various sources and the distribution of neutralizing (IC50 < 10) and non-neutralizing (IC50 > 10) samples. (d) Number of samples for each virus class except HIV, for which we collected 1883 samples. Influenza and Dengue each have 10+ samples.
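
The neutralizing/non-neutralizing split in panel (c) amounts to binarizing IC50 at 10. A minimal sketch, assuming a value exactly at the cutoff counts as non-neutralizing (the caption leaves that case open) and a hypothetical `(virus_class, ic50)` record format; the toy values are made up, not taken from the dataset.

```python
from collections import Counter

IC50_THRESHOLD = 10.0  # cutoff from the caption, in the units of the source data


def label_from_ic50(ic50):
    """Return 1 (neutralizing) if IC50 < threshold, else 0 (non-neutralizing).
    Assumption: a value exactly at the threshold is treated as non-neutralizing."""
    return 1 if ic50 < IC50_THRESHOLD else 0


def class_balance(records):
    """Count (virus_class, label) pairs; records is an iterable of (virus_class, ic50)."""
    return Counter((virus, label_from_ic50(ic50)) for virus, ic50 in records)


if __name__ == "__main__":
    toy = [("HIV", 0.5), ("HIV", 50.0), ("Dengue", 3.2)]  # made-up values for illustration
    print(class_balance(toy))
```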
Figure 3
(a) Test accuracy under five-fold cross-validation for XGBoost, Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and deep learning (Multilayer Perceptron). XGBoost achieves the highest accuracy (90.75%). (b) Out-of-training-class test accuracy for influenza, Dengue, Ebola, Hepatitis, and SARS. For example, to test on influenza, all influenza virus-antibody sequences were removed from the training set, and the resulting model was tested on all influenza samples; that accuracy is reported here. (c) BLOSUM-validated mutations, showing non-neutralizing and neutralizing antibody sequences. For higher confidence, we set the XGBoost prediction-probability threshold to 0.9895, which yielded 18 neutralizing antibody sequences (green points). (d) Interpretability of the ML model: to understand which mutations play key roles in neutralization, we ranked atomic-level features by XGBoost feature importance. By mapping the atomic features onto each of the 20 amino acids, we found M (methionine) to be the most important amino acid for neutralization, followed by F (phenylalanine), Y (tyrosine), and W (tryptophan). The ML model identified hydrophobicity and the presence of sulfur as important features in antibody-antigen interaction. We concluded that M was the most important amino acid because it is both hydrophobic and contains sulfur.
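
Panels (b) and (c) describe two procedures: holding out an entire virus class for testing, and keeping only candidates above a probability cutoff. A sketch of both under stated assumptions: the dict-based sample schema with a `virus` key and the helper names are hypothetical, and treating the 0.9895 threshold as inclusive (`>=`) is an assumption. Model training itself is left out.

```python
# Hypothetical helpers; the record schema and the ">=" comparison are assumptions.


def leave_one_class_out(samples, held_out):
    """Split into (train, test): test holds every sample of the held-out virus class."""
    train = [s for s in samples if s["virus"] != held_out]
    test = [s for s in samples if s["virus"] == held_out]
    return train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if not y_true:
        raise ValueError("no samples to score")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def select_confident(candidates, probabilities, threshold=0.9895):
    """Keep candidates whose predicted neutralization probability is >= threshold."""
    return [c for c, p in zip(candidates, probabilities) if p >= threshold]
```

In use, one would loop over the held-out classes (influenza, Dengue, Ebola, Hepatitis, SARS), fit the classifier on `train`, and report `accuracy` on `test`; `select_confident` would then filter screened candidates by the classifier's predicted probabilities.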
Figure 4
(a) Snapshot of the MD simulation of mutated proteins. Each protein is solvated in a water box and simulated to collect statistics on the stability of mutants and co-mutants. (b) Mean root-mean-square deviation (RMSD) versus mean contact distance for each candidate, averaged over the whole trajectory.
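
Panel (b) summarizes stability by RMSD averaged over the trajectory. For N atoms with positions r_i in a frame and r_i^ref in the reference structure, RMSD = sqrt((1/N) * sum_i ||r_i - r_i^ref||^2). A minimal sketch, assuming every frame has already been superimposed on the reference (no rotational fitting is done here) and coordinates are plain (x, y, z) tuples; the function names are hypothetical, and the mean contact distance is not reproduced because its exact definition is not given.

```python
import math


def rmsd(frame, reference):
    """RMSD between two equally sized lists of (x, y, z) tuples.
    Assumes the frame is already superimposed on the reference."""
    if not frame or len(frame) != len(reference):
        raise ValueError("frames must be non-empty and contain the same number of atoms")
    squared = sum((a - b) ** 2 for p, q in zip(frame, reference) for a, b in zip(p, q))
    return math.sqrt(squared / len(frame))


def mean_rmsd(trajectory, reference):
    """Average RMSD of every frame in the trajectory against the reference."""
    if not trajectory:
        raise ValueError("empty trajectory")
    return sum(rmsd(frame, reference) for frame in trajectory) / len(trajectory)


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shifted = [(x + 1.0, y, z) for x, y, z in ref]  # every atom moved 1 unit along x
    print(mean_rmsd([ref, shifted], ref))  # prints 0.5
```

A lower mean RMSD indicates a candidate that stays closer to its starting structure over the simulation, which is the sense in which the candidates can be ranked by stability.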
