Sci Rep. 2021 Mar 4;11(1):5261.

doi: 10.1038/s41598-021-84637-4.

Potential neutralizing antibodies discovered for novel corona virus using machine learning

Rishikesh Magar¹, Prakarsh Yadav², Amir Barati Farimani³,⁴,⁵

Affiliations

¹ Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.
² Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.
³ Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213, USA. barati@cmu.edu.
⁴ Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213, USA. barati@cmu.edu.
⁵ Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA. barati@cmu.edu.

PMID: 33664393
PMCID: PMC7970853
DOI: 10.1038/s41598-021-84637-4

Rishikesh Magar et al. Sci Rep. 2021.

Abstract

Fast, untraceable viral mutations can take the lives of thousands of people before the immune system produces inhibitory antibodies. The recent COVID-19 outbreak has infected and killed thousands of people worldwide. Rapid methods for finding peptide or antibody sequences that inhibit the viral epitopes of SARS-CoV-2 could save thousands of lives. In this paper, to predict neutralizing antibodies for SARS-CoV-2 in a high-throughput manner, we use several machine learning (ML) models to predict possible inhibitory synthetic antibodies for SARS-CoV-2. We collected 1933 virus-antibody sequences with their clinical patient neutralization responses and trained an ML model to predict the antibody response. Using graph featurization with a variety of ML methods, including XGBoost, Random Forest, Multilayer Perceptron, Support Vector Machine and Logistic Regression, we screened thousands of hypothetical antibody sequences and found nine stable antibodies that potentially inhibit SARS-CoV-2. We combined bioinformatics, structural biology, and molecular dynamics (MD) simulations to verify the stability of the candidate antibodies that can inhibit SARS-CoV-2.
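The abstract mentions graph featurization of antibody and virus sequences without giving its details here. As an illustration only, the sketch below builds a simplified residue-level graph: one node per amino acid with a one-hot feature vector, and undirected edges between consecutive residues. The residue-level granularity, the one-hot features and the way the antibody and epitope graphs are joined (`pair_graph`, as a disconnected union) are all assumptions; Figure 3d suggests the authors' featurization works at the atomic level, so this is a stand-in, not their method.

```python
from typing import List, Tuple

# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# (node feature vectors, directed edge list)
Graph = Tuple[List[List[int]], List[Tuple[int, int]]]


def one_hot(residue: str) -> List[int]:
    """One-hot node feature vector for a single amino-acid letter."""
    if len(residue) != 1 or residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {residue!r}")
    return [1 if aa == residue else 0 for aa in AMINO_ACIDS]


def sequence_to_graph(seq: str) -> Graph:
    """Hypothetical residue graph: nodes are residues, edges join sequence neighbors."""
    nodes = [one_hot(r) for r in seq]
    edges: List[Tuple[int, int]] = []
    for i in range(len(seq) - 1):
        # Store each undirected backbone edge in both directions,
        # a common convention for message-passing graph models.
        edges.append((i, i + 1))
        edges.append((i + 1, i))
    return nodes, edges


def pair_graph(antibody: str, epitope: str) -> Graph:
    """Assumed pairing: antibody and epitope graphs as one disconnected graph."""
    ab_nodes, ab_edges = sequence_to_graph(antibody)
    ep_nodes, ep_edges = sequence_to_graph(epitope)
    offset = len(ab_nodes)  # shift epitope node indices past the antibody nodes
    shifted = [(i + offset, j + offset) for i, j in ep_edges]
    return ab_nodes + ep_nodes, ab_edges + shifted


nodes, edges = pair_graph("MKV", "GW")  # toy sequences, not from the dataset
```

A graph model would then learn from `nodes` and `edges`; richer versions would add physicochemical node features or contact edges between the two chains.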

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Designing antibody or peptide sequences that can inhibit the SARS-CoV-2 virus requires high-throughput experimentation on vastly mutated sequences of potential inhibitors. Screening thousands of available antibody strains is prohibitively expensive and infeasible because of the lack of available structures. Machine learning models, however, can explore a vast sequence space rapidly and inexpensively on the computer, in a fraction of a second. We collected 1933 virus-antibody sequences with clinical patient IC₅₀ data. Graph featurization of the antibody-antigen sequences creates a unique molecular representation. Using this graph representation, we benchmarked a variety of shallow and deep learning models and selected XGBoost because of its superior performance and interpretability. We trained our model on a dataset of 1933 diverse virus epitopes and antibodies. To generate the hypothetical antibody library, we mutated the 2006 SARS scaffold antibody (PDB: 2GHW) and generated thousands of possible candidates. Using the ML model, we classified these sequences and selected the top 18 sequences predicted with high confidence to neutralize SARS-CoV-2. We then used MD simulations to check the stability of these 18 sequences and ranked them by stability.
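The caption says the hypothetical library was generated by mutating the 2GHW scaffold antibody, but not which positions or how many simultaneous substitutions were used. The sketch below shows two generic ways to enumerate such a library: all single-point substitutions at chosen positions, and all combinations at a set of positions (which grows as 20^k). The function names, the toy sequence and the positions are hypothetical; the actual 2GHW sequence and the authors' mutation scheme are not reproduced here.

```python
from itertools import product
from typing import Iterable, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def point_mutants(scaffold: str, positions: Iterable[int]) -> List[str]:
    """All single-substitution variants of `scaffold` at the given 0-based positions."""
    mutants = []
    for pos in positions:
        wild_type = scaffold[pos]
        for aa in AMINO_ACIDS:
            if aa != wild_type:  # skip the unmutated residue
                mutants.append(scaffold[:pos] + aa + scaffold[pos + 1:])
    return mutants


def combinatorial_mutants(scaffold: str, positions: Sequence[int]) -> List[str]:
    """All variants that vary every listed position at once (20**len(positions) - 1)."""
    variants = []
    for combo in product(AMINO_ACIDS, repeat=len(positions)):
        seq = list(scaffold)
        for pos, aa in zip(positions, combo):
            seq[pos] = aa
        variant = "".join(seq)
        if variant != scaffold:  # exclude the original scaffold itself
            variants.append(variant)
    return variants


# Toy scaffold and positions, for illustration only.
library = point_mutants("ACD", [1]) + combinatorial_mutants("ACD", [0, 2])
```

Each generated sequence would then be paired with the SARS-CoV-2 epitope, featurized, and scored by the trained classifier.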

**Figure 2**
(a) t-distributed stochastic neighbor embedding (t-SNE) of all the virus epitopes used in the training dataset, revealing the biological similarity and diversity of the sequences in the dataset. (b) t-SNE of all the therapeutic antibody sequences in the training set, covering a variety of virus types. Most of the broadly neutralizing antibodies, such as 2F5, are clustered at the center of this plot. (c) Clinical patient IC₅₀ data obtained from various sources, and the distribution of neutralizing (IC₅₀ < 10) and non-neutralizing (IC₅₀ > 10) samples. (d) Number of samples for each virus class except HIV, for which we collected 1883 samples. Influenza and Dengue have 10+ samples.
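Panel (c) turns continuous IC₅₀ measurements into binary classes with a cutoff of 10. A minimal labeling sketch follows. The caption does not state units or how a value exactly equal to 10 is treated; here the cutoff is applied in whatever units the source data use, and IC₅₀ = 10 is assumed to be non-neutralizing. The helper names and the example values are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

# Cutoff from the Figure 2c caption; units follow the source data (not stated here).
IC50_THRESHOLD = 10.0


def label_ic50(ic50: float, threshold: float = IC50_THRESHOLD) -> int:
    """1 = neutralizing (IC50 < threshold), 0 = non-neutralizing.

    Assumption: a value exactly at the threshold counts as non-neutralizing.
    """
    return 1 if ic50 < threshold else 0


def label_all(values: Iterable[float]) -> List[int]:
    """Label a batch of IC50 measurements."""
    return [label_ic50(v) for v in values]


def class_counts(values: Iterable[float]) -> Counter:
    """Count neutralizing (1) vs non-neutralizing (0) samples, as in panel (c)."""
    return Counter(label_all(values))


labels = label_all([0.5, 3.2, 10.0, 25.0])  # hypothetical IC50 values
```

A lower IC₅₀ means less antibody is needed to inhibit the virus by half, which is why the smaller values map to the neutralizing class.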

**Figure 3**
(a) Test accuracy with five-fold cross-validation for XGBoost, Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM) and deep learning (Multilayer Perceptron). XGBoost has the highest accuracy (90.75%). (b) Out-of-training-class test accuracy for Influenza, Dengue, Ebola, Hepatitis, and SARS. For this test (taking Influenza as an example), all Influenza virus-antibody sequences were removed from the training set, and the resulting model was tested on all Influenza samples; that accuracy is reported here. (c) BLOSUM-validated mutations, and non-neutralizing and neutralizing antibody sequences. To increase confidence, we set the XGBoost prediction-probability threshold to 0.9895 and found 18 neutralizing antibody sequences (green points). (d) Interpretability of the ML model: to understand which mutations play key roles in neutralization, XGBoost feature importance was used to rank atomic-level features. By connecting the atomic features to each of the 20 amino acids, M (methionine) was found to be the most important amino acid for neutralization, followed by F (phenylalanine), Y (tyrosine) and W (tryptophan). The ML model identified hydrophobicity and the presence of sulfur as important features in antibody-antigen interaction. We concluded that M is the most important amino acid because it is both hydrophobic and contains sulfur.
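Panels (b) and (c) describe two concrete procedures: holding out an entire virus class to test generalization, and keeping only candidates whose predicted probability clears 0.9895. The sketch below expresses both over plain Python lists. It assumes the probabilities have already been produced by a trained classifier (for example, the positive-class column of a classifier's `predict_proba` output); whether the cutoff is strict (`>`) or inclusive (`>=`) is not stated, so the strict comparison here is an assumption, as are the function names.

```python
from typing import List, Sequence, Tuple


def leave_one_class_out(
    virus_classes: Sequence[str], held_out: str
) -> Tuple[List[int], List[int]]:
    """Split sample indices so every sample of `held_out` lands in the test set.

    The model is trained only on the other virus classes and then evaluated
    on the unseen class, mirroring the out-of-training-class test in panel (b).
    """
    train = [i for i, c in enumerate(virus_classes) if c != held_out]
    test = [i for i, c in enumerate(virus_classes) if c == held_out]
    return train, test


def select_confident(
    candidates: Sequence[str],
    probabilities: Sequence[float],
    threshold: float = 0.9895,  # cutoff quoted in panel (c)
) -> List[str]:
    """Keep candidates whose predicted neutralization probability exceeds the cutoff.

    Assumption: a strict '>' comparison; the caption does not specify.
    """
    if len(candidates) != len(probabilities):
        raise ValueError("candidates and probabilities must have equal length")
    return [seq for seq, p in zip(candidates, probabilities) if p > threshold]


# Hypothetical labels and scores, for illustration only.
train_idx, test_idx = leave_one_class_out(
    ["HIV", "HIV", "Influenza", "SARS", "Influenza"], "Influenza"
)
kept = select_confident(["s1", "s2", "s3"], [0.99, 0.5, 0.9895])
```

A high cutoff trades recall for precision: fewer candidates survive, but those that do are the ones the model is most certain about, which keeps the expensive MD stage small.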

**Figure 4**
(a) Snapshot of the MD simulation of a mutated protein. Each protein is solvated in a water box and simulated to collect statistical data on the stability of the mutants and co-mutants. (b) Mean root-mean-square deviation (RMSD) versus mean contact distance for each candidate, averaged over the whole trajectory.
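Panel (b) plots the mean RMSD of each candidate over its trajectory. For reference, RMSD between a frame and a reference structure with N atoms is sqrt((1/N) Σ |xᵢ − xᵢʳᵉᶠ|²). The sketch below computes it and averages it over frames. It assumes the frames are already superposed on the reference (translation and rotation removed), which real MD analysis normally does first; packages such as MDAnalysis or MDTraj handle that step. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(frame: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD between two equally sized coordinate sets (assumed already superposed)."""
    if not frame or len(frame) != len(reference):
        raise ValueError("coordinate sets must be non-empty and the same size")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(frame, reference):
        # Squared displacement of one atom.
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(frame))


def mean_rmsd(trajectory: Sequence[Sequence[Coord]], reference: Sequence[Coord]) -> float:
    """Average RMSD to the reference over every frame of a trajectory."""
    return sum(rmsd(f, reference) for f in trajectory) / len(trajectory)


# Toy two-atom example: the second frame is shifted by 1 unit along y.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
```

A low, flat mean RMSD over the trajectory indicates a mutant that stays close to its starting structure, which is the sense in which the candidates are ranked by stability.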

Cited by

Artificial Intelligence for COVID-19 Drug Discovery and Vaccine Development.
Keshavarzi Arshadi A, Webb J, Salem M, Cruz E, Calad-Thomson S, Ghadirian N, Collins J, Diez-Cecilia E, Kelly B, Goodarzi H, Yuan JS. Front Artif Intell. 2020 Aug 18;3:65. doi: 10.3389/frai.2020.00065. eCollection 2020. PMID: 33733182. Review.
Application of artificial intelligence in COVID-19 medical area: a systematic review.
Chang Z, Zhan Z, Zhao Z, You Z, Liu Y, Yan Z, Fu Y, Liang W, Zhao L. J Thorac Dis. 2021 Dec;13(12):7034-7053. doi: 10.21037/jtd-21-747. PMID: 35070385. Review.
NABP-BERT: NANOBODY®-antigen binding prediction based on bidirectional encoder representations from transformers (BERT) architecture.
Ahmed FS, Aly S, Liu X. Brief Bioinform. 2024 Nov 22;26(1):bbae518. doi: 10.1093/bib/bbae518. PMID: 39688476.
Artificial intelligence-driven assessment of radiological images for COVID-19.
Bouchareb Y, Moradi Khaniabadi P, Al Kindi F, Al Dhuhli H, Shiri I, Zaidi H, Rahmim A. Comput Biol Med. 2021 Sep;136:104665. doi: 10.1016/j.compbiomed.2021.104665. Epub 2021 Jul 21. PMID: 34343890. Review.
Prediction of blood-brain barrier and Caco-2 permeability through the Enalos Cloud Platform: combining contrastive learning and atom-attention message passing neural networks.
Koutroumpa NM, Tsoumanis A, Sarimveis H, Lynch I, Melagraki G, Afantitis A. J Cheminform. 2025 May 5;17(1):68. doi: 10.1186/s13321-025-01007-2. PMID: 40325398.

Grants and funding

47247.1.5007162/Center of Machine Learning in Health
