BMC Bioinformatics. 2010 Feb 8;11:79.

doi: 10.1186/1471-2105-11-79.

Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies

Maria Pamela C David¹, Gisela P Concepcion, Eduardo A Padlan

Affiliation

¹ Virtual Laboratory of Biomolecular Structures, Marine Science Institute, College of Science, University of the Philippines Diliman, Quezon City 1101, Philippines. maria.pamela.david@gmail.com

PMID: 20144194
PMCID: PMC3098112
DOI: 10.1186/1471-2105-11-79

Maria Pamela C David et al. BMC Bioinformatics. 2010.

Abstract

Background: All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences.

Results: The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprising 103 amyloidogenic (AM) and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set is 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%.

Conclusions: This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study is limited, and is consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general.

Figures

**Figure 1**
**Normalized mutation matrices of amyloidogenic (Column A) and non-amyloidogenic derivatives (Column B) of 12 antibody germlines**. Original residues are in rows and corresponding replacement residues are in columns. The amino acids have been arranged according to increasing β-sheet forming propensities [54]. The intensity matrix of the difference between the amyloidogenic and non-amyloidogenic matrices (Column C) reflects the relative predominance of a mutation type in either amyloid or non-amyloid formers. A fourth matrix set (Column D) is used to indicate the mutations that occur exclusively in amyloidogenic derivatives. Separate matrices were generated for mutations in buried CDR, exposed CDR, buried FR and exposed FR positions.
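
The relationship between the four matrix columns can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the normalization used here (dividing each count by the total number of mutations in the set) is an assumption, since the caption does not state how the matrices were normalized, and the function names and the toy mutation lists are hypothetical.

```python
from collections import Counter

def normalized_matrix(mutations):
    """Turn (original, replacement) pairs into fractions of all mutations.

    Assumption: 'normalized' means each count divided by the set total.
    """
    counts = Counter(mutations)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def difference(am, nam):
    """Column C analogue: positive values predominate in amyloid formers."""
    return {pair: am.get(pair, 0.0) - nam.get(pair, 0.0)
            for pair in set(am) | set(nam)}

def exclusive(am, nam):
    """Column D analogue: mutations seen only in amyloidogenic derivatives."""
    return {pair: value for pair, value in am.items() if pair not in nam}

# Hypothetical toy data, for illustration only.
am = normalized_matrix([("S", "T"), ("S", "T"), ("A", "V"), ("G", "D")])
nam = normalized_matrix([("S", "T"), ("G", "D")])
diff = difference(am, nam)
only_am = exclusive(am, nam)
```

In the figure, one such set of matrices would be built separately for each of the four position classes (buried CDR, exposed CDR, buried FR, exposed FR).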

**Figure 2**
**Analysis of mutations exclusive to amyloidogenic derivatives**. A rough analysis of mutation patterns could be made by dividing the matrix using the diagonal, or by dividing it into quadrants. Mutations to the right of the diagonal are characterized by increased sheet-forming propensities (+), while those to the left imply the opposite (-). In terms of the quadrants, which are numbered in the same way as the Cartesian plane, the first contains information on mutations from low- to mid-propensity, sheet-associated amino acids to relatively high-propensity sheet-associated amino acids (++), while the third quadrant contains the opposite (--). In the most general sense, mutations either on the right of the diagonal, or in the first and third quadrants (shaded), would be the biggest contributors to destabilization. The analysis indicates that a significant number of mutations in the exposed CDR residues result in increased β-sheet-forming propensities, while mutations in buried FR residues tend to be associated with a decrease in β-sheet-forming propensities.
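
The diagonal and quadrant reading of the matrix can be expressed as a small Python function. This is a sketch under stated assumptions: the matrix is taken to have its first row at the top, so that the first quadrant is the upper right; the function name is hypothetical; and `TOY_ORDER` is a short illustrative ordering, not the β-sheet propensity scale of reference [54].

```python
def classify_mutation(original, replacement, order):
    """Return (direction, quadrant) for one mutation.

    order: residues sorted by increasing beta-sheet-forming propensity.
    Rows are original residues, columns are replacement residues.
    """
    row, col = order.index(original), order.index(replacement)
    if col > row:
        direction = "+"   # right of the diagonal: propensity increases
    elif col < row:
        direction = "-"   # left of the diagonal: propensity decreases
    else:
        direction = "0"   # on the diagonal: no change
    half = len(order) / 2
    top, right = row < half, col >= half
    # Quadrants numbered as on the Cartesian plane (assumes row 0 is on top).
    quadrant = {(True, True): 1, (True, False): 2,
                (False, False): 3, (False, True): 4}[(top, right)]
    return direction, quadrant

# Illustrative ordering only; replace with the scale actually used.
TOY_ORDER = ["G", "D", "A", "L", "V", "I"]
```

With this toy ordering, `classify_mutation("G", "I", TOY_ORDER)` falls in the first quadrant with an increase in propensity, and the reverse mutation falls in the third quadrant with a decrease.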

**Figure 3**
**Decision tree for the evaluation of individual mutations**. A decision tree (A) was constructed in order to evaluate the contribution of a mutation to amyloidogenicity. A *path* is followed for each mutation, depending on its position and exposure, as well as on the increase or decrease in sheet-forming propensity associated with it. Each path leads to one of eight terminal nodes, which is associated with a *score*, defined as the product of the weights (in italics) along the path leading to it. An analysis of paths taken by amyloidogenic and non-amyloidogenic derivatives of the different germlines indicated that different pairs of terminal nodes may be used to provide maximum separation between these derivatives. For instance, amyloidogenic derivatives of X93627 mostly end in leaf 1, while the non-amyloidogenic counterparts are more frequently associated with leaf 7; germline derivatives that can be distinguished using specific terminal nodes are indicated in the illustration. Based on this analysis, a final tree (B) was created which branches first on the basis of the germline to which the derivative being tested belongs; the structure and weights of the original tree (A) are kept. Each edge emanating from a germline node is connected to a copy of the original tree, where weights on paths which could be used for maximizing the separation between amyloidogenic and non-amyloidogenic derivatives are either boosted or decreased tenfold. For the illustrative example in (B), paths for J00248 (Germline 1) and Z22208 (Germline n) are shown.
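
The scoring of a single mutation can be sketched as follows. This is a minimal illustration, not the published tree: all weights in `WEIGHTS` are hypothetical placeholders, `germline_A` and its boosted paths in `BOOSTS` are invented for the example, and the order in which the three tests are applied along a path is an assumption (it does not affect the product). The caption does not say how per-mutation scores are combined into a sequence score, so that step is left out.

```python
from math import prod

# Hypothetical placeholder weights; the published values are not shown here.
WEIGHTS = {
    "region": {"CDR": 0.6, "FR": 0.4},
    "exposure": {"exposed": 0.7, "buried": 0.3},
    "propensity": {"+": 0.8, "-": 0.2},
}

# Hypothetical germline-specific adjustments: selected paths are
# boosted (x10) or decreased (x0.1), as in the final tree (B).
BOOSTS = {
    "germline_A": {("CDR", "exposed", "+"): 10.0,
                   ("FR", "buried", "-"): 0.1},
}

def path_score(region, exposure, propensity, germline=None):
    """Score of one mutation: the product of the weights along its path."""
    score = prod([
        WEIGHTS["region"][region],
        WEIGHTS["exposure"][exposure],
        WEIGHTS["propensity"][propensity],
    ])
    # Paths without a germline-specific adjustment keep the original score.
    factor = BOOSTS.get(germline, {}).get((region, exposure, propensity), 1.0)
    return score * factor
```

Two regions, two exposure states and two propensity directions give the eight terminal nodes described in the caption.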

**Figure 4**
**Application of the naive Bayesian method for the prediction of amyloidosis**. Given a set of amyloidogenic and non-amyloidogenic derivatives of a single germline, it is possible to generate the probability that a mutation at a particular position would cause amyloidosis or not. Briefly, separate mutation propensities for amyloid (p_AM) and non-amyloid (p_NAM) formers are generated by counting the frequency of mutations per position. These fractions, as well as their complements (i.e. the probability that there will be no mutation in either an amyloid-former or non-amyloid-former at a particular position, in black), are subsequently used to compute the amyloidogenic and non-amyloidogenic probabilities of a test sequence. To calculate the amyloidogenic probability of a test sequence, a probability is assigned to each of the n positions in the sequence based on the characteristic of that position (i.e. whether it contains a mutation or not). For a position x containing no mutation, this probability is q_AM, where q_AM = 1 - p_AM for position x. The probability for positions with mutations is equal to p_AM. Non-amyloidogenic probabilities are calculated in a similar manner, but with the use of p_NAM instead of p_AM. To avoid multiplications by zero, the Laplace correction is used. A product of the probabilities is subsequently taken; if the product of amyloidogenic probabilities is higher, the test sequence is classified as amyloidogenic.
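
The procedure in this caption can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Laplace correction is taken in its add-one form for a two-outcome event, sums of logarithms replace the raw product (the comparison is unchanged, but long sequences do not underflow), and the function names and toy training sets are hypothetical.

```python
import math

def position_probs(derivatives, n_positions):
    """Laplace-corrected mutation probability for each position.

    derivatives: list of sets, each holding the mutated positions
    (0-based) of one derivative of a single germline.
    """
    total = len(derivatives)
    probs = []
    for pos in range(n_positions):
        mutated = sum(1 for d in derivatives if pos in d)
        # Add-one correction (assumed form): +1 to the count, +2 to the
        # total, so that no probability is exactly 0 or 1.
        probs.append((mutated + 1) / (total + 2))
    return probs

def log_likelihood(test_mutations, probs):
    """Sum log p at mutated positions and log q = log(1 - p) elsewhere."""
    return sum(
        math.log(p if pos in test_mutations else 1.0 - p)
        for pos, p in enumerate(probs)
    )

def classify(test_mutations, am_derivs, nam_derivs, n_positions):
    """Label a test sequence by comparing the two probability products."""
    am = log_likelihood(test_mutations, position_probs(am_derivs, n_positions))
    nam = log_likelihood(test_mutations, position_probs(nam_derivs, n_positions))
    return "amyloidogenic" if am > nam else "non-amyloidogenic"

# Hypothetical toy training sets over a 4-position sequence.
AM_DERIVS = [{0, 2}, {0}, {0, 3}]
NAM_DERIVS = [{1}, {1, 3}, set()]
```

In this toy data, position 0 is mutated in every amyloidogenic derivative and position 1 in most non-amyloidogenic ones, so a test sequence mutated only at position 0 is labeled amyloidogenic and one mutated only at position 1 is not.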

**Figure 5**
**Steps in generating and testing a weighted decision tree**. To create a weighted decision tree, mutations from amyloidogenic and non-amyloidogenic derivatives of a single germline are organized into separate matrices that take location, exposure and sheet-forming propensity into account (Step 1). These matrices are visualized and analyzed for general trends that may be transformed into weights (Step 2). An initial tree is constructed from this information and tested against the training set (Step 3). From this testing, it became evident that certain paths can be used for maximally separating amyloidogenic and non-amyloidogenic derivatives of a germline, and that these paths are germline-dependent. We then generated a tree that takes the germline of origin into account, and which has different boosted paths. The final step was to generate the classification threshold, which was determined from the analysis of scores for the test set (Step 4). This tree was then used to classify sequences in an independent, holdout test set (Step 5).

Cited by

AB-Amy: machine learning aided amyloidogenic risk prediction of therapeutic antibody light chains.
Zhou Y, Huang Z, Gou Y, Liu S, Yang W, Zhang H, Dzisoo AM, Huang J. Antib Ther. 2023 Apr 12;6(3):147-156. doi: 10.1093/abt/tbad007. eCollection 2023 Jul. PMID: 37492587 Free PMC article.
Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides.
Stanislawski J, Kotulska M, Unold O. BMC Bioinformatics. 2013 Jan 17;14:21. doi: 10.1186/1471-2105-14-21. PMID: 23327628 Free PMC article.
MetAmyl: a METa-predictor for AMYLoid proteins.
Emily M, Talvas A, Delamarche C. PLoS One. 2013 Nov 19;8(11):e79722. doi: 10.1371/journal.pone.0079722. eCollection 2013. PMID: 24260292 Free PMC article.
Computer-aided antibody design.
Kuroda D, Shirai H, Jacobson MP, Nakamura H. Protein Eng Des Sel. 2012 Oct;25(10):507-21. doi: 10.1093/protein/gzs024. Epub 2012 Jun 2. PMID: 22661385 Free PMC article. Review.
Categorization of 77 dystrophin exons into 5 groups by a decision tree using indexes of splicing regulatory factors as decision markers.
Malueka RG, Takaoka Y, Yagi M, Awano H, Lee T, Dwianingsih EK, Nishida A, Takeshima Y, Matsuo M. BMC Genet. 2012 Mar 31;13:23. doi: 10.1186/1471-2156-13-23. PMID: 22462762 Free PMC article.

References

1. Presta L. Antibody engineering. Curr Opin Biotechnol. 1992;3:394–398. doi: 10.1016/0958-1669(92)90168-I.
2. Presta L. Antibody engineering for therapeutics. Current Opinion in Structural Biology. 2003;13(4):519–525. doi: 10.1016/S0959-440X(03)00103-9.
3. Padlan E. A possible procedure for reducing the immunogenicity of antibody variable domains while preserving their ligand-binding properties. Molecular Immunology. 1991;28(4-5):489–498. doi: 10.1016/0161-5890(91)90163-E.
4. Roguska M, Pedersen J, Keddy C. Humanization of murine monoclonal antibodies through variable domain resurfacing. Proceedings of the National Academy of Sciences. 1994;91:969–973. doi: 10.1073/pnas.91.3.969.
5. Clark M. Antibody humanization: a case of the 'Emperor's new clothes'? Immunol Today. 2000;21:397–402. doi: 10.1016/S0167-5699(00)01680-7.
