Amyloidogenic motifs revealed by n-gram analysis
- PMID: 29021608
- PMCID: PMC5636826
- DOI: 10.1038/s41598-017-13210-9
Abstract
Amyloids are proteins associated with several clinical disorders, including Alzheimer's disease and Creutzfeldt-Jakob disease. Despite their diversity, all amyloid proteins can undergo aggregation initiated by short segments called hot spots. To find the patterns that define hot spots, we trained predictors of amyloidogenicity using n-grams and random forest classifiers. Since amyloidogenicity may depend not on the exact sequence of amino acids but on their more general properties, we tested 524,284 reduced amino acid alphabets of different lengths (three to six letters) to find the alphabet that performed best in cross-validation. The predictor based on this alphabet, called AmyloGram, was benchmarked on an external data set against the most popular tools for the detection of amyloid peptides and obtained the highest values of the performance measures (AUC: 0.90, MCC: 0.63). Our results revealed sequential patterns in amyloids that are strongly correlated with hydrophobicity, a tendency to form β-sheets, and lower flexibility of amino acid residues. Among the most informative n-grams of AmyloGram, we identified 15 that had previously been confirmed experimentally. AmyloGram is available as a web server at http://smorfland.uni.wroc.pl/shiny/AmyloGram/ and as the R package AmyloGram. R scripts and data used to produce the results of this manuscript are available at http://github.com/michbur/AmyloGramAnalysis .
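The idea of encoding a peptide in a reduced alphabet and then counting n-grams can be illustrated with a minimal Python sketch. The four-group alphabet below is a hypothetical grouping chosen only for illustration; it is not the alphabet selected by AmyloGram, and the function names are assumptions rather than part of the AmyloGram R package.

```python
from collections import Counter

# Hypothetical 4-letter reduced alphabet, for illustration only.
# This is NOT the alphabet selected by AmyloGram.
GROUPS = {
    "a": "ILVFWYMC",  # mostly hydrophobic residues
    "b": "AGPST",     # small residues
    "c": "DENQ",      # acidic residues and their amides
    "d": "KRH",       # basic residues
}

# Invert the grouping: one lookup entry per standard amino acid.
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a peptide using the reduced alphabet (one group letter per residue)."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in seq.upper())


def count_ngrams(seq, n):
    """Count all contiguous n-grams (overlapping windows of length n) in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ngram_features(seq, n=2):
    """Reduced-alphabet n-gram counts for a single peptide."""
    return count_ngrams(reduce_sequence(seq), n)


if __name__ == "__main__":
    peptide = "KLVFFA"  # example hexapeptide
    print(reduce_sequence(peptide))
    print(ngram_features(peptide, n=2))
```

Count vectors of this kind, computed over a fixed list of n-grams, could then serve as input features for a classifier such as a random forest. With a smaller alphabet, peptides that differ in sequence but share physicochemical character map to the same n-grams, which is the motivation the abstract gives for testing reduced alphabets.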
Conflict of interest statement
The authors declare that they have no competing interests.
Similar articles
- FISH Amyloid - a new method for finding amyloidogenic segments in proteins based on site specific co-occurrence of aminoacids. BMC Bioinformatics. 2014 Feb 24;15:54. doi: 10.1186/1471-2105-15-54. PMID: 24564523. Free PMC article.
- Bioinformatics methods for identification of amyloidogenic peptides show robustness to misannotated training data. Sci Rep. 2021 Apr 26;11(1):8934. doi: 10.1038/s41598-021-86530-6. PMID: 33903613. Free PMC article.
- Breaking the amyloidogenicity code: methods to predict amyloids from amino acid sequence. FEBS Lett. 2013 Apr 17;587(8):1089-95. doi: 10.1016/j.febslet.2012.12.006. Epub 2012 Dec 20. PMID: 23262221. Review.
- Machine learning study of classifiers trained with biophysiochemical properties of amino acids to predict fibril forming Peptide motifs. Protein Pept Lett. 2012 Sep;19(9):917-23. doi: 10.2174/092986612802084429. PMID: 22486618.
- Amyloid peptides and proteins in review. Rev Physiol Biochem Pharmacol. 2007;159:1-77. doi: 10.1007/112_2007_0701. PMID: 17846922. Review.
Cited by
- Pathologic polyglutamine aggregation begins with a self-poisoning polymer crystal. Elife. 2023 Nov 3;12:RP86939. doi: 10.7554/eLife.86939. PMID: 37921648. Free PMC article.
- AB-Amy: machine learning aided amyloidogenic risk prediction of therapeutic antibody light chains. Antib Ther. 2023 Apr 12;6(3):147-156. doi: 10.1093/abt/tbad007. eCollection 2023 Jul. PMID: 37492587. Free PMC article.
- Aggrescan4D: structure-informed analysis of pH-dependent protein aggregation. Nucleic Acids Res. 2024 Jul 5;52(W1):W170-W175. doi: 10.1093/nar/gkae382. PMID: 38738618. Free PMC article.
- A conserved motif in Henipavirus P/V/W proteins drives the fibrillation of the W protein from Hendra virus. Protein Sci. 2025 Apr;34(4):e70085. doi: 10.1002/pro.70085. PMID: 40100133. Free PMC article.
- Evolution as a Guide to Designing xeno Amino Acid Alphabets. Int J Mol Sci. 2021 Mar 10;22(6):2787. doi: 10.3390/ijms22062787. PMID: 33801827. Free PMC article. Review.