BMC Bioinformatics. 2022 Dec 1;23(1):517.
doi: 10.1186/s12859-022-05070-6.

ENTAIL: yEt aNoTher amyloid fIbrils cLassifier


Alessia Auriemma Citarella et al. BMC Bioinformatics. 2022.

Abstract

Background: This research aims to increase our knowledge of amyloidoses. These disorders are characterized by incorrect protein folding, which alters protein structure and therefore protein function. Fibrillar deposits underlie several well-known diseases, such as Alzheimer's disease, Creutzfeldt-Jakob disease, and type II diabetes. For many of these amyloid proteins, the corresponding precursors are known. Discovering new protein precursors involved in forming amyloid fibril deposits would improve our understanding of the pathological processes of amyloidoses.

Results: A new classifier, called ENTAIL, was developed using more than 4,000 molecular descriptors. ENTAIL is based on a Naive Bayes classifier with a Gaussian kernel and unbounded support. On the test set of a balanced dataset it achieved an accuracy of 81.80%, a sensitivity (SN) of 100%, a specificity (SP) of 63.63%, and a Matthews correlation coefficient (MCC) of 0.683.
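The abstract does not describe how the classifier is implemented. As a rough illustration of the technique it names, here is a minimal NumPy sketch of a kernel Naive Bayes classifier, assuming: features are conditionally independent given the class; each class/feature pair gets its own Gaussian kernel density estimate; "unbounded support" is read as the density being defined over the whole real line with no boundary correction; and the bandwidth comes from the normal-reference rule of thumb (1.06·σ·n^(-1/5)). The class name `KernelNaiveBayes` and the toy data are hypothetical; this is not the authors' code.

```python
import numpy as np


class KernelNaiveBayes:
    """Hypothetical kernel Naive Bayes sketch (not the ENTAIL implementation).

    Each feature's class-conditional density is a Gaussian KDE with unbounded
    support (defined on the whole real line, no boundary correction).
    """

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.samples_, self.bandwidths_, self.log_priors_ = [], [], []
        for c in self.classes_:
            Xc = X[y == c]
            n = Xc.shape[0]
            # Normal-reference rule of thumb for the bandwidth (an assumption).
            std = Xc.std(axis=0, ddof=1) if n > 1 else np.ones(X.shape[1])
            h = 1.06 * std * n ** (-1 / 5)
            h = np.where(h > 0, h, 1e-3)  # guard against constant features
            self.samples_.append(Xc)
            self.bandwidths_.append(h)
            self.log_priors_.append(np.log(n / X.shape[0]))
        return self

    @staticmethod
    def _log_likelihood(X, Xc, h):
        # z has shape (n_query, n_train, n_features).
        z = (X[:, None, :] - Xc[None, :, :]) / h
        kernel = np.exp(-0.5 * z**2) / (np.sqrt(2 * np.pi) * h)
        density = kernel.mean(axis=1)  # KDE value per query and feature
        # Naive Bayes: sum per-feature log densities (independence assumption).
        return np.log(density + 1e-300).sum(axis=1)

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = np.column_stack([
            lp + self._log_likelihood(X, Xc, h)
            for lp, Xc, h in zip(self.log_priors_, self.samples_, self.bandwidths_)
        ])
        return self.classes_[np.argmax(scores, axis=1)]


# Toy usage on synthetic, well-separated data (not the paper's dataset).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(3.0, 1.0, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
model = KernelNaiveBayes().fit(X, y)
print(model.predict([[0, 0, 0], [3, 3, 3]]))
```

A kernel density estimate lets each descriptor take a non-Gaussian, even multimodal, shape within a class, at the cost of storing every training sample and evaluating all of them for each prediction.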

Conclusions: The analysis showed that, regardless of the test configuration, performance was higher on a balanced dataset.

Keywords: Amyloidoses; Fibrils; Machine learning; Protein classification.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The composition of the aggregated dataset by sequence length
Fig. 2
Comparison of the experiments and their configurations in the test phase
Fig. 3
Confusion matrices for the best experiments. From left to right: experiment 2, experiment 5, and experiment 8
Fig. 4
ROC curves for the best experiments. From left to right: experiment 2, experiment 5, and experiment 8
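The accuracy, SN, SP, and MCC reported in the abstract are standard functions of a binary confusion matrix such as those in Fig. 3. The sketch below shows how they are computed, taking amyloidogenic proteins as the positive class (an assumption). The counts used (11 true positives, 0 false negatives, 7 true negatives, 4 false positives) are hypothetical and are not taken from the paper. They are only one of many count sets that give values close to the reported ones (about 81.8%, 100%, 63.6%, and 0.683).

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                        # sensitivity: recall on the positive class
    sp = tn / (tn + fp)                        # specificity: recall on the negative class
    acc = (tp + tn) / (tp + fn + tn + fp)      # fraction of all samples classified correctly
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return acc, sn, sp, mcc


# Hypothetical counts for illustration only.
acc, sn, sp, mcc = binary_metrics(tp=11, fn=0, tn=7, fp=4)
print(f"ACC={acc:.4f} SN={sn:.4f} SP={sp:.4f} MCC={mcc:.3f}")
```

With zero false negatives, sensitivity is 100% no matter how many negatives are misclassified. This is why MCC, which uses all four cells, is a useful complement to SN and accuracy.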
