Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2018 Sep 21;19(1):334.
doi: 10.1186/s12859-018-2368-y.

ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature

Affiliations

ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature

Alperen Dalkiran et al. BMC Bioinformatics. .

Abstract

Background: The automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics. Although several methods and tools have been proposed to classify enzymes, most of these studies are limited to specific functional classes and levels of the Enzyme Commission (EC) number hierarchy. Besides, most of the previous methods incorporated only a single input feature type, which limits the applicability to the wide functional space. Here, we proposed a novel enzymatic function prediction tool, ECPred, based on ensemble of machine learning classifiers.

Results: In ECPred, each EC number constituted an individual class and therefore, had an independent learning model. Enzyme vs. non-enzyme classification is incorporated into ECPred along with a hierarchical prediction approach exploiting the tree structure of the EC nomenclature. ECPred provides predictions for 858 EC numbers in total including 6 main classes, 55 subclass classes, 163 sub-subclass classes and 634 substrate classes. The proposed method is tested and compared with the state-of-the-art enzyme function prediction tools by using independent temporal hold-out and no-Pfam datasets constructed during this study.

Conclusions: ECPred is presented both as a stand-alone and a web based tool to provide probabilistic enzymatic function predictions (at all five levels of EC) for uncharacterized protein sequences. Also, the datasets of this study will be a valuable resource for future benchmarking studies. ECPred is available for download, together with all of the datasets used in this study, at: https://github.com/cansyl/ECPred . ECPred webserver can be accessed through http://cansyl.metu.edu.tr/ECPred.html .

Keywords: Benchmark datasets; EC numbers; Function prediction; Machine learning; Protein sequence.

PubMed Disclaimer

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Fig. 1
Structure of an EC number classifier in ECPred
Fig. 2
Fig. 2
Flowchart of ECPred together with the prediction route of an example query protein. Query protein (PQ) received a score that is higher than the class specific positive cut-off value of main EC class 1.-.-.- (i.e., oxidoreductase) at Level 0–1 classification (Sm1 > Sc1); as a result, the query is only directed to the models for the subclasses of main class 1.-.-.-. Considering the subclass prediction (Level 2), PQ received a high score (Ss1.1 > Sc1.1) for EC 1.1.-.- (i.e., acting on the CH-OH group of donors) and further directed to the children sub-subclass EC numbers, where it received a high score (Sss1.1.2 > Sc1.1.2) for EC 1.1.2.- (i.e., with a cytochrome as acceptor) at Level 3, and another high score (Su1.1.2.4 > Sc1.1.2.4) for EC 1.1.2.4 (i.e., D-lactate dehydrogenase - cytochrome) at the substrate level (Level 4) and received the final prediction of EC 1.1.2.4
Fig. 3
Fig. 3
Positive and negative training dataset construction for EC class 1.1.-.-. Green colour indicates that the members of that class are used in the positive training dataset, grey colour indicates that the members of that class are used neither in the positive training dataset, nor in the negative training dataset and red colour indicates that the members of that class are used in the negative training dataset

References

    1. Cornish-Bowden A. Current IUBMB recommendations on enzyme nomenclature and kinetics. Perspect Sci. 2014;1:74–87. doi: 10.1016/j.pisc.2014.02.006. - DOI
    1. Nagao C, Nagano N, Mizuguchi K. Prediction of detailed enzyme functions and identification of specificity determining residues by random forests. PLoS One. 2014;9:1–12. - PMC - PubMed
    1. Jensen LJ, Gupta R, Blom N, Devos D, Tamames J, Kesmir C, et al. Prediction of human protein function from post-translational modifications and localization features. J Mol Biol. 2002;319:1257–1265. doi: 10.1016/S0022-2836(02)00379-0. - DOI - PubMed
    1. Diogo A, Latino RS, Aires-de-Sousa J. Assignment of EC numbers to enzymatic reactions with reaction difference fingerprints. PLoS One. 2009;7:1839–1846. - PMC - PubMed
    1. Qiu JD, Luo SH, Huang JH, Liang RP. Using support vector machines to distinguish enzymes: approached by incorporating wavelet transform. J Theor Biol. 2009;256:625–631. doi: 10.1016/j.jtbi.2008.10.026. - DOI - PubMed