ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains
- PMID: 28193156
- PMCID: PMC5307852
- DOI: 10.1186/s12859-017-1519-x
Abstract
Background: Many entries in the Protein Data Bank (PDB) are annotated with their component protein domains according to the Pfam classification, and with their biological function through the Enzyme Commission (EC) numbering scheme. However, although the biological activity of many proteins arises from specific domain-domain and domain-ligand interactions, current online resources rarely provide a direct mapping from structure to function at the domain level. Since the PDB now contains many tens of thousands of protein chains, and protein sequence databases exceed that number by orders of magnitude, there is a pressing need for automatic structure-function annotation tools that operate at the domain level.
Results: This article presents ECDomainMiner, a novel content-based filtering approach that automatically infers associations between EC numbers and Pfam domains. ECDomainMiner finds a total of 20,728 non-redundant EC-Pfam associations with an F-measure of 0.95 with respect to a "Gold Standard" test set extracted from InterPro. Compared with the 1515 manually curated EC-Pfam associations in InterPro, this represents a 13-fold increase in the number of EC-Pfam associations.
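The abstract names content-based filtering and an F-measure but gives no scoring details. As a minimal sketch, assuming each EC number and each Pfam domain is represented as a binary vector over PDB chains and that pairs are scored by cosine similarity (an illustrative assumption, not necessarily ECDomainMiner's actual scoring), the code below keeps pairs whose score reaches a threshold and evaluates them with the standard F-measure (harmonic mean of precision and recall) against a gold-standard set. The chain IDs, annotations, `threshold` value, and the `CHAINS` and `GOLD` data are hypothetical toy values.

```python
from math import sqrt

# Hypothetical toy annotations: chain ID -> (EC numbers, Pfam domains).
# Chain IDs are made up; they are not real PDB entries.
CHAINS = {
    "c1": ({"1.1.1.1"}, {"PF00107", "PF08240"}),
    "c2": ({"1.1.1.1"}, {"PF08240"}),
    "c3": ({"2.7.11.1"}, {"PF00069"}),
    "c4": ({"2.7.11.1"}, {"PF00069", "PF00017"}),
    "c5": (set(), {"PF00017"}),  # a chain with no EC annotation
}

# Hypothetical gold-standard EC-Pfam pairs used only for evaluation.
GOLD = {("1.1.1.1", "PF08240"), ("1.1.1.1", "PF00107"), ("2.7.11.1", "PF00069")}


def occurrence_sets(chains):
    """Map each EC number and each Pfam domain to the set of chains it occurs in."""
    ec_occ, pfam_occ = {}, {}
    for chain_id, (ecs, pfams) in chains.items():
        for ec in ecs:
            ec_occ.setdefault(ec, set()).add(chain_id)
        for pf in pfams:
            pfam_occ.setdefault(pf, set()).add(chain_id)
    return ec_occ, pfam_occ


def cosine(a, b):
    """Cosine similarity of two binary vectors, each given as a set of chain IDs."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def infer_associations(chains, threshold=0.5):
    """Score every EC-Pfam pair and keep those at or above the threshold."""
    ec_occ, pfam_occ = occurrence_sets(chains)
    scores = {}
    for ec, ec_chains in ec_occ.items():
        for pf, pf_chains in pfam_occ.items():
            score = cosine(ec_chains, pf_chains)
            if score >= threshold:
                scores[(ec, pf)] = score
    return scores


def f_measure(predicted, gold):
    """Harmonic mean of precision and recall of predicted pairs against gold pairs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


predicted = set(infer_associations(CHAINS, threshold=0.5))
print(round(f_measure(predicted, GOLD), 3))  # 0.857
```

At a threshold of 0.5 the weak ("2.7.11.1", "PF00017") pair is kept as a false positive, lowering precision; raising the threshold to 0.6 removes it. The threshold thus trades recall against precision.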
Conclusion: These EC-Pfam associations could be used to annotate some 58,722 protein chains in the PDB which currently lack any EC annotation. The ECDomainMiner database is publicly available at http://ecdm.loria.fr/ .
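Transferring EC numbers to unannotated chains can be sketched in a similarly hedged way. Assuming a table of scored EC-Pfam associations (the `ASSOCIATIONS` values below are illustrative, not ECDomainMiner output), a naive rule proposes every EC number associated with at least one of the chain's Pfam domains and keeps the best score per EC number. The function name `propose_ec` and its `min_score` parameter are hypothetical.

```python
# Hypothetical scored EC-Pfam associations; values are illustrative only.
ASSOCIATIONS = {
    ("1.1.1.1", "PF08240"): 1.0,
    ("1.1.1.1", "PF00107"): 0.71,
    ("2.7.11.1", "PF00069"): 1.0,
}


def propose_ec(chain_pfams, associations, min_score=0.0):
    """Suggest EC numbers for a chain from its Pfam domains.

    Each candidate EC number gets the best score among the chain's associated
    domains. Returns (ec, score) pairs sorted by descending score, then by EC.
    """
    best = {}
    for (ec, pfam), score in associations.items():
        if pfam in chain_pfams and score >= min_score:
            best[ec] = max(best.get(ec, 0.0), score)
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chains that lack any EC annotation.
print(propose_ec({"PF00107", "PF08240"}, ASSOCIATIONS))  # [('1.1.1.1', 1.0)]
print(propose_ec({"PF00069"}, ASSOCIATIONS))             # [('2.7.11.1', 1.0)]
print(propose_ec({"PF00017"}, ASSOCIATIONS))             # []
```

A max-score rule is the simplest choice; a real pipeline would likely also need to handle multi-domain chains in which only a combination of domains implies a given activity.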
Keywords: Content-based filtering; Enzyme commission number; Pfam domain; Protein domain; Protein function.
Similar articles
- Identifying protein domains with the Pfam database. Curr Protoc Bioinformatics. 2008 Sep;Chapter 2:2.5.1-2.5.17. doi: 10.1002/0471250953.bi0205s23. PMID: 18819075
- Pfam: The protein families database in 2021. Nucleic Acids Res. 2021 Jan 8;49(D1):D412-D419. doi: 10.1093/nar/gkaa913. PMID: 33125078
- The Pfam protein families database. Nucleic Acids Res. 2002 Jan 1;30(1):276-80. doi: 10.1093/nar/30.1.276. PMID: 11752314
- Selection of soluble protein expression constructs: the experimental determination of protein domain boundaries. Biochem Soc Trans. 2010 Aug;38(4):908-13. doi: 10.1042/BST0380908. PMID: 20658975. Review.
- A Survey for Predicting Enzyme Family Classes Using Machine Learning Methods. Curr Drug Targets. 2019;20(5):540-550. doi: 10.2174/1389450119666181002143355. PMID: 30277150. Review.
Cited by
- iPRESTO: Automated discovery of biosynthetic sub-clusters linked to specific natural product substructures. PLoS Comput Biol. 2023 Feb 9;19(2):e1010462. doi: 10.1371/journal.pcbi.1010462. eCollection 2023 Feb. PMID: 36758069
- Co-occurrence of enzyme domains guides the discovery of an oxazolone synthetase. Nat Chem Biol. 2021 Jul;17(7):794-799. doi: 10.1038/s41589-021-00808-4. Epub 2021 Jun 7. PMID: 34099916
- Propionate Fermentative Genes of the Gut Microbiome Decrease in Inflammatory Bowel Disease. J Clin Med. 2021 May 18;10(10):2176. doi: 10.3390/jcm10102176. PMID: 34070019
- Approaching Optimal pH Enzyme Prediction with Large Language Models. ACS Synth Biol. 2024 Sep 20;13(9):3013-3021. doi: 10.1021/acssynbio.4c00465. Epub 2024 Aug 28. PMID: 39197156
- A roadmap for metagenomic enzyme discovery. Nat Prod Rep. 2021 Nov 17;38(11):1994-2023. doi: 10.1039/d1np00006c. PMID: 34821235. Review.