BMC Bioinformatics. 2017 Feb 13;18(1):107.
doi: 10.1186/s12859-017-1519-x.

ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains

Seyed Ziaeddin Alborzi et al. BMC Bioinformatics.

Abstract

Background: Many entries in the Protein Data Bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the Enzyme Commission (EC) numbering scheme. However, although the biological activity of many proteins arises from specific domain-domain and domain-ligand interactions, current online resources rarely provide a direct mapping from structure to function at the domain level. Since the PDB now contains many tens of thousands of protein chains, and since protein sequence databases can dwarf such numbers by orders of magnitude, there is a pressing need to develop automatic structure-function annotation tools which can operate at the domain level.

Results: This article presents ECDomainMiner, a novel content-based filtering approach to automatically infer associations between EC numbers and Pfam domains. ECDomainMiner finds a total of 20,728 non-redundant EC-Pfam associations with an F-measure of 0.95 with respect to a "Gold Standard" test set extracted from InterPro. Compared to the 1515 manually curated EC-Pfam associations in InterPro, ECDomainMiner infers a 13-fold increase in the number of EC-Pfam associations.
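The headline numbers in the Results can be checked with a few lines of arithmetic. The sketch below computes the fold increase from the two association counts quoted above and illustrates the standard F-measure formula (the harmonic mean of precision and recall). The precision and recall values are hypothetical placeholders chosen only to demonstrate the formula; the abstract reports the F-measure but not its components.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Association counts quoted in the abstract.
ECDOMAINMINER_ASSOCIATIONS = 20_728
INTERPRO_ASSOCIATIONS = 1_515

fold_increase = ECDOMAINMINER_ASSOCIATIONS / INTERPRO_ASSOCIATIONS
print(f"Fold increase over InterPro: {fold_increase:.1f}")

# Hypothetical precision and recall (NOT from the paper), used only to
# show how a single F-measure summarises the two quantities.
example_f = f_measure(precision=0.96, recall=0.94)
print(f"Example F-measure: {example_f:.3f}")
```

The ratio of the two counts lies between 13 and 14, which is consistent with the "13-fold" figure quoted in the text.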

Conclusion: These EC-Pfam associations could be used to annotate some 58,722 protein chains in the PDB which currently lack any EC annotation. The ECDomainMiner database is publicly available at http://ecdm.loria.fr/.

Keywords: Content-based filtering; Enzyme commission number; Pfam domain; Protein domain; Protein function.

Figures

Fig. 1
(a) One domain provides one enzyme function; (b) two domains on the same chain each provide a different enzyme function; (c) one domain provides two different enzyme functions; (d) one domain provides one enzyme function, while a second domain acts as a co-factor with the first domain to provide an additional enzyme function.
Fig. 2
A graphical illustration of calculating raw EC-Pfam association scores from existing SIFTS EC-CID and Pfam-CID associations
Fig. 3
Scale-up factors for ECDomainMiner compared with InterPro. Ratios between the numbers in ECDomainMiner and in InterPro have been calculated for associations (red), EC numbers (yellow), and Pfam domains (green), after dividing the dataset according to each EC branch represented in the associations (1 to 6) and for the whole dataset (All). 1: oxidoreductases; 2: transferases; 3: hydrolases; 4: lyases; 5: isomerases; 6: ligases.
Fig. 4
Venn diagram showing the intersection between (a) Pfam2EC (2500 associations) from dcGO, (b) All-Merged (262,571 associations), and (c) ECDomainMiner (20,728 associations). Region I (480 associations) is the portion of (a) for which there is no data in any of our four source datasets. Region II (128 associations) is the portion of (a) that exists in (b) but is not retained in ECDomainMiner (c). Region III (1892 associations) is the overlap between (a) and (c). Region IV (18,836 associations) is the portion of ECDomainMiner associations that are not available from SCOP2EC. Region V (241,363 associations) is the rest of the merged set of EC-Pfam source associations that are absent from (a) and not retained as Gold, Silver, or Bronze associations by ECDomainMiner.
Fig. 5
Distribution of EC numbers (a) and Pfam domains (b) in multiple associations. Numbers (1 to 10 and >10) represent the arity of the association in which a given EC number or Pfam domain is involved. In addition, for each arity, the normalized number of Gold, Silver, and Bronze associations is plotted. For arities equal to or greater than 4, the proportion of Silver associations is always the highest, but significant numbers of Gold associations remain present even at high arities.
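The idea summarised in the Fig. 2 caption, deriving raw EC-Pfam association scores from existing EC-CID and Pfam-CID associations, can be illustrated with a small sketch. It is not the authors' implementation: it assumes that a CID identifies a protein chain, that an EC number and a Pfam domain are associated when they annotate the same chains, and that the shared-chain count is normalised in a cosine-like way. The function name, the normalisation, and all identifiers in the toy data are hypothetical.

```python
from math import sqrt


def raw_association_scores(ec_to_chains, pfam_to_chains):
    """Score each (EC, Pfam) pair by the chains they share.

    Assumed scoring (not taken from the paper): the number of shared
    chains divided by the geometric mean of the two chain-set sizes.
    Pairs with no shared chain are omitted.
    """
    scores = {}
    for ec, ec_chains in ec_to_chains.items():
        for pfam, pfam_chains in pfam_to_chains.items():
            shared = len(ec_chains & pfam_chains)
            if shared:
                norm = sqrt(len(ec_chains) * len(pfam_chains))
                scores[(ec, pfam)] = shared / norm
    return scores


# Hypothetical toy data: placeholder EC numbers, Pfam domains, and chains.
ec_to_chains = {
    "EC_A": {"chain1", "chain2"},
    "EC_B": {"chain3"},
}
pfam_to_chains = {
    "PF_X": {"chain1", "chain2", "chain3"},
    "PF_Y": {"chain1"},
}

scores = raw_association_scores(ec_to_chains, pfam_to_chains)
for (ec, pfam), score in sorted(scores.items()):
    print(f"{ec} - {pfam}: {score:.3f}")
```

In this toy example, EC_B and PF_Y never annotate the same chain, so no association is produced for that pair, while a domain that co-occurs with an EC number on every one of its chains receives the highest score.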
