BMC Bioinformatics. 2020 Oct 19;21(1):466.

doi: 10.1186/s12859-020-03794-x.

Implementation of homology based and non-homology based computational methods for the identification and annotation of orphan enzymes: using Mycobacterium tuberculosis H37Rv as a case study

Swati Sinha¹, Andrew M Lynn², Dhwani K Desai³ ⁴

Affiliations

¹ Bioinformatics Institute, Agency for Science, Technology, and Research (A*Star), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore.
² School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India.
³ Department of Biology and Department of Pharmacology, Dalhousie University, Halifax, NS, B3H4R2, Canada. dhwani.desai@dal.ca.
⁴ School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India. dhwani.desai@dal.ca.

PMID: 33076816
PMCID: PMC7574302
DOI: 10.1186/s12859-020-03794-x

Swati Sinha et al. BMC Bioinformatics. 2020.

Abstract

Background: Homology-based methods are among the most important and widely used approaches for functional annotation of high-throughput microbial genome data. A major limitation of these methods is the absence of well-characterized sequences for certain functions. Non-homology methods, which rely on the context and interactions of a protein, are very useful for identifying missing metabolic activities and for functional annotation in the absence of significant sequence similarity. In the current work, we employ both homology and context-based methods, incrementally, to identify local holes and chokepoints: enzymes that have not been annotated in the Mycobacterium tuberculosis genome but whose presence is indicated by their interactions with known proteins in a metabolic network context. We have developed two computational procedures using network theory: one to identify orphan enzymes ('Hole finding protocol') and one to identify candidate proteins for each predicted orphan enzyme ('Hole filling protocol'). Using M. tuberculosis as a case study, we propose an integrated interaction score, based on scores from the STRING database, to identify the candidate protein sequences that are most likely to perform the missing function of each orphan enzyme.

Results: Applying ModEnzA, an automated homology-based enzyme identification protocol, to the M. tuberculosis genome yielded 56 novel enzyme predictions. Using the 'Hole finding protocol', we further predicted 74 putative local holes, 6 chokepoints, and 3 high-confidence local holes in the genome. The 'Hole filling protocol' was validated on the E. coli genome using artificial in-silico enzyme knockouts, where our method showed 25% increased accuracy, compared to other methods, in assigning the correct sequence for the knocked-out enzyme among the top 10 ranks. The method was further validated on 8 additional genomes.

Conclusions: We have developed methods that can be generalized to augment homology-based annotation to identify missing enzyme coding genes and to predict a candidate protein for them. For pathogens such as M. tuberculosis, this work holds significance in terms of increasing the protein repertoire and thereby, the potential for identifying novel drug targets.

Keywords: Chokepoints; Genome context-based annotation; Global hole; Homology based method; Local hole; Missing enzyme; ModEnzA; Non-homology based methods.
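
The idea behind the 'Hole finding protocol' described in the abstract can be illustrated with a minimal sketch. It assumes a reference enzyme-enzyme dependency graph supplied as an adjacency mapping, and it flags an unannotated enzyme as a candidate local hole when at least a chosen fraction of its graph neighbors are already annotated in the organism. The function name `find_local_holes`, the `min_fraction` threshold, and the toy enzyme labels are hypothetical; the criteria actually used in the published protocol are given in the paper's Methods and may differ.

```python
from typing import Dict, List, Set


def find_local_holes(
    graph: Dict[str, Set[str]],
    annotated: Set[str],
    min_fraction: float = 1.0,
) -> List[str]:
    """Return unannotated enzymes whose neighbors are mostly annotated.

    graph        -- reference enzyme-enzyme dependency graph (adjacency sets)
    annotated    -- enzymes already annotated in the target organism
    min_fraction -- assumed threshold on the share of annotated neighbors
    """
    holes = []
    for enzyme, neighbors in graph.items():
        # Skip enzymes that are already annotated or have no graph context.
        if enzyme in annotated or not neighbors:
            continue
        present = len(neighbors & annotated)
        if present / len(neighbors) >= min_fraction:
            holes.append(enzyme)
    return sorted(holes)


# Toy linear pathway A - B - C - D (hypothetical labels, not real EC numbers).
toy_graph = {
    "EC_A": {"EC_B"},
    "EC_B": {"EC_A", "EC_C"},
    "EC_C": {"EC_B", "EC_D"},
    "EC_D": {"EC_C"},
}
toy_annotated = {"EC_A", "EC_C"}

print(find_local_holes(toy_graph, toy_annotated))
```

In this toy case both unannotated enzymes are flagged, because every neighbor of each is annotated. Lowering `min_fraction` would admit enzymes with weaker network support.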

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Stepwise systematic approach for the implementation of homology-based and non-homology based computational methods

**Fig. 2**
Schematic representation of the workflow to map novel enzymes in *M. tb* using ModEnzA profiles. The ModEnzA enzyme profiles were built with the 31 January 2018 release of the ENZYME database. Both UniProtKB/Swiss-Prot and UniProtKB/TrEMBL were used as the sequence search space to scan for novel *M. tb* enzymes

**Fig. 3**
Schematic representation of the ‘Hole Finding Protocol’ to identify local holes and chokepoints in an organism. The figure shows a flowchart of the workflow for the identification of 'local holes' and ‘chokepoints’ in an organism using an enzyme–enzyme dependency graph of all known metabolic reactions. ModEnzA [14] is a profile HMM-based method used to scan the proteome of a given organism for the accurate classification of its enzymes

**Fig. 4**
Mapping of known and predicted enzymes in *M. tb* on to KEGG pathways. (a–c) Mappings of some of these enzymes on the Porphyrin and Chlorophyll metabolism, Drug metabolism - other enzymes, and Glycolysis/Gluconeogenesis KEGG pathways, respectively. The enzymes already annotated in *M. tb* are shown in red, the enzymes predicted by the homology-based method ModEnzA in blue, the local holes in green, the high-confidence local holes in brown, and the chokepoints in yellow

**Fig. 5**
Schematic representation of a metabolic hole and candidate protein set. The figure shows (a) an unknown protein (?) surrounded by known neighbors n1–n6 and (b) a set of candidate proteins from the target organism, which is its entire proteome except for the known neighboring proteins. For each neighbor, we find its interaction with all the candidate proteins. If protein P1 has an interaction score with n1, n2, and n4, then we combine these scores in a naive Bayes manner using Bayesian score integration as shown in the equation (see "Methods"). All the candidate proteins with their respective scores are then sorted, and the one with the highest score qualifies to perform the desired function

**Fig. 6**
Comparison of the self-rank thresholds after in-silico enzyme knockouts. (a) Performance of the 'Hole filling protocol' on the *E. coli* genome (blue curve), where the combined scores of functional associations from STRING were used to obtain the new functional association score. Reference values for individual and combined association scores were digitized from Fig. 4 of Kharchenko et al. [28] for comparison. (b) Similar knockouts were performed for all the metabolic proteins from eight other genomes: *Saccharomyces cerevisiae* (sce), *Dictyostelium discoideum* (ddi), *Arabidopsis thaliana* (ath), *Drosophila melanogaster* (dme), *Danio rerio* (dre), *Salmonella enterica* (sen), *Shigella flexneri* (sfl) and *Vibrio cholerae* (vch)
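
The candidate scoring in Fig. 5 and the self-rank evaluation in Fig. 6 can be sketched in a few lines. This sketch assumes each neighbor-candidate association score is a probability in [0, 1] and combines scores as 1 − ∏(1 − sᵢ), a common naive-Bayes-style integration of independent evidence. The equation actually used is given in the paper's Methods and is not reproduced here, so it may differ (for example, by applying a prior correction). All function names and the toy scores are hypothetical.

```python
from math import prod
from typing import Dict, List


def integrate_scores(scores: List[float]) -> float:
    """Combine independent association scores as 1 - prod(1 - s_i) (assumed form)."""
    return 1.0 - prod(1.0 - s for s in scores)


def rank_candidates(assoc: Dict[str, Dict[str, float]]) -> List[str]:
    """Sort candidates by integrated score, highest first (ties broken by name).

    assoc maps candidate protein -> {neighbor: association score}.
    """
    combined = {c: integrate_scores(list(n.values())) for c, n in assoc.items()}
    return sorted(combined, key=lambda c: (-combined[c], c))


def self_rank(assoc: Dict[str, Dict[str, float]], true_protein: str) -> int:
    """1-based rank of the knocked-out enzyme's own protein among the candidates."""
    return rank_candidates(assoc).index(true_protein) + 1


def fraction_within_top_k(ranks: List[int], k: int = 10) -> float:
    """Share of knockouts whose true protein is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


# Toy data: P1 is associated with neighbors n1, n2 and n4, as in the Fig. 5 example.
toy_assoc = {
    "P1": {"n1": 0.6, "n2": 0.5, "n4": 0.7},
    "P2": {"n3": 0.9},
    "P3": {"n1": 0.2, "n5": 0.3},
}

print(rank_candidates(toy_assoc))
print(self_rank(toy_assoc, "P2"))
```

Here several moderate associations together outrank a single strong one, which is the behavior this form of integration is meant to capture. Repeating `self_rank` over many in-silico knockouts and passing the results to `fraction_within_top_k` gives one point on a self-rank curve of the kind compared in Fig. 6.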

Cited by

Molecular Insight into Mycobacterium tuberculosis Resistance to Nitrofuranyl Amides Gained through Metagenomics-like Analysis of Spontaneous Mutants.
Mokrousov I, Slavchev I, Solovieva N, Dogonadze M, Vyazovaya A, Valcheva V, Masharsky A, Belopolskaya O, Dimitrov S, Zhuravlev V, Portugal I, Perdigão J, Dobrikov GM. Pharmaceuticals (Basel). 2022 Sep 12;15(9):1136. doi: 10.3390/ph15091136. PMID: 36145357
Functional prediction of proteins from the human gut archaeome.
Novikova PV, Bhanu Busi S, Probst AJ, May P, Wilmes P. ISME Commun. 2024 Jan 10;4(1):ycad014. doi: 10.1093/ismeco/ycad014. PMID: 38486809
Identification of a novel gene required for competitive growth at high temperature in the thermotolerant yeast Kluyveromyces marxianus.
Montini N, Doughty TW, Domenzain I, Fenton DA, Baranov PV, Harrington R, Nielsen J, Siewers V, Morrissey JP. Microbiology (Reading). 2022 Mar;168(3):001148. doi: 10.1099/mic.0.001148. PMID: 35333706
An informatic workflow for the enhanced annotation of excretory/secretory proteins of Haemonchus contortus.
Zheng Y, Young ND, Song J, Chang BCH, Gasser RB. Comput Struct Biotechnol J. 2023 Mar 18;21:2696-2704. doi: 10.1016/j.csbj.2023.03.025. PMID: 37143762

References

1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410.
2. Eddy SR. A new generation of homology search tools based on probabilistic inference. Genome Inform. 2009;23:205–211.
3. Söding J. Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005;21:951–960. doi: 10.1093/bioinformatics/bti125.
4. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
5. Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14:755–763. doi: 10.1093/bioinformatics/14.9.755.

Grants and funding

F.No.10-2(5)/2003(II)-E.U.II/Council of Scientific and Industrial Research, India
