BMC Bioinformatics. 2019 Oct 28;20(1):523.

doi: 10.1186/s12859-019-3137-2.

Domainoid: domain-oriented orthology inference

Emma Persson¹, Mateusz Kaduk¹, Sofia K Forslund²,³, Erik L L Sonnhammer⁴

Affiliations

¹ Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, 17121, Solna, Sweden.
² Experimental and Clinical Research Center, a joint cooperation of Max-Delbrück Center for Molecular Medicine and Charité-Universitätsmedizin Berlin, 13125, Berlin, Germany.
³ European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany.
⁴ Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, 17121, Solna, Sweden. erik.sonnhammer@scilifelab.se.

PMID: 31660857
PMCID: PMC6816169
DOI: 10.1186/s12859-019-3137-2

Emma Persson et al. BMC Bioinformatics. 2019.

Abstract

Background: Orthology inference is normally based on full-length protein sequences. However, most proteins contain independently folding, recurring regions known as domains. The domain architecture of a protein is vital for its function, and recombination events mean that individual domains can have different evolutionary histories. It has previously been shown that orthologous proteins may differ in domain architecture, creating challenges for orthology inference methods operating on full-length sequences. We have developed Domainoid, a new tool that aims to overcome these challenges by inferring orthology at the domain level. It applies the InParanoid algorithm to each domain separately to infer groups of orthologous domains.

Results: This domain-oriented approach allows detection of discordant domain orthologs, cases where different domains on the same protein have different evolutionary histories. In addition to domain level analysis, protein level orthology based on the fraction of domains that are orthologous can be inferred. Domainoid orthology assignments were compared to those yielded by the conventional full-length approach InParanoid, and were validated in a standard benchmark.

Conclusions: Our results show that domain-based orthology inference can reveal many orthologous relationships that are not found by full-length sequence approaches.

Availability: https://bitbucket.org/sonnhammergroup/domainoid/.

Keywords: Domain ortholog; Orthology; Protein domain.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Example of orthologs that may be missed by full-length approaches because of their domain architecture, but not by domain-based orthology inference. Orthologous domains are marked with double arrows. Identifiers are UniProt accessions and Pfam domains (shown as coloured boxes) respectively

**Fig. 2**
Example of a common scenario for discordant orthologies discovered by Domainoid, where the primary protein has more than two secondary proteins. The YjeF_N and Carb_kinase domains are both involved in NAD(P)H dehydration, where the former domain performs epimerization. The two domains have different evolutionary histories, causing discordant domain orthology as revealed by Domainoid. Identifiers are UniProt accessions and Pfam domains (shown as coloured boxes) respectively

**Fig. 3**
Number of orthologous pairs (left vertical axis) at different alpha thresholds (horizontal axis) when running Domainoid and InParanoid, and Jaccard index (right vertical axis) over consensus pairs at different alpha thresholds on species *Escherichia coli* and *Homo sapiens*. InParanoid-only and Domainoid-only pairs are uniquely found by the respective method

**Fig. 4**
Diagram showing the number of pairs of orthologs inferred for *Escherichia coli* and *Homo sapiens* by Domainoid, InParanoid, and their intersection. Furthermore, the Domainoid-only part is subdivided into three categories, depending on how Domainoid ortholog pairs map onto InParanoid results. “Two missing” means that neither protein in a pair was assigned to an ortholog group by InParanoid, “One missing” indicates that one of the proteins in the pair was not assigned by InParanoid, and “Conflicting groups” are pairs where proteins are assigned to other pairs in InParanoid
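
The three Domainoid-only categories in Fig. 4 can be illustrated with a minimal Python sketch. It assumes that InParanoid results are available as a mapping from protein identifier to ortholog group identifier, and it reads "Conflicting groups" as both proteins being assigned by InParanoid but to different groups. The function name, the mapping, and the identifiers are hypothetical and are not taken from the Domainoid code base.

```python
from typing import Dict, Tuple


def classify_pair(pair: Tuple[str, str], inparanoid_group_of: Dict[str, int]) -> str:
    """Categorise one Domainoid ortholog pair against InParanoid group assignments."""
    protein_a, protein_b = pair
    group_a = inparanoid_group_of.get(protein_a)  # None if InParanoid left it unassigned
    group_b = inparanoid_group_of.get(protein_b)
    missing = (group_a is None) + (group_b is None)
    if missing == 2:
        return "Two missing"
    if missing == 1:
        return "One missing"
    if group_a != group_b:
        return "Conflicting groups"
    # Same group in both methods: the pair is in the intersection, not Domainoid-only.
    return "Consensus"


# Hypothetical identifiers, for illustration only.
groups = {"ecoli_1": 10, "human_1": 10, "ecoli_2": 11, "human_2": 12}
categories = [
    classify_pair(("ecoli_1", "human_1"), groups),
    classify_pair(("ecoli_2", "human_2"), groups),
    classify_pair(("ecoli_1", "human_9"), groups),
    classify_pair(("ecoli_8", "human_9"), groups),
]
```

The sketch treats each protein as belonging to at most one InParanoid group, which is a simplifying assumption.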

**Fig. 5**
A common scenario for orthologs found by Domainoid but missed by conventional InParanoid analysis involves short orthologous domains. In this example the orthologous domain, the chaperone DnaJ, is small relative to the whole protein. Identifiers are UniProt accessions and Pfam domains (shown as coloured boxes) respectively

**Fig. 6**
Example of orthologs with multiple orthologous domains identified by Domainoid, but missed by conventional InParanoid. These proteins share one of the functions of the trifunctional *Homo sapiens* protein, namely imidazole synthesis, revealed by orthology inference on a more fine-grained scale than on the full protein sequence level. Identifiers are UniProt accessions and Pfam domains (shown as coloured boxes) respectively

**Fig. 7**
Result of Domainoid compared to publicly available methods in the Orthology benchmarking web service for the Generalized species tree discordance benchmark on the fungal subset of the QFO 2018 reference data. The number of completed tree samplings is represented on the horizontal axis and the average RF distance on the vertical axis

**Fig. 8**
Overview of the Domainoid algorithm

**Fig. 9**
Visualization of how to calculate the alpha threshold. The number of orthologous domains (A and C) for the protein pair {X,Y} is 4, and the total number of domains for that pair is 5. The alpha value for the pair is thus $\frac{4}{5}$
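
The alpha calculation in Fig. 9 can be written out as a short Python sketch. It assumes that domains are identified by name and that alpha is the number of orthologous domain instances on both proteins divided by the total number of domain instances on both proteins. The function name and the assignment of domains to X and Y are hypothetical, chosen only to reproduce the 4 out of 5 counts in the caption.

```python
from typing import List, Set


def alpha(domains_x: List[str], domains_y: List[str], orthologous: Set[str]) -> float:
    """Fraction of domain instances in a protein pair that are orthologous."""
    total = len(domains_x) + len(domains_y)
    if total == 0:
        raise ValueError("protein pair has no annotated domains")
    # Count orthologous domain instances on each protein of the pair.
    shared = sum(d in orthologous for d in domains_x)
    shared += sum(d in orthologous for d in domains_y)
    return shared / total


# Hypothetical layout: X carries domains A, B, C and Y carries A, C;
# A and C are orthologous, giving 4 orthologous instances out of 5.
example_alpha = alpha(["A", "B", "C"], ["A", "C"], {"A", "C"})
```

A protein pair would then be kept as orthologous at the protein level when its alpha value reaches the chosen threshold.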
