2019 Oct 28;20(1):523.
doi: 10.1186/s12859-019-3137-2.

Domainoid: domain-oriented orthology inference

Emma Persson et al. BMC Bioinformatics.

Abstract

Background: Orthology inference is normally based on full-length protein sequences. However, most proteins contain independently folding, recurring regions called domains. The domain architecture of a protein is vital for its function, and recombination events mean that individual domains can have different evolutionary histories. It has previously been shown that orthologous proteins may differ in domain architecture, which creates challenges for orthology inference methods that operate on full-length sequences. We have developed Domainoid, a new tool that aims to overcome these challenges by inferring orthology at the domain level. It applies the InParanoid algorithm to each domain separately to infer groups of orthologous domains.

Results: This domain-oriented approach allows detection of discordant domain orthologs, cases where different domains on the same protein have different evolutionary histories. In addition to domain-level analysis, protein-level orthology can be inferred from the fraction of domains that are orthologous. Domainoid orthology assignments were compared to those yielded by the conventional full-length approach InParanoid, and were validated in a standard benchmark.

Conclusions: Our results show that domain-based orthology inference can reveal many orthologous relationships that are not found by full-length sequence approaches.

Availability: https://bitbucket.org/sonnhammergroup/domainoid/.

Keywords: Domain ortholog; Orthology; Protein domain.
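
The Results describe comparing Domainoid's ortholog assignments with InParanoid's, and Fig. 3 reports a Jaccard index for that comparison. A minimal sketch of such a comparison is given below, assuming each method's output has already been reduced to a collection of protein pairs. The function names, the dictionary keys, and the protein identifiers are hypothetical, and the Jaccard index is computed with its standard definition (intersection size over union size), which may differ in detail from how the paper computes it.

```python
def normalise(pairs):
    """Treat (a, b) and (b, a) as the same ortholog pair."""
    return {frozenset(pair) for pair in pairs}


def compare_pair_sets(pairs_a, pairs_b):
    """Split two sets of ortholog pairs into shared and method-specific parts."""
    a, b = normalise(pairs_a), normalise(pairs_b)
    consensus = a & b
    union = a | b
    # Standard Jaccard index; defined here as 0.0 when both sets are empty.
    jaccard = len(consensus) / len(union) if union else 0.0
    return {
        "consensus": consensus,
        "only_a": a - b,
        "only_b": b - a,
        "jaccard": jaccard,
    }


# Hypothetical identifiers, for illustration only.
method_a_pairs = [("e1", "h1"), ("e2", "h2"), ("e3", "h3")]
method_b_pairs = [("h1", "e1"), ("e4", "h4")]

result = compare_pair_sets(method_a_pairs, method_b_pairs)
print(result["jaccard"])  # 0.25
```

Storing each pair as a frozenset makes the comparison independent of the order in which a method lists the two proteins.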

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Example of orthologs that may be missed by full-length approaches because of their domain architecture, but not by domain-based orthology inference. Orthologous domains are marked with double arrows. Identifiers are UniProt accessions and Pfam domains (shown as coloured boxes) respectively
Fig. 2
Example of a common scenario for discordant orthologies discovered by Domainoid, where the primary protein has more than two secondary proteins. The YjeF_N and Carb_kinase domains are both involved in NAD(P)H dehydration, where the former domain performs epimerization. The two domains have different evolutionary histories, causing discordant domain orthology as revealed by Domainoid. Identifiers are UniProt accessions and Pfam domains (shown as coloured boxes) respectively
Fig. 3
Number of orthologous pairs (left vertical axis) at different alpha thresholds (horizontal axis) when running Domainoid and InParanoid, and Jaccard index (right vertical axis) over consensus pairs at different alpha thresholds on species Escherichia coli and Homo sapiens. InParanoid-only and Domainoid-only pairs are uniquely found by the respective method
Fig. 4
Diagram showing the number of pairs of orthologs inferred for Escherichia coli and Homo sapiens by Domainoid, InParanoid, and their intersection. Furthermore, the Domainoid-only part is subdivided into three categories, depending on how Domainoid ortholog pairs map onto InParanoid results. “Two missing” means that neither protein in a pair was assigned to an ortholog group by InParanoid, “One missing” indicates that one of the proteins in the pair was not assigned by InParanoid, and “Conflicting groups” are pairs where proteins are assigned to other pairs in InParanoid
Fig. 5
A common scenario for orthologs found by Domainoid but missed by conventional InParanoid analysis involves short orthologous domains. In this example the orthologous domain, the chaperone DnaJ, is small relative to the whole protein. Identifiers are UniProt accessions and Pfam domains (shown as coloured boxes) respectively
Fig. 6
Example of orthologs with multiple orthologous domains identified by Domainoid, but missed by conventional InParanoid. These proteins share one of the functions of the trifunctional Homo sapiens protein, namely imidazole synthesis, revealed by orthology inference on a more fine-grained scale than on the full protein sequence level. Identifiers are UniProt accessions and Pfam domains (shown as coloured boxes) respectively
Fig. 7
Result of Domainoid compared to publicly available methods in the Orthology benchmarking web service for the Generalized species tree discordance benchmark on the fungal subset of the QFO 2018 reference data. The number of completed tree samplings is represented on the horizontal axis and the average RF distance is represented on the vertical axis
Fig. 8
Overview of the Domainoid algorithm
Fig. 9
Visualization of how to calculate the alpha threshold. The number of orthologous domains (A and C) for the protein pair {X,Y} is 4, and the total number of domains for that pair is 5. The alpha value for the pair is thus 4/5
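
The alpha calculation described for Fig. 9 can be sketched as follows: count the domains in the protein pair that take part in a domain-level orthology, then divide by the total number of domains in the pair. This is an illustrative sketch, not Domainoid's implementation. The function names, the index-based representation of orthologous domain links, the placement and name of the fifth, non-orthologous domain ("B" on protein X), and the use of `>=` for the threshold comparison are all assumptions.

```python
from fractions import Fraction


def pair_alpha(domains_x, domains_y, ortholog_links):
    """Return the fraction of domains in a protein pair that are orthologous.

    domains_x, domains_y: lists of domain names, one entry per domain.
    ortholog_links: iterable of (index_in_x, index_in_y) tuples, one per
        orthologous domain pair (hypothetical representation).
    """
    links = list(ortholog_links)
    total = len(domains_x) + len(domains_y)
    if total == 0:
        raise ValueError("the protein pair has no annotated domains")
    # Count each domain once, even if it appears in several links.
    orthologous_in_x = {i for i, _ in links}
    orthologous_in_y = {j for _, j in links}
    return Fraction(len(orthologous_in_x) + len(orthologous_in_y), total)


def is_protein_ortholog(alpha, threshold):
    """Assumed rule: accept the pair when alpha reaches the threshold."""
    return alpha >= threshold


# Pair {X, Y} from the caption: domains A and C are orthologous in both
# proteins (4 orthologous domains) out of 5 domains in total.
protein_x = ["A", "B", "C"]  # "B" is a hypothetical non-orthologous domain
protein_y = ["A", "C"]
links = [(0, 0), (2, 1)]     # A-A and C-C

alpha = pair_alpha(protein_x, protein_y, links)
print(alpha)  # 4/5
```

Using `Fraction` keeps the value exact, so the result matches the caption's 4/5 without floating-point rounding.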

