NAR Genom Bioinform. 2021 Apr 20;3(2):lqab024.
doi: 10.1093/nargab/lqab024. eCollection 2021 Jun.

CladeOScope: functional interactions through the prism of clade-wise co-evolution

Tomer Tsaban et al. NAR Genom Bioinform.

Abstract

Mapping co-evolved genes via phylogenetic profiling (PP) is a powerful approach to uncover functional interactions between genes and to associate them with pathways. Despite many successful endeavors, the understanding of co-evolutionary signals in eukaryotes remains partial. Our hypothesis is that 'clades', branches of the tree of life (e.g. primates and mammals), encompass signals that cannot be detected by PP using all eukaryotes. If so, integrating information from different clades should reveal local co-evolution signals and improve function prediction. Accordingly, we analyzed 1028 genomes in 66 clades and demonstrated that the co-evolutionary signal is scattered across clades. We showed that functionally related genes frequently co-evolve in only parts of the eukaryotic tree and that clades are complementary in detecting functional interactions within pathways. We examined the non-homologous end joining pathway and the UFM1 ubiquitin-like protein pathway and showed that both display distinct co-evolution patterns in specific clades. Our research offers a different way to look at co-evolution across eukaryotes and points to the importance of modular co-evolution analysis. We developed 'CladeOScope', a PP method that integrates information from 16 clades across over 1000 eukaryotic genomes and is accessible via an easy-to-use web server at http://cladeoscope.cs.huji.ac.il.
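The core idea, that a co-evolution signal may be visible inside one clade but diluted when all eukaryotes are pooled, can be illustrated with a short sketch. It assumes each gene has a profile of per-species scores and that each clade is a list of species, and it ranks candidate partners of a query gene by Pearson correlation computed separately within each clade. Pearson correlation, the `clade_ranks` helper and the toy genes and species are illustrative assumptions; this is not the CladeOScope implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile within this clade carries no co-evolution signal
    return cov / (sx * sy)


def clade_ranks(profiles, clades, query):
    """Rank every other gene by profile correlation with `query`, separately per clade.

    profiles: gene -> {species: score}
    clades:   clade name -> list of species in that clade
    Returns:  clade name -> {gene: rank}, where rank 1 is the most correlated gene.
    """
    result = {}
    for clade, species in clades.items():
        q = [profiles[query][s] for s in species]
        corr = {
            gene: pearson(q, [prof[s] for s in species])
            for gene, prof in profiles.items()
            if gene != query
        }
        ordered = sorted(corr, key=corr.get, reverse=True)
        result[clade] = {gene: i + 1 for i, gene in enumerate(ordered)}
    return result


# Hypothetical toy data: G1 tracks Q only in cladeA, G2 tracks Q only in cladeB.
profiles = {
    "Q":  {"s1": 1.0, "s2": 0.0, "s3": 1.0, "s4": 0.2, "s5": 0.9, "s6": 0.1},
    "G1": {"s1": 1.0, "s2": 0.0, "s3": 1.0, "s4": 0.8, "s5": 0.1, "s6": 0.9},
    "G2": {"s1": 0.0, "s2": 1.0, "s3": 0.0, "s4": 0.2, "s5": 0.9, "s6": 0.1},
}
clades = {
    "cladeA": ["s1", "s2", "s3"],
    "cladeB": ["s4", "s5", "s6"],
    "all": ["s1", "s2", "s3", "s4", "s5", "s6"],
}
ranks = clade_ranks(profiles, clades, "Q")
print(ranks)
```

In the toy data, G2 ranks first in `cladeB` but last when all species are pooled, which is the kind of clade-local signal the abstract describes.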

Figures

Figure 1.
KEGG pathway prediction by clades. Heatmap showing the performance of 17 clades (the 16 defined in this study plus all eukaryotes) over 186 KEGG pathways. Each column depicts a clade and each row a pathway. Each entry is colored by the percentage of functional interactions in the pathway identified by the clade (see text). Dotted entries mark the best-performing clade for each pathway. The annotation bar ‘Ratio’ shows the fraction of clades that surpassed the score of all eukaryotes for each pathway (row).
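The heatmap entries, the dotted best clade and the 'Ratio' bar can all be computed from per-clade partner ranks. The caption defers the definition of an identified interaction to the main text, so this sketch assumes one loudly: a gene pair counts as identified in a clade when either gene ranks the other within a hypothetical cutoff `TOP_K`. The nested `ranks` layout and the label `"all"` for all eukaryotes are also assumptions.

```python
from itertools import combinations

TOP_K = 10  # hypothetical cutoff; the paper defines "identified" in its main text


def percent_identified(ranks, clade, pathway_genes, top_k=TOP_K):
    """Percent of within-pathway gene pairs identified in one clade.

    ranks[clade][gene][partner] is the rank of `partner` among genes co-evolving
    with `gene` in that clade (1 = best); missing entries count as unranked.
    A pair is identified here if either gene ranks the other within top_k
    (an assumption). Requires at least two pathway genes.
    """
    pairs = list(combinations(sorted(pathway_genes), 2))
    hits = 0
    for a, b in pairs:
        ra = ranks[clade].get(a, {}).get(b, float("inf"))
        rb = ranks[clade].get(b, {}).get(a, float("inf"))
        if min(ra, rb) <= top_k:
            hits += 1
    return 100.0 * hits / len(pairs)


def heatmap_row(ranks, pathway_genes, all_label="all"):
    """One heatmap row: score per clade, the best clade, and the 'Ratio' annotation."""
    row = {c: percent_identified(ranks, c, pathway_genes) for c in ranks}
    best = max(row, key=row.get)  # the dotted entry
    others = [c for c in row if c != all_label]
    # Fraction of clades (excluding all eukaryotes) that beat the all-eukaryotes score
    ratio = sum(row[c] > row[all_label] for c in others) / len(others)
    return row, best, ratio


# Hypothetical ranks for a three-gene pathway in two clades plus all eukaryotes.
ranks = {
    "all": {"A": {"B": 50, "C": 3}, "B": {"A": 40}},
    "cladeX": {"A": {"B": 2, "C": 5}},
    "cladeY": {},
}
row, best, ratio = heatmap_row(ranks, ["A", "B", "C"])
print(row, best, ratio)
```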
Figure 2.
Comparison of clade prediction of KEGG pathways. (A) Different clades surpass the all-eukaryotes score in predicting KEGG pathways: in many pathways, each clade scored higher than all eukaryotes. (B) The fraction of KEGG pathways for which each clade had the best score among all 66 examined clades. For panels (A) and (B), the x-axis shows the 66 examined clades ranked by performance, and the clades selected for our method are marked in blue. The y-axis shows the ratio of KEGG pathways in which each clade scored higher than all eukaryotes in panel (A), and the ratio of KEGG pathways for which each clade was the top-scoring clade in panel (B).
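Panels (A) and (B) summarize a pathway-by-clade score table column by column. A minimal sketch, assuming `scores[pathway][clade]` holds a heatmap-style score for every clade and for all eukaryotes (label `"all"`). Crediting every tied clade as top-scoring and ordering the x-axis by the panel (A) fraction are assumptions, since the caption says only "ranked by performance".

```python
def clade_summary(scores, all_label="all"):
    """Per-clade fractions behind panels (A) and (B).

    scores[pathway][clade] -> score; every row must contain every clade and all_label.
    (A) fraction of pathways where the clade scores strictly higher than all eukaryotes.
    (B) fraction of pathways where the clade has the best score among the examined
        clades (all eukaryotes excluded; tied clades are all credited, an assumption).
    """
    clades = sorted({c for row in scores.values() for c in row if c != all_label})
    n = len(scores)
    beats_all = {c: 0 for c in clades}
    is_top = {c: 0 for c in clades}
    for row in scores.values():
        best = max(row[c] for c in clades)
        for c in clades:
            if row[c] > row[all_label]:
                beats_all[c] += 1
            if row[c] == best:
                is_top[c] += 1
    frac_a = {c: beats_all[c] / n for c in clades}
    frac_b = {c: is_top[c] / n for c in clades}
    # x-axis order: clades ranked by the panel (A) fraction (an assumption)
    order = sorted(clades, key=frac_a.get, reverse=True)
    return order, frac_a, frac_b


# Hypothetical scores for three pathways.
scores = {
    "p1": {"all": 10, "X": 20, "Y": 5},
    "p2": {"all": 30, "X": 35, "Y": 40},
    "p3": {"all": 50, "X": 60, "Y": 55},
}
order, frac_a, frac_b = clade_summary(scores)
print(order, frac_a, frac_b)
```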
Figure 3.
Clades are complementary in predicting functional interactions. Clades were used to predict functional interactions in KEGG pathways. (A) The recall for the top five clades per pathway (blue) compared to random gene sets (orange). The x-axis indicates the recall, i.e. the proportion of unique interactions identified (interactions not identified by other clades). Histograms are ordered from top to bottom by the number of clades used for prediction. The vertical line marks the mean of each distribution, with its value written above. (B) Performance of clades and groups of clades, ranked from best to worst per pathway, in predicting unique interactions (interactions predicted only by a specific clade). All eukaryotes are shown for reference (light gray). For each rank, the proportion of unique connections (dark gray) and cumulative connections (purple) is shown. In each violin plot, the lines at the top and bottom mark the minimum and maximum, respectively, and the black line in the middle marks the mean. (C) Heatmap showing the percentage of pathways for which a specific clade (column) is ranked first to fifth (row).
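Panel (B) separates interactions found by only one clade from the running union found by the top-ranked clades. The sketch below assumes each clade's identified interactions for one pathway are already available as sets of gene pairs, and ranks clades by how many interactions they identify; that ranking criterion and the helper names are illustrative, not taken from the paper.

```python
def pair(a, b):
    """An undirected gene-gene interaction."""
    return frozenset((a, b))


def unique_and_cumulative(identified, total_interactions):
    """identified: clade -> set of interactions found in one pathway.

    Clades are ranked best to worst by how many interactions they identify
    (an assumption). Returns one (clade, unique_fraction, cumulative_fraction)
    tuple per rank, both fractions relative to the pathway's interactions.
    """
    ranked = sorted(identified, key=lambda c: len(identified[c]), reverse=True)
    out = []
    covered = set()
    for clade in ranked:
        # Interactions this clade finds that no other clade finds
        others = set().union(*(identified[c] for c in identified if c != clade))
        unique = identified[clade] - others
        # Running union over the clades ranked so far
        covered |= identified[clade]
        out.append((clade, len(unique) / total_interactions,
                    len(covered) / total_interactions))
    return out


# Hypothetical pathway with four known interactions.
identified = {
    "X": {pair("A", "B"), pair("A", "C")},
    "Y": {pair("A", "C"), pair("B", "C")},
    "Z": {pair("B", "C")},
}
for clade, uniq, cum in unique_and_cumulative(identified, total_interactions=4):
    print(clade, uniq, cum)
```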
Figure 4.
The utility of clade-wise PP, demonstrated on specific pathways. The network of interactions between pathway genes is shown for two pathways, KEGG NHEJ (A–C) and KEGG glycosphingolipid biosynthesis globo series (D–F). For each pathway, the network spanned by interactions found in all eukaryotes is shown on the left (A and D, in black), the network spanned by the top five clades in the middle (B and E, edges colored by clade) and the network spanned by the CladeOScope method on the right (C and F, based on the minimal rank over all clades for each interaction; edges colored by clade). Light gray marks the top clade in each example under the top-five combination method. A color legend on the right highlights the clades used to identify connections.
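The caption states that the CladeOScope network is based on the minimal rank over all clades for each interaction. A minimal sketch of that combination rule for one query gene, assuming per-clade ranks are given as dictionaries; the clade that attains the minimum is kept so edges can be colored by clade, and ties are broken by partner name, which is an arbitrary choice made here.

```python
def min_rank_combine(per_clade):
    """per_clade: clade -> {partner: rank} for one query gene (1 = best).

    Returns (partner, best_rank, clade) tuples sorted by the minimal rank
    over all clades; the clade is the one that achieved that minimum.
    """
    best = {}
    for clade, ranks in per_clade.items():
        for partner, r in ranks.items():
            if partner not in best or r < best[partner][0]:
                best[partner] = (r, clade)
    # Sort by minimal rank, then by partner name to make ties deterministic
    return sorted(((p, r, c) for p, (r, c) in best.items()),
                  key=lambda t: (t[1], t[0]))


# Hypothetical ranks of three partners in three clades.
per_clade = {
    "all": {"B": 40, "C": 2, "D": 7},
    "Alveolata": {"B": 1, "C": 9, "D": 30},
    "Mammalia": {"B": 12, "D": 5},
}
for partner, rank, clade in min_rank_combine(per_clade):
    print(partner, rank, clade)
```

Taking the minimum rewards a strong signal in any single clade, which fits the premise that co-evolution is scattered across clades; the trade-off is that every additional clade is another chance for an unrelated gene to reach a good rank.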
Figure 5.
ROC curves for prediction of functional interactions. Prediction of functional interactions by the CladeOScope method was compared with prediction by four other PP approaches: NPP with rank of correlation (NPP (rank)), NPP, binarized PP with Hamming distance (BPP Hamming) and PrePhyloPro (PPP). The comparison was performed for predicting functional interactions defined by gene co-occurrence in KEGG pathways (A and D), CORUM complexes (B and E) and Reactome pathways (C and F). Results are shown as ROC curves (A–C) with the corresponding partial ROC curves for FPR < 0.1 (D–F, demarcated by a dashed rectangle in A–C). TPR was adjusted for visibility. ROC: receiver operating characteristic; pROC: partial ROC; AUC: area under the curve; FPR: false positive rate; TPR: true positive rate.
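The ROC and partial ROC comparisons reduce to sweeping a score threshold over labeled gene pairs. A self-contained sketch, assuming each candidate pair has a prediction score and a binary label (1 if the pair co-occurs in a reference set such as a KEGG pathway) and that both classes are present. It builds the ROC curve with tied scores grouped, then integrates it with the trapezoidal rule, optionally truncated at FPR 0.1 as in panels D–F. The function names are hypothetical.

```python
def roc_points(scores, labels):
    """ROC curve as (FPR, TPR) points, sweeping the threshold from high to low.

    Tied scores are processed together, so the curve does not depend on input order.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve up to max_fpr (raw partial AUC)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the FPR cutoff
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores for six gene pairs (1 = known functional interaction).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(auc(pts))       # about 0.889 (8/9)
print(auc(pts, 0.1))  # about 0.067
```

This returns the raw partial area (at most 0.1 here); some tools report a standardized pAUC instead, so values are not directly comparable across implementations.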
Figure 6.
CladeOScope results for UFM1 pathway genes UFC1 and UFL1. (A) CladeOScope results for the UFC1 gene as obtained by the web tool. The genes of the pathway show a clear pattern of co-evolution in both alveolates and all eukaryotes. Most of the genes of the pathway were detected within the top 15 ranks, while a few were detected at lower ranks (20–72). Each row depicts a gene, with known genes of the pathway colored in yellow. Each column stands for a clade in which the gene was inspected. Values in cells indicate the rank of a gene in a clade (lower is better; 1 is best). Ranks greater than 100 were omitted and shown as blank cells. Genes are sorted by ascending rank in all eukaryotes. (B) Similar results were obtained for UFL1 of the UFM1 pathway. This time the only clade detecting the rest of the partners was Alveolata, with 6/7 in the top 14 ranks and 7/7 by rank 73. Genes are sorted by ascending rank in Alveolata. Clade (column) order is shared across (A) and (B). (C) Phylogenetic profiles of genes in the UFM1 pathway. The color scale depicts the relative signal as a min–max gene-wise scaled profile. The profiles are self-hit normalized bitscores as described in the ‘Materials and Methods’ section. The top annotation bar shows the clade to which each species (column) belongs. (D) An enlarged view of the Alveolata clade; row (gene) order is preserved across (C) and (D).
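The caption says only that the profiles are self-hit normalized bitscores and that panel (C) shows a min–max gene-wise scaled profile; the exact procedure is in the Materials and Methods, which are not part of this section. The sketch below assumes the simplest reading: divide each species' best-hit bitscore by the query's bitscore against itself, then rescale each gene's profile to [0, 1]. Any additional steps described in the paper are not reproduced, and the function names and values are hypothetical.

```python
def self_hit_normalize(bitscores, self_bitscore):
    """Divide each species' best-hit bitscore by the query's self-hit bitscore.

    A hit as strong as the self-hit scores 1.0; a missing homolog (bitscore 0) scores 0.0.
    """
    return {sp: b / self_bitscore for sp, b in bitscores.items()}


def min_max_scale(profile):
    """Gene-wise min-max scaling to [0, 1], i.e. the relative signal of panel (C)."""
    lo, hi = min(profile.values()), max(profile.values())
    if hi == lo:
        return {sp: 0.0 for sp in profile}  # flat profile: no relative signal to show
    return {sp: (v - lo) / (hi - lo) for sp, v in profile.items()}


# Hypothetical bitscores for one gene; its self-hit bitscore is 500.
raw = {"sp1": 400.0, "sp2": 120.0, "sp3": 0.0}
normalized = self_hit_normalize(raw, self_bitscore=500.0)  # 0.8, 0.24, 0.0
scaled = min_max_scale(normalized)                         # 1.0, about 0.3, 0.0
print(normalized, scaled)
```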
