Int J Mol Sci. 2026 Feb 1;27(3):1457.
doi: 10.3390/ijms27031457.

CladeOScope-GSA: Revealing Evolutionary Associations Across Gene Sets

Maya Braun et al. Int J Mol Sci.

Abstract

Deciphering gene and protein functions and interactions remains a core challenge in biology and medicine. Gene set analysis and multi-omics tools are widely used to interpret gene lists; however, they often overlook shared evolutionary patterns among genes. These conservation and loss patterns, shaped by billions of years of evolutionary pressure, can uncover co-evolutionary signals within gene sets, yet they remain frequently underexplored. In this study, we apply normalized phylogenetic profiling (NPP) across 1905 eukaryotic species and introduce CladeOScope-GSA, a tool for analyzing user-defined gene sets. CladeOScope-GSA uncovers common signatures of conservation, revealing whether a gene set evolves as a cohesive unit or as distinct co-evolving submodules. By tracing gene set origins, diversification, and shared evolutionary histories, the tool identifies the structural organization and key components of gene networks, exposing functional similarities, phenotypic associations, and broader biological relationships. We demonstrate its utility through two well-characterized cases: the porphyria-related pathway and the dynein gene family. In both, CladeOScope-GSA recapitulates known functional substructures and uncovers previously unrecognized evolutionary insights, underscoring its value for advancing our understanding of gene function and pathway evolution on a broad scale.

Keywords: comparative genomics; functional genomics; gene set analysis; phylogenetic profiling.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
CladeOScope-GSA analysis workflow. Starting from a user-uploaded human gene set (HGNC symbols) and selected clades, CladeOScope-GSA maps genes to a precomputed normalized phylogenetic profiling (NPP) matrix (19,888 human genes × 1905 eukaryotic species), filters non-conserved genes, and retrieves clade-wise length-normalized phylogenetic profiles (LNPP). It then computes pairwise Pearson correlations across species to construct a correlation matrix, evaluates the co-evolutionary significance of the gene set by comparing threshold and cluster scores to 1000 size-matched random gene sets per clade, and applies hierarchical clustering and network construction (edges with correlation ≥ 0.7) to identify co-evolving clusters. The final outputs are heatmaps, clade-wise significance plots, co-evolution networks, and downloadable tables summarizing conservation and co-evolution metrics.
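
The correlation, network, and random-set steps of this workflow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the function names, the toy profile values, and the score used for the random-set comparison (the fraction of gene pairs with correlation at or above the threshold) are all hypothetical. Only the Pearson correlation, the 0.7 edge cutoff, and the use of 1000 size-matched random sets come from the caption.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # a constant profile carries no co-evolution signal
    return sum(a * b for a, b in zip(dx, dy)) / denom


def coevolution_edges(profiles, genes, threshold=0.7):
    """Return (gene1, gene2, r) for every pair with r >= threshold."""
    edges = []
    for g1, g2 in itertools.combinations(genes, 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


def threshold_score(profiles, genes, threshold=0.7):
    """Hypothetical score: fraction of gene pairs that pass the threshold."""
    n_pairs = len(genes) * (len(genes) - 1) // 2
    if n_pairs == 0:
        return 0.0
    return len(coevolution_edges(profiles, genes, threshold)) / n_pairs


def empirical_p_value(profiles, genes, n_random=1000, threshold=0.7, seed=0):
    """Compare the query set's score to size-matched random gene sets."""
    rng = random.Random(seed)
    observed = threshold_score(profiles, genes, threshold)
    pool = sorted(profiles)
    hits = 0
    for _ in range(n_random):
        random_set = rng.sample(pool, len(genes))
        if threshold_score(profiles, random_set, threshold) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_random + 1)


# Toy profiles (made-up values): one row per gene, one column per species.
TOY_PROFILES = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [1.0, 0.0, 1.0, 0.0],
}

if __name__ == "__main__":
    query = ["GENE_A", "GENE_B"]
    print(coevolution_edges(TOY_PROFILES, sorted(TOY_PROFILES)))
    print(empirical_p_value(TOY_PROFILES, query))
```

In a real analysis the profiles would be restricted to the species of one clade before computing correlations, so that the same gene set can be scored separately per clade. The hierarchical clustering step is omitted here.
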
Figure 2
Evolutionary signatures of porphyria-related genes across 1905 eukaryotic species. (A) Heatmap showing the normalized phylogenetic profiles of nine heme-biosynthesis genes across the full species panel. Each row corresponds to a human query gene, and each column represents a species. Darker shades indicate higher conservation of the gene’s ortholog, whereas lighter shades denote reduced similarity or absence of an ortholog. (B) Co-evolutionary significance of the query gene set in eukaryotes, ecdysozoa, and fungi, compared with thousands of random gene sets of the same size and with the Krebs cycle gene set. Each dot represents a random gene set, colored by the clade in which the set was examined. The “#” symbol denotes the number of genes that are conserved in the corresponding clade and included in the analysis. (C) Co-evolution network depicting pairwise gene-gene correlations above the defined threshold. Edge colors correspond to the clades displayed in the heatmap, indicating the clade(s) in which each gene pair shows significant co-evolution.
Figure 3
Clade-wise conservation patterns of the dynein gene family reveal a functional subgroup structure. (A) Heatmap of normalized phylogenetic profiles for human dynein genes across 1905 species. Rows represent individual dynein genes, and columns represent species. Color intensity corresponds to the conservation level. The bar adjacent to the heatmap denotes each gene’s structural classification (e.g., cytoplasmic vs. axonemal). (B) Co-evolutionary significance of the query gene set in eukaryotes, ecdysozoa, fungi, and plants, compared with thousands of randomly generated gene sets of the same size and with the Krebs cycle gene set. Each dot represents a random gene set, colored according to the clade in which it was analyzed. The “#” symbol represents the number of conserved genes within each clade. (C) Co-evolution network visualizing significant pairwise gene-gene correlations above the defined threshold. Edge colors indicate the clades shown in the heatmap, specifying the clade(s) in which significant co-evolution is observed for each gene pair.
