Nucleic Acids Res. 2025 Apr 10;53(7):gkaf320.
doi: 10.1093/nar/gkaf320.

SOI: robust identification of orthologous synteny with the Orthology Index and broad applications in evolutionary genomics


Ren-Gang Zhang et al. Nucleic Acids Res. 2025.

Abstract

With the explosive growth of whole-genome datasets, accurate detection of orthologous synteny has become crucial for reconstructing evolutionary history. However, current methods for identifying orthologous synteny face major limitations, particularly in scaling across varied polyploidy histories and in accurately removing out-paralogous synteny. In this study, we developed a scalable and robust approach based on the Orthology Index (OI) to identify orthologous synteny effectively. Our evaluation on a large-scale empirical dataset spanning diverse polyploidization events demonstrated the high reliability and robustness of the OI method. Simulation-based benchmarks further validated its accuracy, showing superior performance over existing methods across a wide range of scenarios. Additionally, we explored its broad applications in reconstructing the evolutionary histories of plant genomes, including the inference of polyploidy, identification of reticulation, and phylogenomics. In conclusion, the OI offers a robust, interpretable, and scalable approach for identifying orthologous synteny, facilitating more accurate and efficient analyses in plant evolutionary genomics.


Conflict of interest statement

None declared.

Figures

Graphical Abstract
Figure 1.
Results from using the OI to identify orthologous synteny for the typical Salicaceae case. (A) Schematic representation of the evolutionary history of the poplar and willow genomes, adapted from the literature [45–48]. (B) Ks-colored dot plots showing synteny detected by WGDI (-icl), with a clear distinction between three categories of syntenic blocks (SBs) derived from three evolutionary events (three peaks: Ks ≈ 1.5, Ks ≈ 0.27, and Ks ≈ 0.13). (C) Ks-colored dot plots illustrating orthology as inferred by OrthoFinder2, with a high proportion (∼15%) of hidden out-paralogs (Ks ≈ 0.27). (D) OI-colored dot plots integrating synteny (B) and orthology (C), showing a polarized distinction among the three categories of SBs (three peaks: OI ≈ 0, OI ≈ 0.1, and OI ≈ 0.9). (E) Ks-colored dot plots of synteny after applying an OI cutoff of 0.6, showing the clean one-to-one orthology expected from the evolutionary history. Panels (B)–(E) are plotted using the ‘dotplot’ subcommand with four subplots: (a) dot plots colored by Ks or OI (x- and y-axes, chromosomes of the two genomes; each dot indicates a homologous gene pair between the two genomes); (b) histogram of Ks or OI (x-axis, Ks or OI; y-axis, number of homologous gene pairs), using the same color map as the dot plots; and (c) and (d) synteny depth (indicative of relative ploidy) across 50-gene windows (x-axis, synteny depth; y-axis, number of windows), relative to the genome on the x-axis (c) or y-axis (d) of subplot (a). Examples of SBs from the three evolutionary events [referred to as WGT-SBs (Ks ≈ 1.5, OI ≈ 0), WGD-SBs (Ks ≈ 0.27, OI ≈ 0.1), and S-SBs (Ks ≈ 0.13, OI ≈ 0.9)] are highlighted with dashed squares. Arrows link them to the corresponding evolutionary events and Ks or OI peaks, labeled ‘Out-paralogy’ or ‘Orthology’. Additional cases illustrating other lineages can be found in Supplementary Figs S1–S90 (summarized in Supplementary Table S1).
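To make the OI-colored dot plots and the 0.6 cutoff in panel (E) more concrete, here is a minimal Python sketch. It assumes (this is not stated in the caption) that a syntenic block's OI is the fraction of its syntenic gene pairs that are also called orthologous by an orthology tool such as OrthoFinder2, and that blocks with OI at or above the cutoff are kept as orthologous synteny. The function names, data layout, and gene IDs are hypothetical and do not reflect the SOI command-line interface.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

GenePair = Tuple[str, str]  # (gene in genome 1, gene in genome 2)


def orthology_index(block: List[GenePair], orthologs: Set[FrozenSet[str]]) -> float:
    """Assumed OI: fraction of a block's syntenic gene pairs also called orthologous."""
    if not block:
        return 0.0
    # frozenset makes the lookup independent of gene order within a pair
    supported = sum(1 for a, b in block if frozenset((a, b)) in orthologs)
    return supported / len(block)


def filter_orthologous_blocks(
    blocks: Dict[str, List[GenePair]],
    orthologs: Set[FrozenSet[str]],
    cutoff: float = 0.6,
) -> Dict[str, float]:
    """Return {block_id: OI} for blocks whose OI reaches the cutoff."""
    scores = {bid: orthology_index(pairs, orthologs) for bid, pairs in blocks.items()}
    return {bid: oi for bid, oi in scores.items() if oi >= cutoff}


# Toy data with invented gene IDs: a speciation-derived block (S-SB) and a
# WGD-derived out-paralogous block (WGD-SB) between poplar and willow.
blocks = {
    "S-SB": [(f"Ptr{i}", f"Sal{i}") for i in range(10)],
    "WGD-SB": [(f"Ptr{i}", f"Sal{i}b") for i in range(10)],
}
# Hypothetical orthology calls: 9 of the 10 S-SB pairs, plus one spurious
# call inside the out-paralogous block.
orthologs = {frozenset((f"Ptr{i}", f"Sal{i}")) for i in range(9)}
orthologs.add(frozenset(("Ptr0", "Sal0b")))

print(filter_orthologous_blocks(blocks, orthologs))  # {'S-SB': 0.9}
```

Because the score in this sketch is a proportion over the whole block, a few erroneous orthology calls (like the spurious one in the WGD-SB) move a block's OI only slightly, which matches the intuition behind the well-separated OI peaks in panel (D).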
Figure 2.
The performance of the OI in identifying orthologous synteny in empirical and simulated datasets. (A) Summary of OI distributions in the 91 empirical test cases. The black line and gray shading represent the median and the percentile-based 95% confidence interval (CI), respectively. (B) Correlation between ΔT and the noise around OI = 0.5 in the empirical datasets. The ΔT values of the empirical datasets were estimated from the Ks peak values of the shared WGD and speciation events. The noise is defined as the cumulative proportion of syntenic gene pairs falling within the OI intervals 0.3–0.4, 0.4–0.5, 0.5–0.6, and 0.6–0.7 (i.e. OI between 0.3 and 0.7). This putative noise is generally unexpected and likely arises from false orthology inference. R is Pearson’s correlation coefficient. (C) Comparisons of inter-genomic Ks and OI distributions in the simulated datasets at different ΔT settings (ΔT ∈ {0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1}, measured in substitutions per site). The line/point and shading represent the median and 95% CI from 50 repeated simulations, respectively. (D) Comparisons of precision, recall and F1 score of orthology identification using different OI cutoffs (0.05–0.95) in the simulated benchmarks. The line/point and shading represent the median and 95% CI from 50 repeated simulations, respectively. (E) Comparisons of precision, recall and F1 score of orthology identification using WGDI (-icl option), OrthoFinder2, and the OI with cutoffs of 0.5 and 0.6, based on the simulated benchmarks. Boxplots represent the values from 50 repeated simulations. ns, P > .05; ***P ≤ .001; ****P ≤ .0001; Kruskal–Wallis test. (F) Comparisons of the OI and other tools for identifying orthologous synteny, based on simulated benchmarks with varied levels of ΔT (0.01–1) and chromosome evolution parameters (fold ∈ {1, 10, 100, 1000}). Lines and points represent the median precision, recall or F1 score, and error bars indicate the 95% CI from 50 repeated simulations.
See also Supplementary Fig. S92 for additional information.
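The two quantities used in panels (B), (D) and (E) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' benchmarking code: each syntenic gene pair is assumed to carry the OI of the block it belongs to, the 0.3–0.7 range is treated as half-open [0.3, 0.7), and precision, recall and F1 are computed against a set of true orthologous pairs such as a simulation would provide. All names and data below are hypothetical.

```python
from typing import Iterable, Set, Tuple

GenePair = Tuple[str, str]


def oi_noise(pair_ois: Iterable[float], low: float = 0.3, high: float = 0.7) -> float:
    """Proportion of syntenic gene pairs whose OI lies in [low, high).

    Each gene pair is assumed to carry the OI of the syntenic block it belongs to.
    """
    ois = list(pair_ois)
    if not ois:
        return 0.0
    return sum(low <= oi < high for oi in ois) / len(ois)


def precision_recall_f1(predicted: Set[GenePair], truth: Set[GenePair]) -> Tuple[float, float, float]:
    """Score predicted orthologous pairs against a (simulated) ground truth."""
    tp = len(predicted & truth)  # true positives: predicted pairs that are real orthologs
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Three of these eight per-pair OI values fall in the "noise" range.
print(oi_noise([0.05, 0.1, 0.35, 0.5, 0.65, 0.9, 0.95, 0.92]))  # 0.375

truth = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4"), ("A5", "B5")}
predicted = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B9")}
print(precision_recall_f1(predicted, truth))
```

Sweeping the OI cutoff, as in panel (D), would amount to rebuilding `predicted` from the blocks that pass each cutoff and calling `precision_recall_f1` once per value.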
Figure 3.
Inference of polyploidy in Apiales (Apiaceae + Araliaceae) genomes using the OI. (A) Schematic illustration of distinguishing between shared and lineage-specific WGD hypotheses using the orthologous synteny patterns identified by the OI. Despite a similar 2:2 ratio of synteny depth, the two scenarios have distinct patterns of orthologous synteny (1:1 orthology vs. 2:2 orthology). Labels A and B indicate two species, and A1, A2 and B1, B2 indicate duplicated chromosomes or blocks from WGD event(s). (B) OI-colored dot plots indicating orthologous and out-paralogous synteny between the genomes of C. asiatica (Apiaceae) and A. elata (Araliaceae). A typical 1:1 orthology + 1:1 out-paralogy synteny pattern is highlighted by dashed squares. (C) Phylogeny reconstructed from the genomes of selected species in the Apiaceae and Araliaceae (Apiales), with labels indicating polyploidization events. L1 and L2 represent the average branch lengths (substitution rates) of the Apiaceae and the Araliaceae, respectively, from their MRCA. Numbers at the nodes denote bootstrap values. The maximum-likelihood tree was reconstructed using IQ-TREE2, based on concatenated codon alignments of 2363 single-copy genes (allowing at most 20% of taxa to be missing). Additional evidence supporting the inferred polyploidization events can be found in Supplementary Figs S93–S99.
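The reasoning in panel (A) can be expressed as a simple decision rule, sketched below in Python. It assumes each syntenic block pair between species A and B already has an OI and that OI ≥ 0.6 marks orthology; a shared WGD then shows up as each A block having two syntenic partners in B but only one orthologous partner, whereas independent WGDs in both lineages give two orthologous partners. The function names and toy inputs are hypothetical, and real inferences (as the caption notes) draw on additional evidence rather than this rule alone.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Block = Tuple[str, str, float]  # (block/chromosome in species A, partner in species B, OI)


def depth_patterns(blocks: List[Block], cutoff: float = 0.6) -> Dict[str, Tuple[int, int]]:
    """Per A block: (total synteny depth, orthologous synteny depth) in species B."""
    total: Dict[str, Set[str]] = defaultdict(set)
    ortho: Dict[str, Set[str]] = defaultdict(set)
    for a, b, oi in blocks:
        total[a].add(b)
        if oi >= cutoff:
            ortho[a].add(b)
    # ortho is a defaultdict, so A blocks with no orthologous partner get depth 0
    return {a: (len(total[a]), len(ortho[a])) for a in total}


def wgd_hypothesis(blocks: List[Block], cutoff: float = 0.6) -> str:
    """Simplified rule for a 2:2 synteny-depth pattern."""
    depths = set(depth_patterns(blocks, cutoff).values())
    if depths == {(2, 1)}:
        return "shared WGD (1:1 orthology + 1:1 out-paralogy)"
    if depths == {(2, 2)}:
        return "lineage-specific WGDs (2:2 orthology)"
    return "ambiguous"


# Toy inputs with invented OI values
shared = [("A1", "B1", 0.9), ("A1", "B2", 0.1), ("A2", "B2", 0.9), ("A2", "B1", 0.1)]
independent = [("A1", "B1", 0.9), ("A1", "B2", 0.85), ("A2", "B1", 0.88), ("A2", "B2", 0.9)]

print(wgd_hypothesis(shared))       # shared WGD (1:1 orthology + 1:1 out-paralogy)
print(wgd_hypothesis(independent))  # lineage-specific WGDs (2:2 orthology)
```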
Figure 4.
Examples of reticulation inference based on the OI. (A, B) OI-colored dot plots (A) of the genomes of A. thaliana, A. arenosa, and their hybrid, with the inference (B; in dendrogram form) based on the orthologous relationships. (C, D) OI-colored dot plots (C) of the genomes of the tetraploid A. hypogaea and its diploid progenitors, with the inference (D; in dendrogram form) based on the orthologous relationships. (E, F) OI-colored dot plots (E) of the genomes of the hexaploid T. aestivum and its intermediate tetraploid T. turgidum, with the inference (F; in dendrogram form) based on the orthologous relationships. (G, H) OI-colored dot plots (G) of the genomes of the neo-octoploid P. setigerum and its intermediate tetraploid P. somniferum, with the inference (H; in dendrogram form) based on the orthologous relationships. Only one set of representative homoeologous chromosomes is shown in the dot plots; dot plots with the full set of chromosomes can be found in Supplementary Figs S100–S103.
Figure 5.
An example (core eudicots) of phylogenomics based on the OI. (A) Number of multi-copy and single-copy SOGs at different levels of taxon occupancy. (B) Occupancy of SOGs in species with different relative ploidy (i.e. orthologous synteny depth relative to the V. vinifera genome), allowing up to 40% of taxa to be missing. Each point represents one species. ns, P > .05; **P ≤ .01; ****P ≤ .0001; Wilcoxon test. (C) Comparison of the phylogenetic relationships within the core eudicots reconstructed in this study with those in APG IV. Conflicting positions are marked in red, relationships unresolved in APG IV in green, and orders not covered in this study in blue. Numbers at the nodes are ASTRAL posterior probabilities, in black for the multi-copy SOGs and in orange for the single-copy SOGs (only one value is shown where the two are equal). Further details of the two trees reconstructed in this study can be found in Supplementary Figs S104 and S105.
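The occupancy filter in panels (A) and (B) can be illustrated with a short Python sketch. It treats each SOG (taken here to mean a syntenic orthogroup) as a mapping from species to the genes it contributes, keeps groups with at most 40% of taxa missing, and splits the survivors into single-copy groups (one gene in every present species) and multi-copy groups. The data layout, function names, and species labels are hypothetical; downstream alignment and ASTRAL inference are outside this sketch.

```python
from typing import Dict, List, Tuple

SOG = Dict[str, List[str]]  # species -> genes from that species in the group


def occupancy(sog: SOG, species: List[str]) -> float:
    """Fraction of the species set that has at least one gene in the SOG."""
    present = sum(1 for sp in species if sog.get(sp))
    return present / len(species)


def classify_sogs(
    sogs: Dict[str, SOG], species: List[str], max_missing: float = 0.4
) -> Tuple[List[str], List[str]]:
    """Split SOGs passing the occupancy filter into single-copy and multi-copy lists."""
    single_copy, multi_copy = [], []
    for sog_id, sog in sogs.items():
        present = [sp for sp in species if sog.get(sp)]
        missing_fraction = (len(species) - len(present)) / len(species)
        if missing_fraction > max_missing:
            continue  # too many taxa missing
        if all(len(sog[sp]) == 1 for sp in present):
            single_copy.append(sog_id)
        else:
            multi_copy.append(sog_id)  # e.g. retained WGD duplicates in some species
    return single_copy, multi_copy


species = ["sp1", "sp2", "sp3", "sp4", "sp5"]
sogs = {
    "SOG1": {sp: [f"{sp}_g1"] for sp in species},             # all taxa, one copy each
    "SOG2": {"sp1": ["a"], "sp2": ["b", "c"], "sp3": ["d"]},  # 40% missing, two copies in sp2
    "SOG3": {"sp1": ["e"], "sp2": ["f"]},                     # 60% missing
}
print(classify_sogs(sogs, species))  # (['SOG1'], ['SOG2'])
```

Counting missing taxa as integers before dividing keeps the boundary case (exactly 40% missing) from being affected by floating-point drift in a computed occupancy.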


References

    1. Steenwyk JL, King N. The promise and pitfalls of synteny in phylogenomics. PLoS Biol. 2024;22:e3002632. doi:10.1371/journal.pbio.3002632.
    2. Liu D, Hunt M, Tsai IJ. Inferring synteny between genome assemblies: a systematic evaluation. BMC Bioinformatics. 2018;19:26. doi:10.1186/s12859-018-2026-4.
    3. Passarge E, Horsthemke B, Farber RA. Incorrect use of the term synteny. Nat Genet. 1999;23:387. doi:10.1038/70486.
    4. Kristensen DM, Wolf YI, Mushegian AR, et al. Computational methods for gene orthology inference. Brief Bioinform. 2011;12:379–91. doi:10.1093/bib/bbr030.
    5. Sonnhammer ELL, Koonin EV. Orthology, paralogy and proposed classification for paralog subtypes. Trends Genet. 2002;18:619–20. doi:10.1016/S0168-9525(02)02793-2.
