Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Jun 7;20(1):10.
doi: 10.1186/s13015-025-00279-5.

Reconstructing rearrangement phylogenies of natural genomes

Affiliations

Reconstructing rearrangement phylogenies of natural genomes

Leonard Bohnenkämper et al. Algorithms Mol Biol. .

Abstract

Background: We study the classical problem of inferring ancestral genomes from a set of extant genomes under a given phylogeny, known as the Small Parsimony Problem (SPP). Genomes are represented as sequences of oriented markers, organized in one or more linear or circular chromosomes. Any marker may appear in several copies, without restriction on orientation or genomic location, known as the natural genomes model. Evolutionary events along the branches of the phylogeny encompass large scale rearrangements, including segmental inversions, translocations, gain and loss (DCJ-indel model). Even under simpler rearrangement models, such as the classical breakpoint model without duplicates, the SPP is computationally intractable. Nevertheless, the SPP for natural genomes under the DCJ-indel model has been studied recently, with limited success.

Methods: Building on prior work, we present a highly optimized ILP that is able to solve the SPP for sufficiently small phylogenies and gene families. A notable improvement w.r.t. the previous result is an optimized way of handling both circular and linear chromosomes. This is especially relevant to the SPP, since the chromosomal structure of ancestral genomes is unknown and the solution space for this chromosomal structure is typically large.

Results: We benchmark our method on simulated and real data. On simulated phylogenies we observe a considerable performance improvement on problems that include linear chromosomes. And even when the ground truth contains only one circular chromosome per genome, our method outperforms its predecessor due to its optimized handling of the solution space. The practical advantage becomes also visible in an analysis of seven Anopheles taxa.

Keywords: Ancestral reconstruction; Double-cut-and-join; Genome rearrangement; Integer linear programming; Small parsimony.

PubMed Disclaimer

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Fig. 1
A genome of five markers 11, 12, 21, 31, 41 on a single linear chromosome
Fig. 2
Fig. 2
Capping-Free Multi-Relational Diagram for two genomes on an unresolved homology (1) with families {11,12,13,14}, {21,22},{31,32},{41},{51}.
Fig. 3
Fig. 3
Acrshort*cfmrd for the two genomes of Fig. 2 on a resolved homology (2) with families {11,13}, {12,14}, {21,22}, {31,32}, {41}, {51}. Note that (2) is a matching on (1)
Fig. 4
Fig. 4
Left: A degenerate genome. Right: A linearization of it
Fig. 5
Fig. 5
Left: This degenerate genome is not linearizable because of missing telomeres. Right: The genome becomes linearizable when adding telomeres. One linearization is that of Fig. 4
Algorithm 1
Algorithm 1
Capping-free Small Parsimony
Fig. 6
Fig. 6
Solving times for SPP-DCJ and SPP-DCJ-v2 on simulated genomes with increasing numbers of telomeres. Solid lines represent corresponding median values
Fig. 7
Fig. 7
Solving times for SPP-DCJ and SPP-DCJ-v2 on genomes generated by ZOMBI on a range of trees with increasing branch lengths with ancestral adjacencies inferred by DecoSTAR. Solid lines represent corresponding median values
Fig. 8
Fig. 8
Mean precision, recall and F1 score for default and safer linearization mode for varying tree scales. Transparent ranges indicate minimum to maximum range of the five tested samples per step
Fig. 9
Fig. 9
Average pre-processing and solving times of 50 samples for variants of SPP-DCJ-v2. NN - no additional pre-processing, IN - initial solution precomputed, IB - initial solution and lower bounds precomputed
Fig. 10
Fig. 10
Cladogram for seven Anopheles taxa
Fig. 11
Fig. 11
Gaps reported by gurobi with increasing solving times for SPP-DCJ and variants of SPP-DCJ-v2 until a time limit of 720 minutes. Right: Zoomed in on the first 25 minutes. NN - no additional pre-processing, IN - initial solution precomputed, IB - initial solution and lower bounds precomputed, IBF - initial solution and lower bounds precomputed, with variable ancestral family sizes

References

    1. Braga MDV, Willing E, Stoye J. Double cut and join with insertions and deletions. J Comput Biol. 2011;18(9):1167–84. 10.1089/cmb.2011.0118. - PubMed
    1. Yancopoulos S, Attie O, Friedberg R. Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics. 2005;21(16):3340–6. 10.1093/bioinformatics/bti535. - PubMed
    1. Bohnenkämper L, Braga MDV, Doerr D, Stoye J. Computing the rearrangement distance of natural genomes. J Comput Biol. 2021;28(4):410–31. 10.1089/cmb.2020.0434. - PMC - PubMed
    1. Doerr D, Chauve C. Small parsimony for natural genomes in the DCJ-indel model. J Bioinform Comput Biol. 2021;19(06):2140009. 10.1142/S0219720021400096. - PubMed
    1. Rubert DP, Braga MDV. Efficient gene orthology inference via large-scale rearrangements. Algorithms Mol Biol. 2023;18:14. 10.1186/s13015-023-00238-y. - PMC - PubMed

LinkOut - more resources