Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Sep 29;18(1):15.
doi: 10.1186/s13015-023-00241-3.

Constructing founder sets under allelic and non-allelic homologous recombination

Affiliations

Constructing founder sets under allelic and non-allelic homologous recombination

Konstantinn Bonnet et al. Algorithms Mol Biol. .

Erratum in

Abstract

Homologous recombination between the maternal and paternal copies of a chromosome is a key mechanism for human inheritance and shapes population genetic properties of our species. However, a similar mechanism can also act between different copies of the same sequence, then called non-allelic homologous recombination (NAHR). This process can result in genomic rearrangements-including deletion, duplication, and inversion-and is underlying many genomic disorders. Despite its importance for genome evolution and disease, there is a lack of computational models to study genomic loci prone to NAHR. In this work, we propose such a computational model, providing a unified framework for both (allelic) homologous recombination and NAHR. Our model represents a set of genomes as a graph, where haplotypes correspond to walks through this graph. We formulate two founder set problems under our recombination model, provide flow-based algorithms for their solution, describe exact methods to characterize the number of recombinations, and demonstrate scalability to problem instances arising in practice.

Keywords: Founder set reconstruction; Homologous recombination; NAHR; Pangenomics; Variation graph.

PubMed Disclaimer

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Fig. 1
Illustration of an NAHR-mediated inversion. Haplotype A (black line) represents the original configuration, while haplotype B (red line) can be derived from A by two recombination events between inverted repeats of genomic marker 3 as indicated by the red stars
Fig. 2
Fig. 2
Illustration of variation graph from Example 2
Fig. 3
Fig. 3
Network flow solution on variation graph GH of Example 2
Fig. 4
Fig. 4
Component graph of components C1, C2, P1, and P2 (left) and a founder set of H (right) from Example 2
Fig. 5
Fig. 5
Components of flow solution on variation graph GH of Example 2
Fig. 6
Fig. 6
Illustration of the contradictory claim a shorter sequence of T-blocks can be constructed than found by Eq. 1. The red dashed line indicates the contradictory situation that ik<jk. In that case Q[ik-1..jk] would have been chosen as longest T-block in recursion step k-1
Fig. 7
Fig. 7
Flow graph GH,μ^F of Example 2. In nodes and out nodes are highlighted in red and blue, respectively. For clarity, the direction of marker edges (gray edges; directed from in to out node) is omitted in the illustration
Fig. 8
Fig. 8
Solution to Algorithm 2 for GH,μ^F for Example 2
Fig. 9
Fig. 9
Mean number of recombinations by the size of the graph. Experiments were ran with values ranging from 10 to 200 in for the number of markers, in increments of 10. The ratio of duplications and of inversions was fixed to 10%, and number of haplotypes to 10. Each colored dot represents the mean number of recombinations over 50 replicates for one parameter set, after random assignment trials (blue) and after optimization (red)
Fig. 10
Fig. 10
Problem 1, flow computational performance benchmarks. Runtime in seconds (upper panels) and peak PSS in MB (lower panels), as a function of the number of markers (left) and of the ratio of duplications (right). For each experiment, the remaining parameters are fixed as indicated above. The abbreviations read as follows: Nm number of markers; Rd ratio of duplications; and Ri, ratio of inverted duplications
Fig. 11
Fig. 11
Problem 3, recombinations minimization performance benchmarks. Plots analogous to Fig. 10. Runtime in seconds (upper panels) and peak PSS in MB (lower panels), as a function of the number of markers (left) and of the ratio of duplications (right). For each experiment, the remaining parameters are fixed as indicated above. The abbreviations read as follows: Nh number of haplotypes; Nm, number of markers; Rd ratio of duplications; and Ri, ratio of inverted duplications
Fig. 12
Fig. 12
Graphical representation of the variation graph for the 1p36.13 locus data. On the left, a 2D plot rendered by Bandage [43]. Markers are represented as numbered colored rectangles, and the undirected edges connecting them as black curves. Markers 1 and 8 correspond respectively to the source and the sink of the graph. The right plot shows the walk through the graph from source (blue) to sink (red) corresponding to the sequence of haplotype AFR-NA19036-h1, a sample of African origin from our experimental data. The sample’s sequence in the previously established notation is: 1234¯565432¯732432¯78

References

    1. Ukkonen E. Finding Founder Sequences from a Set of Recombinants. In: 2nd International Workshop on algorithms in bioinformatics (WABI 2002). Algorithms in bioinformatics. Berlin, Heidelberg: Springer Berlin Heidelberg; 2002:277-86.
    1. Norri T, Cazaux B, Dönges S, Valenzuela D, Mäkinen V. Founder reconstruction enables scalable and seamless pangenomic analysis. Bioinformatics. 2021;37(24):4611–9. doi: 10.1093/bioinformatics/btab516. - DOI - PMC - PubMed
    1. Parida L, Melé M, Calafell F, Bertranpetit J, Consortium G. Estimating the ancestral recombinations graph (ARG) as compatible networks of SNP patterns. J Comput Biol. 2008;15(9):1133–53. doi: 10.1089/cmb.2008.0065. - DOI - PubMed
    1. Swenson KM, Guertin P, Deschênes H, Bergeron A. Reconstructing the modular recombination history of Staphylococcus aureus phages. BMC Bioinform. 2013;14(15):1–9. - PMC - PubMed
    1. Rastas P, Ukkonen E. Haplotype Inference Via Hierarchical Genotype Parsing. In: Giancarlo R, Hannenhalli S, editors. 7th International Workshop on Algorithms in Bioinformatics (WABI 2007). Algorithms in Bioinformatics. Berlin, Heidelberg: Springer Berlin Heidelberg; 2007:85-97.

LinkOut - more resources