Bull Math Biol. 2025 May 23;87(7):84.
doi: 10.1007/s11538-025-01464-8.

Fast simulation of identity-by-descent segments

Seth D Temple et al. Bull Math Biol. 2025.

Abstract

The worst-case runtime complexity to simulate haplotype segments identical by descent (IBD) is quadratic in sample size. We propose two main techniques to reduce the compute time, both of which are motivated by coalescent and recombination processes. We provide mathematical results that explain why our algorithm should outperform a naive implementation with high probability. In our experiments, we observe average compute times to simulate detectable IBD segments around a locus that scale approximately linearly in sample size and take a couple of seconds for sample sizes that are less than 10,000 diploid individuals. In contrast, we find that existing methods to simulate IBD segments take minutes to hours for sample sizes exceeding a few thousand diploid individuals. When using IBD segments to study recent positive selection around a locus, our efficient simulation algorithm makes feasible statistical inferences, e.g., parametric bootstrapping in analyses of large biobanks, that would be otherwise intractable.

Keywords: Coalescent; Computational runtime; Identity-by-descent.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable.

Figures

Fig. 1
Conceptual framework for IBD segment lengths. (Left) Sample haplotypes $a, b, c, d$ trace their lineages back to common ancestors at times $t_4$, $t_4 + t_3$, and $t_4 + t_3 + t_2$. (Right) Relative to a focal point, the haplotype segment lengths $R_a, R_b, L_a, L_b$ are independent and identically distributed $\text{Exponential}(t_4)$. The lengths shared IBD are $R_{a,b} := \min(R_a, R_b)$ and $L_{a,b} := \min(L_a, L_b)$. The IBD segment length $W_{a,b} := L_{a,b} + R_{a,b} \sim \text{Gamma}(2, 2 t_4)$ exceeds the detection threshold of $w$ Morgans (Color figure online)
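The model in the Fig. 1 caption can be sketched in a few lines of Python. This is only an illustration of the caption's distributional statement for a single pair of haplotypes, not the paper's Algorithm 1. It assumes that Exponential(t) and Gamma(2, 2t) are parameterized by rate, so that the expected IBD length is 1/t Morgans; the function names and the values t = 50 and w = 0.01 are hypothetical choices made for the example.

```python
import math
import random


def simulate_ibd_length(t, rng):
    """Draw one IBD segment length (in Morgans) around a focal point for two
    haplotypes whose common ancestor lived t generations ago.

    Assumption: Exponential(t) is parameterized by rate t.
    """
    # Distance from the focal point to the nearest crossover on each side,
    # drawn independently for haplotypes a and b.
    r_a, r_b = rng.expovariate(t), rng.expovariate(t)
    l_a, l_b = rng.expovariate(t), rng.expovariate(t)
    # Sharing ends at the first crossover on either haplotype.
    right = min(r_a, r_b)
    left = min(l_a, l_b)
    return left + right


def tail_probability(t, w):
    """P(W > w) for W ~ Gamma(shape=2, rate=2t)."""
    x = 2.0 * t * w
    return (1.0 + x) * math.exp(-x)


def summarize(t, w, n_draws, seed=0):
    """Return the empirical mean length and the fraction of draws above w."""
    rng = random.Random(seed)
    lengths = [simulate_ibd_length(t, rng) for _ in range(n_draws)]
    mean_length = sum(lengths) / n_draws
    detectable = sum(length > w for length in lengths) / n_draws
    return mean_length, detectable


if __name__ == "__main__":
    mean_length, detectable = summarize(t=50.0, w=0.01, n_draws=100_000)
    print(mean_length, detectable, tail_probability(50.0, 0.01))
```

Under these assumptions the empirical mean should be close to 1/t, and the fraction of draws longer than w should be close to the Gamma tail probability $(1 + 2tw)e^{-2tw}$.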
Algorithm 1
Efficient simulation of IBD segment lengths
Fig. 2
Compute time to simulate IBD segment lengths around a locus depending on algorithm implementation. Compute time (y-axis) in seconds by sample size (x-axis) in thousands is averaged over five simulations. The legend denotes colored line styles for implementations using Algorithm 1 as is (blue), merging only (orange), pruning only (green), and neither pruning nor merging (red). The main text describes the “merging” and “pruning” techniques. The demographic scenario is the population bottleneck. The length threshold is 0.01 Morgans (Color figure online)
Fig. 3
Compute time to simulate IBD segment lengths around a locus depending on the detection threshold and population size. Compute time (y-axis) in seconds by sample size (x-axis) in thousands is averaged over five simulations. The legends denote colored line styles for A) different detection thresholds (in Morgans) with $N = 10^5$ fixed or B) different population sizes with the threshold fixed at 0.02 Morgans (Color figure online)
Fig. 4
Percentage of regression model predictions explained by linear and quadratic effects. The percentage of predicted compute time (y-axis) in seconds by sample size (x-axis) in thousands with respect to linear and quadratic effects. Plots show results for A) the constant population size $N_e = 10{,}000$ versus B) $N_e = 100{,}000$. The detectable IBD segments are simulated with a 0.02 Morgans threshold (Color figure online)
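One way to read the Fig. 4 decomposition is sketched below. The abstract does not specify the regression model, so this sketch assumes a least-squares fit of compute time on sample size n and n² with no intercept and positive coefficients, and it reports each term's share of the fitted value. The function names and the timing data are hypothetical and synthetic; they are not measurements from the paper.

```python
def fit_linear_quadratic(sizes, times):
    """Least-squares fit of times ~ b1 * n + b2 * n**2 (no intercept),
    solved through the 2x2 normal equations."""
    s11 = sum(n ** 2 for n in sizes)
    s12 = sum(n ** 3 for n in sizes)
    s22 = sum(n ** 4 for n in sizes)
    y1 = sum(n * y for n, y in zip(sizes, times))
    y2 = sum(n ** 2 * y for n, y in zip(sizes, times))
    det = s11 * s22 - s12 ** 2
    b1 = (y1 * s22 - s12 * y2) / det
    b2 = (s11 * y2 - s12 * y1) / det
    return b1, b2


def effect_percentages(n, b1, b2):
    """Percentage of the predicted time at sample size n that comes from the
    linear term and from the quadratic term (assumes both terms are positive)."""
    linear = b1 * n
    quadratic = b2 * n ** 2
    total = linear + quadratic
    return 100.0 * linear / total, 100.0 * quadratic / total


if __name__ == "__main__":
    # Synthetic timings: sample sizes in thousands, times in seconds.
    sizes = list(range(1, 11))
    times = [0.5 * n + 0.01 * n ** 2 for n in sizes]
    b1, b2 = fit_linear_quadratic(sizes, times)
    print(effect_percentages(10, b1, b2))
```

If the linear share stays high across the plotted sample sizes, the fitted runtime is behaving approximately linearly over that range, which matches the scaling the abstract describes.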
