[Preprint]. 2025 Jan 7:2024.12.13.628449.
doi: 10.1101/2024.12.13.628449.

Fast simulation of identity-by-descent segments


Seth D Temple et al. bioRxiv.


Abstract

The worst-case runtime complexity of simulating haplotype segments identical by descent (IBD) is quadratic in sample size. We propose two main techniques to reduce compute time, both motivated by the coalescent and recombination processes. We provide mathematical results that explain why our algorithm should outperform a naive implementation with high probability. In our experiments, average compute times to simulate detectable IBD segments around a locus scale approximately linearly in sample size and are a couple of seconds for samples of fewer than ten thousand diploid individuals. In contrast, we find that existing methods to simulate IBD segments take minutes to hours for samples exceeding a few thousand diploid individuals. When IBD segments are used to study recent positive selection around a locus, our efficient simulation algorithm makes feasible statistical inferences that would otherwise be intractable, e.g., parametric bootstrapping in analyses of large biobanks.
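To see where the quadratic worst case comes from, here is a minimal sketch of a naive simulation. It assumes a constant-size Kingman coalescent at the focal locus and, as a simplification, draws each pair's IBD length independently from the Gamma(2, 2t) form of Fig. 1, where t is the pair's time to most recent common ancestor (TMRCA). It is not the authors' algorithm and does not implement their merging or pruning techniques; the function names and parameters are hypothetical.

```python
import random


def kingman_pair_tmrca(n, num_haplotypes_pop, rng):
    """Simulate a Kingman coalescent for n sampled haplotypes from a
    population of num_haplotypes_pop haplotypes (constant size, time in
    generations). Return {(i, j): TMRCA} for every pair i < j."""
    # Each active lineage is the set of sampled haplotypes descending from it.
    lineages = [{i} for i in range(n)]
    t = 0.0
    tmrca = {}
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence: Exponential with rate C(k, 2) / pop size.
        t += rng.expovariate(k * (k - 1) / 2 / num_haplotypes_pop)
        a, b = sorted(rng.sample(range(k), 2))
        # Every pair split across the two merging lineages coalesces at time t;
        # summed over the whole tree, this loop touches n(n-1)/2 pairs.
        for i in lineages[a]:
            for j in lineages[b]:
                tmrca[(min(i, j), max(i, j))] = t
        right = lineages.pop(b)  # b > a, so index a stays valid
        lineages[a] = lineages[a] | right
    return tmrca


def naive_detectable_pairs(n, num_haplotypes_pop, w, rng):
    """Naive pairwise simulation: draw an IBD length for every pair and keep
    the pairs whose length exceeds the detection threshold w (Morgans)."""
    tmrca = kingman_pair_tmrca(n, num_haplotypes_pop, rng)
    detected = []
    for pair, t in tmrca.items():
        # Simplifying assumption: lengths are independent across pairs, each the
        # sum of two one-sided Exponential(rate 2t) lengths, i.e. Gamma(2, rate 2t).
        length = rng.expovariate(2 * t) + rng.expovariate(2 * t)
        if length > w:
            detected.append(pair)
    return detected
```

For example, `naive_detectable_pairs(200, 20_000, 0.02, random.Random(0))` enumerates all 19,900 pairs of 200 haplotypes. Because every pair receives its own TMRCA and length draw, the work grows as n(n-1)/2, which is the quadratic cost the abstract refers to.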

Keywords: 60–08; 92D15; 92–04; 92–08; 92–10; coalescent; computational runtime; identity-by-descent.


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Conceptual framework for IBD segment lengths. (Left) Sample haplotypes $a$, $b$, $c$, $d$ trace their lineages back to common ancestors at times $t_4$, $t_4+t_3$, and $t_4+t_3+t_2$. (Right) Relative to a focal point, the haplotype segment lengths $R_a$, $R_b$, $L_a$, $L_b$ are independent and identically distributed $\mathrm{Exponential}(t_4)$ (rate $t_4$). The lengths shared IBD are $R_{a,b} := \min(R_a, R_b)$ and $L_{a,b} := \min(L_a, L_b)$. The IBD segment length $W_{a,b} := L_{a,b} + R_{a,b} \sim \mathrm{Gamma}(2, 2 \cdot t_4)$ exceeds the detection threshold of $w$ Morgans.
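The Fig. 1 caption's distributional claim can be checked numerically. The minimum of two independent $\mathrm{Exponential}(t_4)$ lengths is $\mathrm{Exponential}(2t_4)$, so $W_{a,b}$ is the sum of two such minima, giving $\mathrm{Gamma}(2, 2t_4)$ with mean $1/t_4$ and $P(W_{a,b} > w) = e^{-2t_4 w}(1 + 2t_4 w)$. The Monte Carlo sketch below assumes illustrative values $t_4 = 50$ generations and $w = 0.01$ Morgans (not taken from the paper), for which the exact values are $0.02$ Morgans and $2/e \approx 0.7358$; the function names are hypothetical.

```python
import math
import random


def ibd_length(t4, rng):
    """One draw of W_{a,b} = min(L_a, L_b) + min(R_a, R_b), where the four
    one-sided segment lengths (Morgans) are i.i.d. Exponential with rate t4."""
    r_a, r_b, l_a, l_b = (rng.expovariate(t4) for _ in range(4))
    return min(l_a, l_b) + min(r_a, r_b)


def gamma2_survival(w, rate):
    """P(W > w) for W ~ Gamma(shape 2, rate): exp(-rate * w) * (1 + rate * w)."""
    x = rate * w
    return math.exp(-x) * (1.0 + x)


rng = random.Random(2025)
t4, w, reps = 50.0, 0.01, 200_000  # assumed illustrative values
draws = [ibd_length(t4, rng) for _ in range(reps)]
mc_mean = sum(draws) / reps
mc_tail = sum(d > w for d in draws) / reps
print(f"mean: MC {mc_mean:.5f} vs exact {1 / t4:.5f}")
print(f"P(W > w): MC {mc_tail:.4f} vs exact {gamma2_survival(w, 2 * t4):.4f}")
```

With 200,000 draws, both Monte Carlo estimates should agree with the exact values to roughly three decimal places.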
Fig. 2
Compute time to simulate IBD segment lengths around a locus, by algorithm implementation. Compute time in seconds (y-axis) versus sample size in thousands (x-axis), averaged over five simulations. The legend distinguishes colored line styles for implementations using Algorithm 1 as is (blue), merging only (orange), pruning only (green), and neither pruning nor merging (red). The “merging” and “pruning” techniques are described in the main text. The demographic scenario is the population bottleneck, and the detection threshold is 0.01 Morgans.
Fig. 3
Compute time to simulate IBD segment lengths around a locus as a function of the detection threshold and population size. Compute time in seconds (y-axis) versus sample size in thousands (x-axis), averaged over five simulations. The legends distinguish colored line styles for (A) different detection thresholds (in Morgans) with $N = 10^5$ fixed and (B) different population sizes with the threshold fixed at 0.02 Morgans.
Fig. 4
Percentage of regression model predictions explained by linear and quadratic effects. Percentage of predicted compute time (y-axis) attributable to the linear and quadratic terms, by sample size in thousands (x-axis). Plots show results for (A) a constant population size $N_e = 10{,}000$ versus (B) $N_e = 100{,}000$. Detectable IBD segments are simulated with a 0.02 Morgans threshold.
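The Fig. 4 caption does not state the regression specification. The sketch below shows one way such a decomposition can be computed, assuming (hypothetically) a no-intercept model $\hat{y}(n) = \beta_1 n + \beta_2 n^2$ fit by ordinary least squares to averaged compute times, with the linear share at sample size $n$ taken as $\beta_1 n / \hat{y}(n)$ and the quadratic share as $\beta_2 n^2 / \hat{y}(n)$. `fit_linear_quadratic` and `effect_shares` are illustrative names, not from the paper.

```python
def fit_linear_quadratic(n, y):
    """Least-squares fit of y ~ b1*n + b2*n**2 (no intercept, an assumption)
    by solving the 2x2 normal equations with Cramer's rule; returns (b1, b2)."""
    s11 = sum(x * x for x in n)      # sum n^2
    s12 = sum(x ** 3 for x in n)     # sum n^3
    s22 = sum(x ** 4 for x in n)     # sum n^4
    t1 = sum(x * yi for x, yi in zip(n, y))
    t2 = sum(x * x * yi for x, yi in zip(n, y))
    det = s11 * s22 - s12 * s12
    b1 = (t1 * s22 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2


def effect_shares(n_values, b1, b2):
    """Percentage of each prediction b1*n + b2*n**2 contributed by the linear
    and quadratic terms (shares can fall outside [0, 100] if a coefficient is negative)."""
    shares = []
    for x in n_values:
        lin, quad = b1 * x, b2 * x * x
        total = lin + quad
        shares.append((100 * lin / total, 100 * quad / total))
    return shares
```

Under this model the quadratic share grows with $n$ whenever both coefficients are positive, which is the kind of trend a plot like Fig. 4 would display.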

