[Preprint]. 2025 Jan 7:2024.06.05.597656.
doi: 10.1101/2024.06.05.597656.

Identity-by-descent segments in large samples

Seth D Temple et al. bioRxiv.

Abstract

If two haplotypes share the same alleles for an extended gene tract, these haplotypes are likely to be derived identical-by-descent from a recent common ancestor. Identity-by-descent segment lengths are correlated via unobserved ancestral tree and recombination processes, which commonly presents challenges to the derivation of theoretical results in population genetics. We show that the proportion of detectable identity-by-descent segments around a locus is normally distributed when the sample size and the scaled population size are large. We generalize this central limit theorem to cover flexible demographic scenarios, multi-way identity-by-descent segments, and multivariate identity-by-descent rates. We use efficient simulations to study the distributional behavior of the detectable identity-by-descent rate. One consequence of non-normality in finite samples is that a genome-wide scan looking for excess identity-by-descent rates may be subject to anti-conservative control of family-wise error rates.

Keywords: asymptotic normality; coalescent; covariance; identity-by-descent.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1:
Example calculation of the detectable IBD rate. IBD segment lengths overlapping a focal point for sample haplotypes a, b, c, d are shown. The IBD segment indicators ($Y_{i,j}$) are 1 if their IBD segment lengths ($W_{i,j}$) exceed $w$ Morgans and 0 otherwise. The detectable IBD rate $\bar{Y}$ is the mean of these correlated binary random variables. The detectable IBD rate to the right of the focal point, $\bar{X}$, is calculated similarly.
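The calculation described in this caption can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the segment lengths and the 0.03 Morgans threshold are made-up values, and the function name `detectable_ibd_rate` is hypothetical.

```python
from itertools import combinations


def detectable_ibd_rate(lengths, threshold):
    """Mean of the indicators: 1 if a pair's segment length exceeds the threshold, else 0."""
    indicators = [1 if w > threshold else 0 for w in lengths.values()]
    return sum(indicators) / len(indicators)


haplotypes = ["a", "b", "c", "d"]

# Hypothetical IBD segment lengths (in Morgans) overlapping the focal point,
# one value per unordered pair of sample haplotypes.
lengths = {
    ("a", "b"): 0.050,
    ("a", "c"): 0.010,
    ("a", "d"): 0.004,
    ("b", "c"): 0.031,
    ("b", "d"): 0.002,
    ("c", "d"): 0.020,
}

# Every pair of haplotypes must have exactly one length.
assert set(lengths) == set(combinations(haplotypes, 2))

# Two of the six pairs exceed 0.03 Morgans, so the rate is 2/6.
rate = detectable_ibd_rate(lengths, threshold=0.03)
print(rate)
```

The indicators share haplotypes (for example, the pairs a,b and b,c both involve b), which is why they are correlated rather than independent.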
Figure 2:
Shapiro-Wilk tests for varying population sizes. Line plots show the proportions of Shapiro-Wilk tests rejected at the significance level 0.05 (y-axis) for varying population size and fixed sample size. Each proportion is computed over five hundred tests. Each test is based on one thousand simulations of the number of identity-by-descent lengths longer than a specified Morgans length threshold (x-axis). A) The sample size is five thousand individuals. B) The sample size is ten thousand. The legends assign colors to different population sizes. The horizontal dotted line is at 0.05.
Figure 3:
Relative upper bound for excess IBD scan. Line plots show the average mean plus four standard deviations divided by the 99.99683 percentile over two million simulations (y-axis). (The standard normal cumulative distribution function at four is 0.9999683; the corresponding survival function is about 0.0000317.) Each average relative upper bound is computed over one thousand tests. Each test is based on two thousand simulations of the number of identity-by-descent lengths longer than a specified Morgans length threshold (x-axis). A) The sample size is five thousand diploid individuals. B) The sample size is ten thousand diploid individuals. The legends assign colors to different constant population sizes.
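A rough sketch of the ratio described in this caption, using only the Python standard library. It assumes a simple empirical-quantile rule and uses Gaussian draws as a stand-in for simulated IBD counts; the function name `relative_upper_bound`, the stand-in distribution, and the reduced simulation sizes are all illustrative assumptions, not the authors' pipeline. The percentile level comes from the standard normal cumulative distribution function at four, which is about 0.9999683.

```python
import random
from statistics import NormalDist, mean, stdev


def relative_upper_bound(small_sample, large_sample, z=4.0):
    """(mean + z * sd of a small sample) divided by an extreme empirical quantile.

    The quantile level is the standard normal CDF at z, so a ratio near 1
    suggests the normal approximation describes the upper tail well.
    """
    upper = mean(small_sample) + z * stdev(small_sample)
    level = NormalDist().cdf(z)  # about 0.9999683 for z = 4
    ordered = sorted(large_sample)
    index = min(len(ordered) - 1, int(level * len(ordered)))
    return upper / ordered[index]


rng = random.Random(0)

# Stand-in data: exactly normal draws, so the ratio should be close to 1.
# Real IBD counts in finite samples may deviate from this.
small = [rng.gauss(100.0, 10.0) for _ in range(2_000)]
large = [rng.gauss(100.0, 10.0) for _ in range(200_000)]

ratio = relative_upper_bound(small, large)
print(round(NormalDist().cdf(4.0), 7), ratio)
```

A ratio below 1 would mean the normal-based threshold sits below the true extreme quantile, which is the situation in which a scan for excess IBD rates would reject too often.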
Figure 4:
Comparing features between IBD and Erdős-Rényi graphs. Histograms compare the density of graph features between IBD and Erdős-Rényi graphs. Each histogram summarizes the results of one hundred and twenty-five thousand simulations. A) and C) show the number of trees of order 2 and 3, respectively. B) shows the number of complete components with more than three nodes. D) shows the total number of edges. The legends give the colors assigned to the IBD and Erdős-Rényi graphs. IBD graphs are simulated using a constant-size demography of one hundred thousand diploid individuals and the 0.03 Morgans length threshold. Erdős-Rényi graphs are simulated using the same success probability as in the IBD graph. The sample size is two thousand diploid individuals. Vertical lines show the means.
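The Erdős-Rényi side of this comparison can be sketched as follows. This is a minimal illustration under stated assumptions: each pair of nodes is joined independently with the same probability, a tree of order k is taken to be a connected component with k nodes and k - 1 edges, and a complete component is one in which every pair of nodes is joined. The function names, the node count of 500, and the edge probability are hypothetical choices for speed, not values from the paper.

```python
import random
from collections import Counter
from itertools import combinations


def erdos_renyi_edges(n, p, rng):
    """Join each unordered pair of the n nodes independently with probability p."""
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


def component_summary(n, edges):
    """Count edges, trees of order 2 and 3, and complete components with more than 3 nodes."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    nodes = Counter(find(v) for v in range(n))  # nodes per component
    links = Counter(find(i) for i, _ in edges)  # edges per component
    summary = {"edges": len(edges), "trees_order_2": 0,
               "trees_order_3": 0, "complete_gt_3": 0}
    for root, k in nodes.items():
        m = links[root]
        if k in (2, 3) and m == k - 1:
            summary[f"trees_order_{k}"] += 1
        if k > 3 and m == k * (k - 1) // 2:
            summary["complete_gt_3"] += 1
    return summary


# Hand-built check: one edge, one 3-node path, and one complete graph on 4 nodes.
check = component_summary(
    9, [(0, 1), (2, 3), (3, 4), (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]
)

# Random graph with hypothetical size and edge probability.
rng = random.Random(1)
random_edges = erdos_renyi_edges(500, 0.001, rng)
random_summary = component_summary(500, random_edges)
print(check, random_summary)
```

In an Erdős-Rényi graph the edges are independent by construction, whereas edges in an IBD graph are correlated through shared ancestry, so differences in these component counts reflect that dependence.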
