Am J Hum Genet. 2020 Nov 5;107(5):895-910.
doi: 10.1016/j.ajhg.2020.09.010. Epub 2020 Oct 13.

Probabilistic Estimation of Identity by Descent Segment Endpoints and Detection of Recent Selection

Sharon R Browning et al. Am J Hum Genet. 2020.

Abstract

Most methods for fast detection of identity by descent (IBD) segments report identity by state segments without any quantification of the uncertainty in the endpoints and lengths of the IBD segments. We present a method for determining the posterior probability distribution of IBD segment endpoints. Our approach accounts for genotype errors, recent mutations, and gene conversions which disrupt DNA sequence identity within IBD segments, and it can be applied to large cohorts with whole-genome sequence or SNP array data. We find that our method's estimates of uncertainty are well calibrated for homogeneous samples. We quantify endpoint uncertainty for 77.7 billion IBD segments from 408,883 individuals of white British ancestry in the UK Biobank, and we use these IBD segments to find regions showing evidence of recent natural selection. We show that many spurious selection signals are eliminated by the use of unbiased estimates of IBD segment endpoints and a pedigree-based genetic map. Eleven of the twelve regions with the greatest evidence for recent selection in our scan have been identified as selected in previous analyses using different approaches. Our computationally efficient method for quantifying IBD segment endpoint uncertainty is implemented in the open source ibd-ends software package.

Keywords: identity by descent; natural selection; recent positive selection.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Uncertainty in IBD Endpoints. (A) Allele discordances between two haplotypes are represented as crosses. We wish to estimate the endpoints of the IBD segment that covers the focal position in the middle of the longest identity by state (IBS) interval. (B–D) Three of the possibilities for the shared IBD segment that covers the focal position. (B) The IBD segment contains the first discordance to the right of the focal position. (C) The IBD segment does not extend all the way to the discordances and has short flanking segments of IBS. (D) Two moderately long IBD segments are adjacent. In this case, the second IBD segment is not of direct interest because it does not cover the focal position.
Figure 2
Method Performance with Uneven Marker Density. Sequence data on 2,000 individuals were simulated under a constant effective population size. Markers located between 20 and 23 Mb were removed, and marker density varies every 100 kb (see Material and Methods). The true haplotype phase is used in the analysis. (A) Quantile-quantile plot assessing the calibration of the estimated endpoint uncertainty. The actual quantile (y axis) corresponding to a given nominal quantile (x axis) is the proportion of segments for which the reported nominal quantile of the right endpoint is greater than the true right endpoint (points on the plot). The y=x line is shown for comparison. Results for the left endpoints are similar but are not shown. (B) The y axis is the IBD rate, which is the percentage of pairs of haplotypes for which the position on the chromosome is covered by an estimated IBD segment with length >2 cM for the haplotype pair. Estimated IBD segment endpoints are the posterior medians. The IBD rate is calculated at 10 kb intervals.
Figure 3
Method Performance on UK-like Simulated Sequence and SNP Array Data. The data comprise 50,000 individuals simulated from a UK-like demographic history (see Material and Methods), with a genotype error rate of 0.02%. True IBD segment endpoints were determined for 1,000 individuals, and these individuals were used to generate the results in this figure. The top row shows quantile-quantile plots that assess the calibration of the estimated endpoint uncertainty. The y=x line is shown for comparison. The actual quantile (y axis) corresponding to a given nominal quantile (x axis) is the proportion of segments for which the reported nominal quantile of the right endpoint is greater than the true right endpoint. The bottom row shows histograms of the right endpoint sampled from the estimated posterior distribution minus the posterior median right endpoint. The histograms represent the distribution of uncertainty, averaged over segments. Histogram bin widths are 5 kb. Results for the left endpoints are similar but are not shown. The left column is for analysis using the true haplotype phase. The middle column is for analysis using haplotype phase inferred using Beagle 5.1. The right column is for data thinned to match a SNP array with 500,000 markers genome-wide (10,000 markers in the simulated 60 Mb interval), and with haplotype phase inferred using Beagle 5.1.
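The calibration check described in the Figure 2 and Figure 3 captions can be sketched in a few lines of Python. This is a minimal illustration, not the ibd-ends implementation: the function names, the representation of each segment's posterior as a list of sampled endpoints, and the toy Gaussian simulation are all assumptions made for this example. The only part taken from the captions is the definition of the actual quantile as the proportion of segments whose reported nominal quantile exceeds the true endpoint.

```python
import random


def empirical_quantile(samples, q):
    """Return the q-th quantile (0 <= q <= 1) of samples by linear interpolation."""
    s = sorted(samples)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def actual_quantile(posterior_samples, true_endpoints, q):
    """Proportion of segments whose reported nominal quantile q of the
    endpoint is greater than the true endpoint (the y axis of the Q-Q plot)."""
    hits = sum(
        empirical_quantile(samples, q) > truth
        for samples, truth in zip(posterior_samples, true_endpoints)
    )
    return hits / len(true_endpoints)


def simulate_calibrated_segments(n_segments, n_samples, seed=0):
    """Hypothetical toy model: the true endpoint and the posterior samples are
    drawn from the same Gaussian, so the posterior is calibrated by construction."""
    rng = random.Random(seed)
    posterior_samples, true_endpoints = [], []
    for _ in range(n_segments):
        center = rng.uniform(1e6, 5e7)   # endpoint position in bp (arbitrary)
        spread = rng.uniform(5e3, 5e4)   # endpoint uncertainty in bp (arbitrary)
        true_endpoints.append(rng.gauss(center, spread))
        posterior_samples.append(
            [rng.gauss(center, spread) for _ in range(n_samples)]
        )
    return posterior_samples, true_endpoints


if __name__ == "__main__":
    samples, truth = simulate_calibrated_segments(2000, 100, seed=1)
    for q in (0.1, 0.25, 0.5, 0.75, 0.9):
        # For a calibrated method the two numbers should be close (the y=x line).
        print(f"nominal {q:.2f}  actual {actual_quantile(samples, truth, q):.3f}")
```

Points that fall well above or below the y=x line would indicate that the reported uncertainty is too wide or too narrow at that quantile.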
Figure 4
Rate of IBD Segments along the Autosomes in UK Biobank White British Data. The x axis shows position along each chromosome. Chromosomes alternate in color. Notable genes and regions (LCT, MHC, OAS, and TRPM1) located within the four highest peak regions are labeled. The y axis is the IBD rate, which is the percentage of pairs of haplotypes for which the position on the chromosome is covered by an IBD segment with length >2 cM for the haplotype pair. IBD segment endpoints are posterior medians. The IBD rate is calculated at 10 kb intervals. The black dashed lines show the thresholds of 0.025% and 0.021% used for the results in Tables 1 and 2, respectively.
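The IBD rate defined in the Figure 2 and Figure 4 captions can be illustrated with a short Python sketch. It is a sketch under stated assumptions and not the authors' pipeline: the tuple layout of a segment, the function name `ibd_rate`, and the choice to count all pairs of distinct haplotypes in the denominator are hypothetical. The 2 cM length filter and the 10 kb grid come from the captions.

```python
from math import comb


def ibd_rate(segments, n_haplotypes, chrom_length_bp, step_bp=10_000, min_cm=2.0):
    """Percentage of haplotype pairs covered by an IBD segment longer than
    min_cm at each grid position.

    segments: iterable of (hap1, hap2, start_bp, end_bp, length_cm) tuples,
    with hap1 != hap2 (hypothetical layout; endpoints could be posterior medians).
    Assumes every pair of distinct haplotypes counts in the denominator.
    """
    n_pairs = comb(n_haplotypes, 2)
    long_segments = [seg for seg in segments if seg[4] > min_cm]
    positions = list(range(0, chrom_length_bp + 1, step_bp))
    rates = []
    for pos in positions:
        # A set ensures a pair is counted once even if two of its segments overlap.
        covered = {
            frozenset((h1, h2))
            for h1, h2, start, end, _cm in long_segments
            if start <= pos <= end
        }
        rates.append(100.0 * len(covered) / n_pairs)
    return positions, rates


if __name__ == "__main__":
    toy_segments = [
        (0, 1, 0, 30_000, 2.5),
        (0, 2, 15_000, 50_000, 3.0),
        (1, 2, 0, 50_000, 1.0),   # shorter than 2 cM, so ignored
    ]
    for pos, rate in zip(*ibd_rate(toy_segments, n_haplotypes=4, chrom_length_bp=50_000)):
        print(pos, round(rate, 2))
```

This direct scan over all segments at every grid position is only practical for small inputs; a data set on the scale described in the abstract would need an interval-based or sweep-line approach.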
