Am J Hum Genet. 2024 Apr 4;111(4):691-700.
doi: 10.1016/j.ajhg.2024.02.015. Epub 2024 Mar 20.

Biobank-scale inference of multi-individual identity by descent and gene conversion

Sharon R Browning et al. Am J Hum Genet. 2024.

Abstract

We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.
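The idea behind using multi-individual IBD to detect gene conversion can be illustrated with a deliberately simplified sketch. It assumes that locus-specific IBD clusters at one marker are already available and that alleles are given as integer codes per haplotype. Within a cluster of haplotypes inherited from a recent common ancestor, a single haplotype carrying a different allele from all the others is a candidate changed allele. The function name, its parameters, and the optional population-frequency filter are hypothetical; this is not the published detection procedure, which must also distinguish gene conversion from new mutation and genotype error.

```python
from collections import Counter


def flag_discordant_alleles(clusters, alleles, min_cluster_size=3, min_allele_freq=0.0):
    """Flag lone haplotypes whose allele at one marker differs from the rest of their IBD cluster.

    clusters: lists of haplotype indices that are IBD at this marker.
    alleles: allele code for every haplotype at this marker, indexed by haplotype.
    Returns (haplotype, allele) pairs that are candidate changed alleles.
    """
    n = len(alleles)
    pop_counts = Counter(alleles)  # allele counts across all haplotypes
    candidates = []
    for cluster in clusters:
        if len(cluster) < min_cluster_size:
            continue  # too few haplotypes to define a clear cluster allele
        counts = Counter(alleles[h] for h in cluster)
        if len(counts) != 2:
            continue  # concordant cluster, or too many alleles to interpret
        (_, n_major), (minor, n_minor) = counts.most_common()
        if n_minor != 1 or n_major < 2:
            continue  # require exactly one haplotype to disagree with a clear majority
        if pop_counts[minor] / n < min_allele_freq:
            continue  # hypothetical filter: a converted allele is copied from another haplotype
        h = next(h for h in cluster if alleles[h] == minor)
        candidates.append((h, minor))
    return candidates
```

With alleles = [0, 0, 0, 1, 1, 1, 0, 1] and clusters [[0, 1, 2, 3], [4, 5, 6], [7]], the call flags haplotype 3 (allele 1) and haplotype 6 (allele 0); the singleton cluster is skipped.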


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Transitivity of IBD. (A) The coalescent tree at a given point in the genome is shown for haplotypes h1, h2, and h3, which are mutually IBD at this location. Haplotypes h1 and h2 have common ancestor X, while the pairs h1 and h3, and h2 and h3, have common ancestor Y. (B) Pairwise IBD status (black for IBD, white for non-IBD) is shown for the three pairs of haplotypes along a region of the chromosome around the focal position (marked in the panel). Each IBD segment extends to either side of the focal position until it reaches a recombination on one of the ancestral lineages. Although the IBD segments shared by h1 and h2 and by h2 and h3 are long and may exceed a predefined length threshold, the IBD segment shared by h1 and h3 is relatively short and may fall below the threshold for pairwise IBD. Because h1 is IBD with h2, and h2 is IBD with h3, at the focal position, transitivity still places h1 and h3 in the same IBD cluster there.
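The transitivity argument in panel B can be sketched with a union-find structure, assuming that pairwise IBD calls covering the focal position are already available as pairs of haplotype indices. This is only a pairwise-then-merge illustration of transitivity; the ibd-cluster method is designed to avoid inferring pairwise IBD segments, and the function name ibd_clusters is hypothetical.

```python
def ibd_clusters(n_haplotypes, ibd_pairs):
    """Group haplotypes into IBD clusters at one focal position by transitivity.

    ibd_pairs: (i, j) haplotype pairs whose pairwise IBD segment covers the
    focal position and passes the length threshold.
    """
    parent = list(range(n_haplotypes))

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in ibd_pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two clusters

    groups = {}
    for h in range(n_haplotypes):
        groups.setdefault(find(h), []).append(h)
    return sorted(groups.values())
```

Taking h1, h2, and h3 as indices 0, 1, and 2 plus an unrelated fourth haplotype, ibd_clusters(4, [(0, 1), (1, 2)]) returns [[0, 1, 2], [3]]: h1 and h3 share a cluster even though the short pair (0, 2) was never called.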
Figure 2
IBD transitivity with and without trimming IBS segments. IBD and IBS in a genomic region are shown for the three pairings of haplotypes h1, h2, and h3. (A) The IBD between haplotypes h1 and h2 derives from a different recent common ancestor than the IBD between haplotypes h2 and h3. IBS that is not due to these recent common ancestors is incorrectly called as IBD at the ends of the IBD segments. As a result, transitivity causes a region of IBS to be incorrectly called as IBD between haplotypes h1 and h3. (B) A trim is applied to the ends of the pairwise IBS regions, and no IBD is called between haplotypes h1 and h3.
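A minimal sketch of the effect of trimming in panel B, assuming that pairwise IBS segments are (start, end) intervals in a common unit such as cM and that the same fixed amount is removed from each end. The trim value and the function names are hypothetical and are not parameters of ibd-cluster.

```python
def trim_segment(start, end, trim):
    """Shrink a pairwise IBS segment by `trim` at each end; None if nothing remains."""
    new_start, new_end = start + trim, end - trim
    return (new_start, new_end) if new_start < new_end else None


def transitive_overlap(seg_ab, seg_bc, trim=0.0):
    """Region where A~B and B~C both hold after trimming, so transitivity would imply A~C."""
    ab = trim_segment(*seg_ab, trim)
    bc = trim_segment(*seg_bc, trim)
    if ab is None or bc is None:
        return None
    start, end = max(ab[0], bc[0]), min(ab[1], bc[1])
    return (start, end) if start < end else None
```

Without trimming, segments (0.0, 6.0) for h1 and h2 and (5.0, 10.0) for h2 and h3 overlap on (5.0, 6.0), where h1 and h3 would be spuriously linked; with trim=1.0 the trimmed segments (1.0, 5.0) and (6.0, 9.0) no longer overlap and the function returns None.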
Figure 3
Observed length of inferred gene conversion tracts. Observed lengths in (A) simulated data (125,000 individuals, 20 regions of 10 Mb each) and (B) the UK Biobank White British data. The observed length of an inferred tract is the distance from the first changed allele to the last changed allele (inclusive), which will generally be substantially shorter than the actual length of the underlying gene conversion tract. Only lengths >1 bp are shown: 80.6% of observed lengths were 1 bp in the large simulated dataset, and 82.9% were 1 bp in the UK Biobank White British data.
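The observed-length definition in this caption reduces to simple arithmetic on positions. A sketch, assuming the changed alleles of each inferred tract have already been grouped and are given as base-pair positions (the grouping step and both function names are hypothetical):

```python
def observed_tract_length(positions):
    """Observed length in bp: first to last changed allele, inclusive (one allele gives 1 bp)."""
    if not positions:
        raise ValueError("a tract needs at least one changed allele")
    return max(positions) - min(positions) + 1


def fraction_single_bp(tracts):
    """Fraction of tracts whose observed length is exactly 1 bp."""
    if not tracts:
        raise ValueError("no tracts given")
    lengths = [observed_tract_length(t) for t in tracts]
    return sum(1 for length in lengths if length == 1) / len(lengths)
```

A single changed allele gives 1 bp, while changed alleles at positions 1000, 1040, and 1100 give 1100 - 1000 + 1 = 101 bp. Because the true tract can extend beyond the outermost changed alleles, this observed value is a lower bound on the tract length.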
Figure 4
IBD cluster sizes in the UK Biobank White British autosomal sequence data. Cluster size is shown on the x axis for cluster sizes of 3 in the left panel and 3 in the right panel. The y axis shows the proportion of haplotypes that are in IBD clusters of that size.
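The y axis weights each cluster by the number of haplotypes it contains, which is not the same as the fraction of clusters of each size. A sketch of that calculation, assuming a flat list of cluster sizes pooled over positions (a hypothetical input format):

```python
from collections import Counter


def haplotype_proportion_by_cluster_size(cluster_sizes):
    """Proportion of haplotypes lying in IBD clusters of each size.

    A cluster of size s holds s haplotypes, so it adds s (not 1) to that size's tally.
    """
    total = sum(cluster_sizes)
    haplotypes_by_size = Counter()
    for s in cluster_sizes:
        haplotypes_by_size[s] += s
    return {s: haplotypes_by_size[s] / total for s in sorted(haplotypes_by_size)}
```

For cluster sizes [1, 1, 2, 4], which contain 8 haplotypes in total, the result is {1: 0.25, 2: 0.25, 4: 0.5}; counting clusters instead would give size 1 a share of 0.5.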


