Genetics. 2013 Jun;194(2):459-71.
doi: 10.1534/genetics.113.150029. Epub 2013 Mar 27.

Improving the accuracy and efficiency of identity-by-descent detection in population data

Brian L Browning et al. Genetics. 2013 Jun.

Abstract

Segments of identity-by-descent (IBD) detected from high-density genetic data are useful for many applications, including long-range phase determination, phasing family data, imputation, IBD mapping, and heritability analysis in founder populations. We present Refined IBD, a new method for IBD segment detection. Refined IBD achieves both computational efficiency and highly accurate IBD segment reporting by searching for IBD in two steps. The first step (identification) uses the GERMLINE algorithm to find shared haplotypes exceeding a length threshold. The second step (refinement) evaluates candidate segments with a probabilistic approach to assess the evidence for IBD. Like GERMLINE, Refined IBD allows for IBD reporting on a haplotype level, which facilitates determination of multi-individual IBD and allows for haplotype-based downstream analyses. To investigate the properties of Refined IBD, we simulate SNP data from a model with recent superexponential population growth that is designed to match United Kingdom data. The simulation results show that Refined IBD achieves a better power/accuracy profile than fastIBD or GERMLINE. We find that a single run of Refined IBD achieves greater power than 10 runs of fastIBD. We also apply Refined IBD to SNP data for samples from the United Kingdom and from Northern Finland and describe the IBD sharing in these data sets. Refined IBD is powerful, highly accurate, and easy to use and is implemented in Beagle version 4.

Keywords: Beagle; identity-by-descent (IBD) segments; shared haplotypes.
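
The two-step search described in the abstract can be illustrated with a minimal sketch. This is not GERMLINE or the Refined IBD implementation: step 1 is replaced here by a brute-force scan for exact allele matches, and the evidence score in step 2 is a caller-supplied placeholder rather than the paper's probabilistic model. The function names, the toy haplotype strings, and the thresholds are all assumptions made for illustration.

```python
from itertools import combinations


def candidate_segments(haplotypes, min_len):
    """Step 1 (identification), simplified: report maximal runs of identical
    alleles between each pair of haplotypes spanning at least min_len markers.
    Segments are returned as (hap_i, hap_j, start, end) with end exclusive."""
    out = []
    for (i, a), (j, b) in combinations(enumerate(haplotypes), 2):
        start = None
        # Iterate one position past the end so a run reaching the last
        # marker is closed like any other run.
        for pos in range(len(a) + 1):
            match = pos < len(a) and a[pos] == b[pos]
            if match and start is None:
                start = pos
            elif not match and start is not None:
                if pos - start >= min_len:
                    out.append((i, j, start, pos))
                start = None
    return out


def refine(candidates, score, threshold):
    """Step 2 (refinement), simplified: keep candidates whose evidence score
    reaches the threshold. `score` stands in for a probabilistic model."""
    return [c for c in candidates if score(c) >= threshold]


# Toy data: three equal-length haplotypes coded as 0/1 alleles (hypothetical).
haps = ["0101100110", "0101100111", "1101100010"]
cands = candidate_segments(haps, min_len=5)

# Placeholder score: segment length in markers (NOT a LOD score).
kept = refine(cands, score=lambda c: c[3] - c[2], threshold=8)
print(cands)
print(kept)
```

Separating a cheap identification pass from a more expensive scoring pass means the costly evaluation is run only on the candidates that survive the length threshold.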


Figures

Figure 1
Overview of the Refined IBD algorithm.
Figure 2
Identity-by-descent detection accuracy. (A–C) Sample size of 500 individuals; (D–F) sample size of 2000 individuals. A and D show true vs. false discovery. False discovery (x-axis) is measured by the average proportion of the genome that, for a pair of individuals, is in detected IBD segments that are determined to be false. Here falsely detected IBD segments are segments for which at most 25% of the detected segment is true IBD as determined from the simulated phase-known sequence data. True discovery (y-axis) is measured by the average proportion of the region that, for a pair of individuals, is in detected IBD that is also true IBD. Any part of a detected IBD segment that is not part of a true IBD segment is not included in this measure. B and E show power to detect IBD as a function of the underlying size of the true IBD segment. The average proportion of the segment that is detected is shown on the y-axis. Undetected segments (proportion 0) are included in this measure. C and F measure the accuracy of detected segments of a given reported size. The y-axis gives the probability that a reported segment is true, which is defined here as the probability that at least 50% of the segment is true IBD.
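
The false and true discovery measures in panels A and D, and the "reported segment is true" criterion in panels C and F, can be sketched for a single pair of individuals as follows. This is an illustrative reading of the caption, not the authors' evaluation code. It assumes segments are half-open (start, end) intervals in a common unit, that true segments do not overlap one another, and that the true part of a falsely detected segment still counts toward true discovery. All names are hypothetical.

```python
def overlap(seg, truth):
    """Length of `seg` covered by the non-overlapping true IBD intervals."""
    s, e = seg
    return sum(max(0.0, min(e, te) - max(s, ts)) for ts, te in truth)


def discovery_proportions(detected, truth, region_len, false_frac=0.25):
    """Return (false_discovery, true_discovery) for one pair of individuals.

    A detected segment is false when at most `false_frac` of its length is
    true IBD; its whole length then counts as false discovery. True discovery
    counts only the detected length that overlaps true IBD."""
    false_len = 0.0
    true_len = 0.0
    for seg in detected:
        length = seg[1] - seg[0]
        covered = overlap(seg, truth)
        if covered <= false_frac * length:
            false_len += length
        true_len += covered
    return false_len / region_len, true_len / region_len


def segment_is_true(seg, truth, min_frac=0.5):
    """Panels C and F: a reported segment is true if at least `min_frac`
    of its length is true IBD."""
    return overlap(seg, truth) >= min_frac * (seg[1] - seg[0])


# Hypothetical pair: one true segment, three detected segments, region of 100.
truth = [(10, 30)]
detected = [(12, 28), (50, 60), (28, 40)]
print(discovery_proportions(detected, truth, region_len=100))
print([segment_is_true(seg, truth) for seg in detected])
```

Averaging these per-pair values over all pairs of individuals gives the quantities plotted on the axes.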
Figure 3
Under- and overestimation of IBD segment lengths. (A and B) Sample size of 500 individuals; (C and D) sample size of 2000 individuals. A and C show the average amount of IBD segment missed, for segments of a given size, conditional on at least part of the segment being found. The missed amount includes gaps in the middle of a segment and underestimation of endpoints of a segment. B and D show the average amount of overestimation of a segment, for segments of a given size, conditional on at least part of the segment being found. Overestimation of ends includes the bridging of two segments: in such a case the true IBD in one segment contributes to the end overestimation of the other segment.
Figure 4
Identity-by-descent detection accuracy, including the effects of overestimation. Whereas in Figure 2 overestimation is not factored into the accuracy metrics, here the false discovery rate is the proportion of the total detected IBD that does not cover a true underlying IBD segment as determined from the underlying phased sequence data. Thus, here the false discovery rate on the x-axis includes both falsely detected segments and overestimation of the endpoints of true detected segments. The detection rate on the y-axis is the average length of true IBD found per pair of individuals, divided by the length of the region.
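
The overestimation-aware measures in this caption can be sketched as below. This is a hedged illustration rather than the authors' code: it assumes half-open (start, end) intervals, non-overlapping true segments, and non-overlapping detected segments, and the function names are hypothetical.

```python
def covered_length(seg, intervals):
    """Length of `seg` that lies inside any of the given intervals."""
    s, e = seg
    return sum(max(0.0, min(e, b) - max(s, a)) for a, b in intervals)


def fdr_with_overestimation(detected, truth):
    """Proportion of total detected IBD length that does not cover true IBD.
    Overshoot past the ends of a true segment counts as false discovery."""
    total = sum(e - s for s, e in detected)
    if total == 0:
        return 0.0
    true_part = sum(covered_length(seg, truth) for seg in detected)
    return (total - true_part) / total


def detection_rate(detected, truth, region_len):
    """Length of true IBD found for one pair, divided by the region length."""
    return sum(covered_length(seg, truth) for seg in detected) / region_len


# Hypothetical pair: the first detected segment overshoots both ends of the
# true segment; the second detected segment is entirely false.
truth = [(10, 30)]
detected = [(8, 33), (50, 60)]
print(fdr_with_overestimation(detected, truth))
print(detection_rate(detected, truth, region_len=100))
```

In this toy case 15 of the 35 detected units are not true IBD: 5 from endpoint overshoot and 10 from the false segment.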
Figure 5
Lengths of detected IBD segments. (A) In the simulated SNP data, with a sample size of 2000. (B) In the Wellcome Trust Case Control Consortium 2 United Kingdom data. (C) In the Northern Finland Birth Cohort data. A LOD score threshold of 3 was used in all three cases.
Figure 6
Identity-by-descent detection accuracy in sequence data. Simulated sequence data on 500 individuals were used. Results from SNP data, reproduced from Figure 2, are shown for comparison. See Figure 2 for description of the axis labels.
Figure 7
Histogram of sum of lengths of detected IBD shared by pairs of individuals. (A) In the Wellcome Trust Case Control Consortium 2 United Kingdom data. (B) In the Northern Finland Birth Cohort data. A LOD score threshold of 3 was used in both cases.
Figure 8
Patterns of IBD sharing between three individuals. Individuals are shown as ovals, while their haplotypes are shown as circles. IBD at a haplotype level is shown by dashed lines connecting the IBD haplotypes and by the use of the same color for IBD haplotypes. In all cases, there is IBD between all three pairs of individuals. (A) Each pair of individuals shares a different haplotype. (B) The three individuals share a single haplotype. (C) As in B, but the third individual is homozygous by descent. These three scenarios cannot be distinguished without further data when IBD is reported only at the individual level, but are clearly different with IBD at the haplotype level.
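
The distinction drawn in this caption can be made concrete with a small sketch. It encodes the three scenarios as lists of IBD haplotype pairs at one locus, then compares what is visible at the individual level with what is visible at the haplotype level. The encoding of a haplotype as an (individual, copy) tuple, the individual labels, and the function names are assumptions for illustration, not a format used by the paper or by Beagle.

```python
def individual_level(hap_pairs):
    """Collapse haplotype-level IBD to unordered pairs of distinct individuals."""
    return {frozenset((a[0], b[0])) for a, b in hap_pairs if a[0] != b[0]}


def haplotype_clusters(hap_pairs):
    """Group haplotypes connected by IBD pairs, using a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in hap_pairs:
        parent[find(a)] = find(b)
    groups = {}
    for h in list(parent):
        groups.setdefault(find(h), set()).add(h)
    return sorted(sorted(g) for g in groups.values())


A0, A1, B0, B1, C0, C1 = [(ind, k) for ind in "ABC" for k in (0, 1)]

# (A) Each pair of individuals shares a different haplotype.
scen_a = [(A0, B0), (B1, C0), (C1, A1)]
# (B) The three individuals share a single haplotype.
scen_b = [(A0, B0), (B0, C0), (A0, C0)]
# (C) As in B, but individual C is homozygous by descent.
scen_c = scen_b + [(A0, C1), (B0, C1), (C0, C1)]

for name, scen in (("A", scen_a), ("B", scen_b), ("C", scen_c)):
    sizes = sorted(len(g) for g in haplotype_clusters(scen))
    print(name, len(individual_level(scen)), sizes)
```

All three scenarios collapse to the same three individual-level pairs, while the haplotype clusters differ in number and size.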
