Eur J Hum Genet. 2017 May;25(5):617-624.
doi: 10.1038/ejhg.2017.6. Epub 2017 Feb 8.

A fast and accurate method for detection of IBD shared haplotypes in genome-wide SNP data

Douglas W Bjelland et al. Eur J Hum Genet. 2017 May.

Abstract

Identical by descent (IBD) segments are used to understand a number of fundamental issues in genetics. IBD segments are typically detected using long stretches of identical alleles between haplotypes in phased, whole-genome SNP data. Phase or SNP call errors in genomic data can degrade the accuracy of IBD detection, leading to false-positive and false-negative calls and to under- or overextension of true IBD segments. Furthermore, the number of pairwise comparisons increases quadratically with sample size, requiring high computational efficiency. We developed a new IBD segment detection program, FISHR (Find IBD Shared Haplotypes Rapidly), in an attempt to accurately detect IBD segments and to better estimate their endpoints using an algorithm that is fast enough to be deployed on very large whole-genome SNP data sets. We compared the performance of FISHR to three leading IBD segment detection programs: GERMLINE, refined IBD, and HaploScore. Using simulated and real genomic sequence data, we show that FISHR is slightly more accurate than the other programs at detecting long (>3 cM) IBD segments but slightly less accurate than refined IBD at detecting short (~1 cM) IBD segments. More centrally, FISHR outperforms the other programs in determining the true endpoints of IBD segments, which is crucial for several applications of IBD information. FISHR takes two to three times longer than GERMLINE to run, whereas both GERMLINE and FISHR were orders of magnitude faster than refined IBD and HaploScore. Overall, FISHR provides accurate IBD detection in unrelated individuals and is computationally efficient enough to be used on large SNP data sets of >60 000 individuals.
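To illustrate why a single phase or SNP call error matters for detection based on long stretches of identical alleles, the sketch below scans two phased haplotypes for maximal runs of matching alleles. It is a deliberately naive illustration and not the algorithm used by FISHR or any of the compared programs; the function name `shared_runs`, the 0/1 allele coding, and the per-SNP genetic positions in cM are assumptions made for the example.

```python
def shared_runs(hap_a, hap_b, positions_cm, min_len_cm=3.0):
    """Return (start_cM, end_cM) for each maximal run of identical alleles
    between two haplotypes that spans at least min_len_cm.

    Naive exact matching only: no tolerance for phase or call errors.
    """
    n = min(len(hap_a), len(hap_b), len(positions_cm))
    runs = []
    start = None  # index where the current matching run began

    # Iterate one step past the end so a run reaching the last SNP is closed.
    for i in range(n + 1):
        match = i < n and hap_a[i] == hap_b[i]
        if match and start is None:
            start = i
        elif not match and start is not None:
            if positions_cm[i - 1] - positions_cm[start] >= min_len_cm:
                runs.append((positions_cm[start], positions_cm[i - 1]))
            start = None
    return runs


positions = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
hap_a = [1, 0, 1, 1, 0, 0, 1]

# Identical haplotypes: one 6 cM run is found.
clean = shared_runs(hap_a, list(hap_a), positions)

# One simulated call error in the middle splits the run into two 2 cM
# pieces, both below the 3 cM threshold, so the segment is missed.
hap_b_err = list(hap_a)
hap_b_err[3] = 1 - hap_b_err[3]
with_error = shared_runs(hap_a, hap_b_err, positions)
```

In this toy case a single mismatching SNP turns one detectable segment into a false negative, which is the kind of degradation the abstract describes.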

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
PPV-Sensitivity plots for FISHR (o), GERMLINE (), rIBD (), and HaploScore () when calculated using (a) a minimum of 3 cM for called IBD and a minimum of 3 cM for true IBD; (b) a minimum of 1 cM for called IBD and a minimum of 1 cM for true IBD; (c) a minimum of 3 cM for called IBD and a minimum of 1.5 cM for true IBD for calculating PPV, and a minimum of 1.5 cM for called IBD and a minimum of 3 cM for true IBD for calculating sensitivity; and (d) a minimum of 1 cM for called IBD and a minimum of 0.5 cM for true IBD for calculating PPV, and a minimum of 0.5 cM for called IBD and a minimum of 1 cM for true IBD for calculating sensitivity. Additional measures are shown for rIBD () when using a minimum true IBD length of 0.5 cM for PPV and no minimum called cM length for sensitivity (c), and a minimum true IBD length of 0.25 cM for PPV and no minimum called cM length for sensitivity (d).
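The asymmetric length thresholds in panels (c) and (d) can be made concrete with a small sketch. It assumes that PPV and sensitivity are computed from the overlapping length (in cM) between called and true segments for one pair of individuals; the caption does not state the exact unit of comparison, so that choice, the function names, and the toy segments are all assumptions.

```python
def seg_len(seg):
    """Length of a (start_cM, end_cM) segment."""
    return seg[1] - seg[0]


def total_overlap(segs_a, segs_b):
    """Summed overlap in cM between two lists of segments.

    Assumes the segments within each list do not overlap one another.
    """
    return sum(
        max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        for a in segs_a
        for b in segs_b
    )


def ppv_sensitivity(called, true, min_called_ppv, min_true_ppv,
                    min_called_sens, min_true_sens):
    """Length-based PPV and sensitivity with separate minimum lengths."""
    # PPV: how much of the (long enough) called length is truly IBD.
    called_p = [s for s in called if seg_len(s) >= min_called_ppv]
    true_p = [s for s in true if seg_len(s) >= min_true_ppv]
    called_total = sum(seg_len(s) for s in called_p)
    ppv = (total_overlap(called_p, true_p) / called_total
           if called_total else float("nan"))

    # Sensitivity: how much of the (long enough) true length was called.
    called_s = [s for s in called if seg_len(s) >= min_called_sens]
    true_s = [s for s in true if seg_len(s) >= min_true_sens]
    true_total = sum(seg_len(s) for s in true_s)
    sens = (total_overlap(called_s, true_s) / true_total
            if true_total else float("nan"))
    return ppv, sens


# Toy segments in cM, using the panel (c) thresholds from the caption.
called = [(0.0, 4.0), (10.0, 12.0)]
true = [(1.0, 6.0), (10.0, 11.6)]
ppv, sens = ppv_sensitivity(called, true, 3.0, 1.5, 1.5, 3.0)
```

Relaxing the threshold on the comparison set, as in panels (c) and (d), avoids penalising a program for a called segment that matches a true segment falling just under the length cutoff.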
Figure 2
Distributions of the proportion of under- and overextension for each called IBD segment >3 cM for FISHR, GERMLINE, rIBD, and HaploScore. Called segments were compared to true IBD segments with a minimum length of 1.5 cM. Called segments with no corresponding true IBD segment (the entire segment was overextended) were given values of 1, and true IBD segments with no corresponding called segment (the entire 'called' segment was underextended) were given values of −1. Bias was defined as the mean proportion, precision as the standard deviation of the proportion, and accuracy as the standard deviation from 0 rather than from the mean proportion, with optimal values of precision and accuracy being closest to 0. Results listed to the left of the histograms included false-positive and false-negative calls. Results to the right of the histograms (denoted by *) only included the called segments that had a corresponding true IBD segment.
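The three summary measures defined in this caption can be sketched as follows. The per-segment proportions are taken as given, and the use of the population (divide-by-n) form of the standard deviation is an assumption, since the caption does not say which form was used; the function name and the toy values are hypothetical.

```python
import math
import statistics


def extension_summary(proportions):
    """Bias, precision, and accuracy of per-segment extension proportions.

    bias      = mean proportion
    precision = standard deviation about the mean (population form assumed)
    accuracy  = standard deviation about 0, i.e. root mean square
    """
    bias = statistics.fmean(proportions)
    precision = statistics.pstdev(proportions)
    accuracy = math.sqrt(statistics.fmean(p * p for p in proportions))
    return bias, precision, accuracy


# Toy values: 1.0 marks a false-positive call, -1.0 a missed true segment.
values = [0.1, -0.1, 0.3, 1.0, -1.0]
bias, precision, accuracy = extension_summary(values)
```

Under this definition accuracy combines the other two measures, since accuracy squared equals precision squared plus bias squared, so a program can only score well on accuracy if its endpoint estimates are both unbiased and tightly distributed.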
Figure 3
An example of called IBD segments between two individuals in the UK10K data set, from (a) rIBD, (b) HaploScore, (c) GERMLINE, and (d) FISHR, with (e) opposite homozygous SNPs (OH) occurring for that pair of individuals in and surrounding the FISHR called IBD segment, and (f) OH occurring in a random pair of individuals at the same location as the called IBD segment. The horizontal offset seen in the rIBD segments represents multiple detected segments, with overlapping segments showing IBD 2.
Figure 4
Results of the analysis of the proportion of opposite homozygosity (OH) in (a) the four quartiles of the called IBD segment and the two flanking regions and in (b) just the four quartiles of the called IBD segments for FISHR (o), GERMLINE (), rIBD (), HaploScore (), and random individuals at the same location as the called IBD () where called IBD segments were a minimum of 3 cM. FISHR's pattern of results is closest to that expected from perfect estimation of IBD endpoints.
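The opposite-homozygosity check works because two individuals who share a haplotype IBD cannot be homozygous for different alleles at a SNP within the shared region, barring genotyping errors. The sketch below computes the OH proportion in the four quartiles of a called segment and in its two flanks. The genotype coding (0, 1, 2 copies of one allele), the split by SNP index rather than by genetic distance, and the flank width of one quarter of the segment are all assumptions for illustration, not details given in the caption.

```python
def is_opposite_homozygous(g1, g2):
    """True when one genotype is 0 and the other is 2."""
    return {g1, g2} == {0, 2}


def oh_proportion(geno_a, geno_b):
    """Proportion of SNPs at which the two individuals are opposite homozygous."""
    pairs = list(zip(geno_a, geno_b))
    if not pairs:
        return float("nan")
    return sum(is_opposite_homozygous(a, b) for a, b in pairs) / len(pairs)


def oh_by_region(geno_a, geno_b, start, end):
    """OH proportion in both flanks and four quartiles of a called segment.

    start, end: SNP indices of the called segment, half-open [start, end).
    Flank width (one quarter of the segment) is a hypothetical choice.
    """
    n = end - start
    flank = n // 4
    bounds = [start + (n * k) // 4 for k in range(5)]
    regions = {"left_flank": (max(0, start - flank), start)}
    for k in range(4):
        regions[f"q{k + 1}"] = (bounds[k], bounds[k + 1])
    regions["right_flank"] = (end, min(len(geno_a), end + flank))
    return {
        name: oh_proportion(geno_a[lo:hi], geno_b[lo:hi])
        for name, (lo, hi) in regions.items()
    }


# Toy genotypes: the called segment covers SNP indices 2 to 9.
geno_a = [0, 2, 0, 1, 2, 2, 1, 0, 0, 1, 2, 0]
geno_b = [2, 2, 0, 1, 2, 1, 1, 0, 0, 2, 0, 0]
result = oh_by_region(geno_a, geno_b, start=2, end=10)
```

With well-estimated endpoints, OH should be near zero in all four quartiles and rise to the background level in the flanks; OH appearing in the outer quartiles suggests overextension, while OH remaining low in the flanks suggests underextension.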
