Am J Hum Genet. 2020 Apr 2;106(4):453-466.
doi: 10.1016/j.ajhg.2020.02.012. Epub 2020 Mar 19.

Rapid, Phase-free Detection of Long Identity-by-Descent Segments Enables Effective Relationship Classification

Daniel N Seidman et al. Am J Hum Genet.

Abstract

Identity-by-descent (IBD) segments are useful for applications ranging from demographic inference to relationship classification, but most detection methods rely on phased data and therefore require substantial computation time. As genetic datasets grow, methods for inferring IBD segments that scale well will be critical. We developed IBIS, an IBD detector that locates long regions of allele sharing between unphased individuals, and benchmarked it against Refined IBD, GERMLINE, and TRUFFLE on 3,000 simulated individuals. Phasing these samples with Beagle 5 takes 4.3 CPU days, after which Refined IBD and GERMLINE detect segments in 2.9 h and 1.1 h, respectively. By comparison, IBIS finishes in 6.8 min, or 7.8 min with IBD2 detection enabled: speedups of 805-946× including phasing time. TRUFFLE takes 2.6 h, corresponding to IBIS speedups of 20.2-23.3×. IBIS is also accurate, inferring segments of ≥7 cM at a quality comparable to Refined IBD and GERMLINE. Using these segments, IBIS classifies first- through third-degree relatives in real Mexican American samples at rates meeting or exceeding those of the other methods tested, and it identifies fourth- through sixth-degree pairs at rates within 0.0%-2.0% of the top method. Allele frequency-based approaches that do not detect segments can infer relationship degrees faster than IBIS, but the fastest of them are biased in admixed samples: KING correctly infers 30.8% fewer fifth-degree Mexican American relatives than IBIS does. Finally, we ran IBIS on chromosome 2 of the UK Biobank dataset and estimated its runtime on the autosomes to be 3.3 days when parallelized across 128 cores.

Keywords: IBD; identical by descent segments; identity by descent; relatedness inference; unphased genotypes.
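
The speedup ranges in the abstract can be checked roughly from its rounded runtimes. The sketch below converts each runtime to minutes, adds the 4.3 CPU-day phasing cost to the pipelines that need phased input, and divides by the IBIS runtime. The variable and function names are illustrative only. Because the abstract rounds its inputs (for example, 4.3 days), the results, about 802-936× for the phased pipelines and about 20.0-22.9× for TRUFFLE, only approximate the reported 805-946× and 20.2-23.3×; the published figures presumably come from unrounded timings.

```python
# Rounded runtimes reported in the abstract, converted to minutes.
# Variable and function names here are illustrative, not from IBIS.
PHASING_MIN = 4.3 * 24 * 60      # Beagle 5 phasing: 4.3 CPU days
REFINED_IBD_MIN = 2.9 * 60       # Refined IBD on phased data: 2.9 h
GERMLINE_MIN = 1.1 * 60          # GERMLINE on phased data: 1.1 h
TRUFFLE_MIN = 2.6 * 60           # TRUFFLE on unphased data: 2.6 h
IBIS_TIMES_MIN = (6.8, 7.8)      # IBIS without / with IBD2 detection


def speedup(baseline_min: float, ibis_min: float) -> float:
    """How many times faster IBIS is than a baseline pipeline."""
    return baseline_min / ibis_min


# Pipelines that need phased input pay the Beagle cost before segment detection.
phased_pipelines_min = [PHASING_MIN + REFINED_IBD_MIN, PHASING_MIN + GERMLINE_MIN]

phased_speedups = [speedup(p, i) for p in phased_pipelines_min for i in IBIS_TIMES_MIN]
truffle_speedups = [speedup(TRUFFLE_MIN, i) for i in IBIS_TIMES_MIN]

if __name__ == "__main__":
    print(f"Phased pipelines: {min(phased_speedups):.0f}x to {max(phased_speedups):.0f}x")
    print(f"TRUFFLE: {min(truffle_speedups):.1f}x to {max(truffle_speedups):.1f}x")
```

The smallest phased-pipeline speedup pairs GERMLINE with IBIS running IBD2 detection, and the largest pairs Refined IBD with IBIS running without it.
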

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Example Genotype Data and IBIS Processing. (A) Genotype data at three markers for four samples (stick figures). The input encodes the alleles at each site as zero or one, as indicated. (B) The sets $H_{s,1}(0)$ and $H_{s,1}(1)$ for each sample $s$ contain the markers in window 1 that are homozygous for allele 0 and allele 1, respectively. (C) The set operations that construct $W_1$ for two of the samples yield one marker at which these individuals are homozygous for opposite alleles. (D) Example $W_i$ values across five windows for a given pair. IBIS detects potential IBD segments as a series of windows in which the first and last windows have no inconsistencies and $|W_i| \le 1$ for all other windows $i$. In the figure, such a region, containing no inconsistent genotypes, spans windows two through four.
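
The windowed test in Figure 1 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the IBIS implementation: genotypes are encoded as alternate-allele counts (0, 1, 2), windows hold a fixed number of markers rather than being defined on a genetic map, missing data and minimum-length filters are ignored, and all helper names are hypothetical. Following the caption, the homozygous sets are stored as bitsets, $W_i$ is read as the markers at which the two samples are homozygous for opposite alleles, $W_i = (H_{a,i}(0) \cap H_{b,i}(1)) \cup (H_{a,i}(1) \cap H_{b,i}(0))$, and a candidate segment is a run of windows with $|W_i| \le 1$ whose first and last windows have $|W_i| = 0$.

```python
from typing import List, Sequence, Tuple

# Simplified sketch of the Figure 1 windowing; helper names are hypothetical.
# Genotypes are alternate-allele counts: 0 = homozygous for allele 0,
# 1 = heterozygous, 2 = homozygous for allele 1 (an assumed encoding).


def homozygous_bitsets(genotypes: Sequence[int], start: int, end: int) -> Tuple[int, int]:
    """Return (H(0), H(1)) for markers start..end-1 as integer bitsets.

    Bit k is set when marker start + k is homozygous for that allele.
    """
    h0 = h1 = 0
    for k, g in enumerate(genotypes[start:end]):
        if g == 0:
            h0 |= 1 << k
        elif g == 2:
            h1 |= 1 << k
    return h0, h1


def window_inconsistencies(a: Sequence[int], b: Sequence[int], window_size: int) -> List[int]:
    """|W_i| per window: markers where a and b are homozygous for opposite alleles."""
    counts = []
    for start in range(0, len(a), window_size):
        end = min(start + window_size, len(a))
        a0, a1 = homozygous_bitsets(a, start, end)
        b0, b1 = homozygous_bitsets(b, start, end)
        w = (a0 & b1) | (a1 & b0)  # opposite-homozygote markers in this window
        counts.append(bin(w).count("1"))
    return counts


def candidate_segments(counts: Sequence[int]) -> List[Tuple[int, int]]:
    """Inclusive (first, last) window ranges in which every window has |W_i| <= 1
    and the first and last windows have |W_i| == 0."""
    segments = []
    i, n = 0, len(counts)
    while i < n:
        if counts[i] > 1:
            i += 1
            continue
        j = i
        while j + 1 < n and counts[j + 1] <= 1:
            j += 1
        # counts[i..j] is a maximal run with |W_i| <= 1; trim it to zero-count ends.
        zeros = [k for k in range(i, j + 1) if counts[k] == 0]
        if zeros:
            segments.append((zeros[0], zeros[-1]))
        i = j + 1
    return segments
```

For example, `window_inconsistencies([0, 2, 1, 0, 2, 2, 1, 0, 0, 2], [2, 2, 1, 0, 0, 2, 1, 0, 1, 2], 2)` gives `[1, 0, 1, 0, 0]`, and `candidate_segments` turns that into the single window range `(1, 4)`. Integer bitsets keep the set operations from the caption to one AND and one OR per window.
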
Figure 2
Rates of Classifying Simulated Third (left) and Fifth (right) Degree Relatives to Various Degrees of Relatedness. Includes rates from IBIS (with IBD2 detection), KING, the true IBD segments generated by the simulator, and TRUFFLE. The center set of bars in each plot represents correct inference, and the sets of bars to the left and right represent inferring one degree closer and one degree more distant than the truth, respectively.
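
Figure 2 scores each pair by the degree of relatedness it is assigned. One common way to turn IBD segments into a degree, sketched below, is to compute a kinship coefficient from the fractions of the genome shared IBD1 and IBD2, $\phi = \frac{1}{4}\frac{L_{\mathrm{IBD1}}}{L_{\mathrm{genome}}} + \frac{1}{2}\frac{L_{\mathrm{IBD2}}}{L_{\mathrm{genome}}}$, and then apply KING-style bins in which degree $d$ covers $2^{-(d+1.5)} \le \phi < 2^{-(d+0.5)}$. The bins through third degree match those published for KING; extending them to sixth degree is an assumption here. This text does not state IBIS's exact classification rule, so treat the thresholds and function names as illustrative.

```python
from typing import Optional

# Illustrative kinship-to-degree mapping. The KING-style thresholds are an
# assumption for this sketch, not IBIS's documented classification rule.


def kinship_from_ibd(ibd1_cm: float, ibd2_cm: float, genome_cm: float) -> float:
    """Kinship coefficient from total IBD1 and IBD2 lengths (all in cM).

    phi = (IBD1 fraction) / 4 + (IBD2 fraction) / 2, so a parent-offspring pair
    (IBD1 everywhere) gets 0.25 and identical twins (IBD2 everywhere) get 0.5.
    """
    return ibd1_cm / (4 * genome_cm) + ibd2_cm / (2 * genome_cm)


def degree_from_kinship(phi: float, max_degree: int = 6) -> Optional[int]:
    """Assign degree d when phi >= 2^-(d + 1.5), scanning from closest to most distant.

    Degree 0 means duplicates or identical twins. Returns None (treated as
    unrelated) when phi is below the lower bound of the max_degree bin.
    """
    for d in range(max_degree + 1):
        if phi >= 2 ** -(d + 1.5):
            return d
    return None
```

Full siblings, who share about half the genome IBD1 and a quarter IBD2, get $\phi \approx 0.25$ and fall in the first-degree bin, the same as a parent-offspring pair; telling those two apart needs the IBD2 signal itself rather than $\phi$ alone.
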
Figure 3
PPV and Sensitivity of IBD Segments for the Tested Methods, Subdivided across Several Bins of Segment Lengths. We calculate positive predictive value (PPV) using segments whose inferred length falls within a given bin and sensitivity using segments whose true length falls within a given bin (Material and Methods).
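
The Figure 3 caption defines the bins but leaves the overlap measure to the Methods, which are not part of this text. The sketch assumes a length-weighted definition for a single pair on one chromosome: PPV for a bin is the fraction of the total length of inferred segments in that bin that overlaps true IBD, and sensitivity is the fraction of the total length of true segments in that bin that inferred segments cover. Segments are half-open (start, end) intervals in cM; the function names are hypothetical, and the paper's own computation may differ.

```python
from typing import List, Optional, Tuple

# Assumed length-weighted PPV/sensitivity; not necessarily the paper's definition.
# A segment is a half-open (start_cM, end_cM) interval for one pair on one chromosome.
Segment = Tuple[float, float]


def covered_length(seg: Segment, others: List[Segment]) -> float:
    """Length of `seg` overlapped by `others`, which must not overlap each other."""
    start, end = seg
    return sum(max(0.0, min(end, o_end) - max(start, o_start)) for o_start, o_end in others)


def in_bin(seg: Segment, lo: float, hi: float) -> bool:
    """True if the segment's length falls in [lo, hi)."""
    return lo <= seg[1] - seg[0] < hi


def _length_fraction(selected: List[Segment], reference: List[Segment]) -> Optional[float]:
    total = sum(end - start for start, end in selected)
    if total == 0:
        return None  # no segments in this bin, so the rate is undefined
    return sum(covered_length(seg, reference) for seg in selected) / total


def binned_ppv(inferred: List[Segment], true: List[Segment], lo: float, hi: float) -> Optional[float]:
    """Share of inferred length (inferred segments binned by inferred length) that is truly IBD."""
    return _length_fraction([s for s in inferred if in_bin(s, lo, hi)], true)


def binned_sensitivity(inferred: List[Segment], true: List[Segment], lo: float, hi: float) -> Optional[float]:
    """Share of true IBD length (true segments binned by true length) recovered by inferred segments."""
    return _length_fraction([s for s in true if in_bin(s, lo, hi)], inferred)
```

Because PPV bins on inferred length and sensitivity bins on true length, a single inferred segment that overshoots a true one can count against PPV in one bin while crediting sensitivity in another.
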
Figure 4
Distances between the True and Inferred Segment Start and End Positions and the Sizes and Numbers of Gaps between Inferred Segments that Span a Contiguous True IBD Interval
Figure 5
Runtimes of IBD Segment Detection Algorithms on Subsets of 4,500 Simulated Individuals. The left plot includes the Beagle phasing time for methods that require phased data; the right plot omits this time.
Figure 6
Rates of Classifying the Real SAMAFS Relatives to Their Reported Degree of Relatedness, Subdivided by This Degree. Includes output from IBIS with and without IBD2 detection, combined segments from three independent runs of Refined IBD (Refined IBD×3), and other methods, as indicated.

References

1. Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O’Connell J. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209.
2. Staples J., Maxwell E.K., Gosalia N., Gonzaga-Jauregui C., Snyder C., Hawes A., Penn J., Ulloa R., Bai X., Lopez A.E. Profiling and leveraging relatedness in a precision medicine cohort of 92,455 exomes. Am. J. Hum. Genet. 2018;102:874–889.
3. Erlich Y., Shor T., Pe’er I., Carmi S. Identity inference of genomic data using long-range familial searches. Science. 2018;362:690–694.
4. Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
5. Manichaikul A., Mychaleckyj J.C., Rich S.S., Daly K., Sale M., Chen W.-M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26:2867–2873.
