Genome Res. 2015 Feb;25(2):280-9.
doi: 10.1101/gr.173641.114. Epub 2014 Oct 1.

Parente2: a fast and accurate method for detecting identity by descent

Jesse M Rodriguez et al. Genome Res. 2015 Feb.

Abstract

Identity-by-descent (IBD) inference is the problem of establishing a genetic connection between two individuals through a genomic segment that is inherited by both individuals from a recent common ancestor. IBD inference is an important preceding step in a variety of population genomic studies, ranging from demographic studies to linking genomic variation with phenotype and disease. The problem of accurate IBD detection has become increasingly challenging with the availability of large collections of human genotypes and genomes: Given a cohort's size, a quadratic number of pairwise genome comparisons must be performed. Therefore, computation time and the false discovery rate can also scale quadratically. To enable accurate and efficient large-scale IBD detection, we present Parente2, a novel method for detecting IBD segments. Parente2 is based on an embedded log-likelihood ratio and uses a model that accounts for linkage disequilibrium by explicitly modeling haplotype frequencies. Parente2 operates directly on genotype data without the need to phase data prior to IBD inference. We evaluate Parente2's performance through extensive simulations using real data, and we show that it provides substantially higher accuracy compared to previous state-of-the-art methods while maintaining high computational efficiency.
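The quadratic scaling mentioned in the abstract can be illustrated with a small calculation. The sketch below is not part of Parente2; the function names and the per-pair false-positive rate of 1% are illustrative assumptions, and it further assumes that nearly all pairs in the cohort are unrelated.

```python
def num_pairs(n: int) -> int:
    """Number of unordered pairs among n individuals (n choose 2)."""
    return n * (n - 1) // 2


def expected_false_positives(n: int, fpr: float) -> float:
    """Expected number of falsely called pairs, assuming (hypothetically)
    that essentially every pair is unrelated and each is miscalled
    independently with probability fpr."""
    return num_pairs(n) * fpr


# Growing the cohort tenfold grows the number of comparisons,
# and the expected false calls, roughly a hundredfold.
for n in (1_000, 10_000, 100_000):
    print(n, num_pairs(n), expected_false_positives(n, 0.01))
```

At a fixed per-pair error rate, the count of false calls grows with the number of pairs, which is why a low false-positive rate matters more as cohorts get larger.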

Figures

Figure 1.
Overview of Parente2. A sliding block B, equal in size to the minimum target length for IBD segment detection, is examined starting at every marker of the genome. In this figure, a block B of length 2 cM is displayed. The block contains windows, which are sets of winsize possibly nonconsecutive markers (by default, winsize = 8; in our benchmarks against other methods, run before this parameter was optimized, winsize = 5). A set of consecutive-marker windows tiles B; these are called the basic windows. In addition, c sets of nonconsecutive-marker windows tile B (by default, c = 10). Each such nonconsecutive window is generated by choosing winsize markers out of r markers (by default, r = 40; in this figure, r = 30). These are called the augmented windows. Windows are ordered lexicographically by their leftmost markers and grouped into window subsets, each formed from subsetsize successive windows (by default, subsetsize = 5). Each window subset WS_i is scored according to the outer log-likelihood ratio (Equation 6 in Methods) to yield ELR_i; the score of B is the sum of these window subset scores according to Equation 7 in Methods.
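The window construction described in this caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: markers are represented by integer indices within one block, augmented windows are drawn by uniform random sampling within each stretch of r markers (the caption does not say how the winsize markers are chosen), trailing markers that do not fill a window are dropped, and `subset_score` is a hypothetical placeholder for the Equation 6 score, which is not reproduced here.

```python
import random
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, ...]


def basic_windows(markers: Sequence[int], winsize: int) -> List[Window]:
    """Tile the block with windows of winsize consecutive markers."""
    return [tuple(markers[i:i + winsize])
            for i in range(0, len(markers) - winsize + 1, winsize)]


def augmented_windows(markers: Sequence[int], winsize: int, r: int, c: int,
                      rng: random.Random) -> List[Window]:
    """Build c tilings; in each, pick winsize markers out of every stretch
    of r markers (random choice is an assumption of this sketch)."""
    windows = []
    for _ in range(c):
        for i in range(0, len(markers) - r + 1, r):
            chosen = sorted(rng.sample(list(markers[i:i + r]), winsize))
            windows.append(tuple(chosen))
    return windows


def window_subsets(windows: List[Window], subsetsize: int) -> List[List[Window]]:
    """Order windows lexicographically, then group successive windows."""
    ordered = sorted(windows)  # tuple comparison is lexicographic
    return [ordered[i:i + subsetsize]
            for i in range(0, len(ordered), subsetsize)]


def block_score(subsets: List[List[Window]],
                subset_score: Callable[[List[Window]], float]) -> float:
    """Score of the block: the sum of the window subset scores."""
    return sum(subset_score(ws) for ws in subsets)


# Toy block of 120 markers with the default parameters from the caption.
markers = list(range(120))
rng = random.Random(0)
wins = basic_windows(markers, 8) + augmented_windows(markers, 8, 40, 10, rng)
subsets = window_subsets(wins, 5)
```

With 120 markers this yields 15 basic windows and 30 augmented windows, grouped into 9 window subsets; a real score would replace the placeholder passed to `block_score`.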
Figure 2.
Parente2’s sensitivity as a function of the number of training individuals. Parente2 was run on the WTCCC-2cM data set. The vertical axis shows sensitivity at a 1% false-positive rate.
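The metric on the vertical axis, sensitivity at a fixed false-positive rate, can be computed from per-pair scores along the following lines. This is a generic sketch, not code from the paper: the function name and the score lists are hypothetical, and it assumes that a higher score means stronger evidence for IBD.

```python
from typing import Sequence


def sensitivity_at_fpr(pos_scores: Sequence[float],
                       neg_scores: Sequence[float],
                       fpr: float = 0.01) -> float:
    """Fraction of true IBD pairs called when the threshold is set so that
    at most a fraction fpr of non-IBD pairs score above it."""
    neg_sorted = sorted(neg_scores, reverse=True)
    k = int(fpr * len(neg_sorted))  # negatives allowed above the threshold
    # Calls are made on scores strictly greater than the threshold, so at
    # most k negatives can pass even when scores are tied.
    threshold = neg_sorted[k] if k < len(neg_sorted) else float("-inf")
    return sum(s > threshold for s in pos_scores) / len(pos_scores)


# Toy scores (made up for illustration only).
ibd_scores = [0.25, 0.15, 0.5, 0.35]
non_ibd_scores = [0.1, 0.2, 0.3, 0.4]
print(sensitivity_at_fpr(ibd_scores, non_ibd_scores, fpr=0.5))
```

Fixing the false-positive rate first and then reading off sensitivity lets methods with differently scaled scores be compared at the same operating point.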
Figure 3.
Performance of Parente2 as a function of marker density. Tests are performed on the HapMap-2cM benchmark with the false-positive rate (FPR) fixed at 1%.
Figure 4.
Effect of window size on Parente2’s performance. Increasing the window size of Parente2 results in better performance (test performed on the HapMap-2cM benchmark).
Figure 5.
(A) Example of windows and window subsets. Here, windows contain three markers and window subsets contain two windows. (B,C) Graphical models used for the inner log-likelihood ratio described in Equation 2. (B) Model for two unrelated individuals that do not share an IBD segment in the window. (C) Model for two related individuals sharing a single IBD segment in the window. The latent variables in each model represent hidden haplotypes for a given window of markers. The variables g and g′ represent the observed genotype vectors from the first and second individual in a pair of individuals being evaluated for IBD in the window.

