Bioinformatics. 2010 Jun 15;26(12):i191-8.
doi: 10.1093/bioinformatics/btq222.

Efficient identification of identical-by-descent status in pedigrees with many untyped individuals

Xin Li et al. Bioinformatics.

Abstract

Motivation: Inference of identical-by-descent (IBD) probabilities is central to family-based linkage analysis. With high-density single nucleotide polymorphism (SNP) markers, one can almost always infer the haplotype configuration of each family member, provided all individuals are typed, and the IBD status can then be read directly from the haplotype configurations. In practice, however, many family members are not typed. IBD/haplotype inference is much harder when untyped individuals must be treated as missing data.

Results: We present a novel hidden Markov model (HMM) approach for inferring IBD status in a pedigree with many untyped members using high-density SNP markers. We introduce the concept of an inheritance-generating function, defined for any pair of alleles in a descent graph based on the pedigree structure, and derive a recursive formula for computing it efficiently. By aggregating all possible inheritance patterns through an explicit representation of the number and lengths of all possible paths between two alleles, the inheritance-generating function provides a convenient way to derive the transition probabilities of the HMM theoretically. We further extend the basic HMM to incorporate population linkage disequilibrium (LD). Pedigree-wise IBD sharing can be constructed from pair-wise IBD relationships. Compared with traditional approaches to linkage analysis, our model infers IBD status efficiently, without enumerating all possible genotypes and transmission patterns of untyped family members. Our approach can be reliably applied to large pedigrees with many untyped members, and the inferred IBD status can be used for non-parametric genome-wide linkage analysis.

Availability: The algorithm is implemented in Matlab and is freely available upon request.

Supplementary information: Supplementary data are available on Bioinformatics online.
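
As an illustration of the kind of two-state HMM the Results paragraph refers to, the following Python sketch computes the posterior probability that two haplotypes are IBD at each marker using the forward-backward algorithm. It is not the authors' Matlab implementation. The function name ibd_posterior, the constant transition probabilities p_enter and p_leave, the error rate err, and the match/mismatch emission model are all assumptions made for illustration; in the paper the transition probabilities are derived from the inheritance-generating function, which is not reproduced here.

```python
def ibd_posterior(match, freq, p_enter=0.01, p_leave=0.05, err=0.01, prior_ibd=0.5):
    """Posterior P(IBD) per marker for two haplotypes (illustrative sketch).

    match[t] -- True if the two haplotypes carry the same allele at marker t
    freq[t]  -- population frequency of one allele at marker t
    All parameter values are hypothetical placeholders, not values from the paper.
    Assumes at least one marker.
    """
    n = len(match)
    # trans[i][j] = P(next state j | current state i); 0 = non-IBD, 1 = IBD
    trans = [[1 - p_enter, p_enter], [p_leave, 1 - p_leave]]

    # emit[t][s] = P(observation at marker t | state s)
    emit = []
    for m, f in zip(match, freq):
        p_non = f * f + (1 - f) * (1 - f)   # chance match between unrelated alleles
        p_ibd = 1 - err                     # IBD alleles match unless there is an error
        emit.append((p_non, p_ibd) if m else (1 - p_non, 1 - p_ibd))

    def normalize(v):
        s = v[0] + v[1]
        return [v[0] / s, v[1] / s]

    # Forward pass, rescaled at every marker to avoid underflow
    alpha = [normalize([(1 - prior_ibd) * emit[0][0], prior_ibd * emit[0][1]])]
    for t in range(1, n):
        prev = alpha[-1]
        alpha.append(normalize([
            (prev[0] * trans[0][s] + prev[1] * trans[1][s]) * emit[t][s]
            for s in (0, 1)
        ]))

    # Backward pass, also rescaled
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt = [emit[t + 1][s] * beta[t + 1][s] for s in (0, 1)]
        beta[t] = normalize([
            trans[s][0] * nxt[0] + trans[s][1] * nxt[1] for s in (0, 1)
        ])

    # Combine both passes; per-marker normalization removes the scaling constants
    return [normalize([a[0] * b[0], a[1] * b[1]])[1] for a, b in zip(alpha, beta)]


# Toy data: 60 consecutive matches, then 60 markers alternating match/mismatch
match = [True] * 60 + [True, False] * 30
freq = [0.5] * 120
post = ibd_posterior(match, freq)
```

On this toy input the posterior is high inside the long run of matching alleles and low where mismatches are frequent. A real analysis would work on genotypes rather than known haplotypes and would need marker-specific transition probabilities that depend on genetic distance and on the pedigree relationship.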

Figures

Fig. 1.
(A) A pedigree structure, drawn in the conventional way. (B) One of its many possible descent graphs. Each individual has two nodes, representing the paternal and maternal alleles. An edge in the descent graph indicates which of a parent's two alleles is transmitted to a child.
Fig. 2.
The basic two-state HMM, labeled with transition probabilities.
Fig. 3.
IBD sharing states between two individuals. The states on the right-hand side are not considered in this study.
Fig. 4.
The three-state transition model of the IBD status between two alleles a and b.
Fig. 5.
The complete transition model of IBD sharing states between two individuals. Solid lines indicate actual IBD and dashed lines indicate Bg-IBD.
Fig. 6.
IBD sharing between two siblings. The dotted bar indicates the density of markers of IBS number 0, 1 and 2. The bold line is the pedigree IBD sharing number and the thin line is the Bg-IBD sharing number.
Fig. 7.
Length distributions of IBD and Bg-IBD intervals. The chart shows both distributions together, with the x-axis on a logarithmic scale. The left-hand curve is from Bg-IBD intervals and the right-hand curve is from IBD intervals.
Fig. 8.
Families 1, 2 and 3. Gray-colored individuals are not genotyped. Black-colored individuals are diseased and white-colored individuals are normal.
Fig. 9.
IBD sharing between members 9 and 10 of Family 1. The layout of the figure is the same as the layout of Figure 6.
Fig. 10.
Global IBD sharing graphs for different chromosomal regions. Alleles connected by an arc or arrow are IBD.
Fig. 11.
Comparison of recombination positions inferred by the proposed method and by the Mendelian law. Numbers are in units of megabase pairs. Shaded areas are the regions where the parents are homozygous.
Fig. 12.
Locus-by-locus IBD inference error for different relatives in Families 1, 2 and 3.
Fig. 13.
Comparison of IBD inference accuracy of Ped-IBD and MERLIN for persons of different kinships in Family 1.
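
The descent graph described in the caption of Fig. 1 can be sketched as a small data structure. The Python example below is only an illustration of the definition at a single locus, assuming a hypothetical nuclear family and a hypothetical node encoding of (individual, "p" or "m") for the paternal or maternal allele. The helper names founder_allele and is_ibd are invented for this sketch. It checks IBD in one fixed descent graph, which is the enumeration-style view; the method in the abstract is designed to avoid enumerating such graphs for untyped members.

```python
def founder_allele(descent, node):
    """Follow transmission edges upward until a founder allele is reached."""
    while node in descent:
        node = descent[node]
    return node


def is_ibd(descent, a, b):
    """Two alleles are IBD in this descent graph if they trace to the same founder allele."""
    return founder_allele(descent, a) == founder_allele(descent, b)


# Hypothetical family: founders 1 (father) and 2 (mother), children 3 and 4.
# Each edge maps a child's allele node to the parental allele node it was copied from.
descent = {
    (3, "p"): (1, "p"),   # child 3 received the father's paternal allele
    (3, "m"): (2, "m"),   # child 3 received the mother's maternal allele
    (4, "p"): (1, "p"),   # child 4 received the same paternal allele as child 3
    (4, "m"): (2, "p"),   # child 4 received the mother's other allele
}

# Number of alleles the two siblings share IBD at this locus
shared = sum(is_ibd(descent, (3, side), (4, side)) for side in ("p", "m"))
```

In this descent graph the siblings share their paternal allele IBD but not their maternal allele, so they share one allele IBD at the locus.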
