Nat Genet. 2008 Sep;40(9):1068-75.
doi: 10.1038/ng.216.

Detection of sharing by descent, long-range phasing and haplotype imputation

Augustine Kong et al. Nat Genet. 2008 Sep.

Abstract

Uncertainty about the phase of strings of SNPs creates complications in genetic analysis, although methods have been developed for phasing population-based samples. However, these methods can only phase a small number of SNPs effectively and become unreliable when applied to SNPs spanning many linkage disequilibrium (LD) blocks. Here we show how to phase more than 1,000 SNPs simultaneously for a large fraction of the 35,528 Icelanders genotyped by Illumina chips. Moreover, haplotypes that are identical by descent (IBD) between close and distant relatives, for example, those separated by ten meioses or more, can often be reliably detected. This method is particularly powerful in studies of the inheritance of recurrent mutations and fine-scale recombinations in large sample sets. A further extension of the method allows us to impute long haplotypes for individuals who are not genotyped.

Figures

Fig. 1
The concept of surrogate parenthood. Typed relatives who share either the paternal or the maternal haplotype of the proband can be used to phase the proband as though they were actual parents. These relatives are referred to as surrogate fathers and surrogate mothers, respectively. A surrogate father does not have to be male and a surrogate mother does not have to be female. Surrogate parenthood is locus specific: a sibling can be a surrogate father at one locus and a surrogate mother at another. However, at a locus where the sibling shares both haplotypes with the proband, the sibling is like a twin and cannot be used to phase the proband.
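The idea of phasing against a surrogate parent can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes biallelic SNPs with genotypes coded as the count of one allele (0, 1 or 2), assumes the surrogate is already known to share one haplotype with the proband across the region, and uses the rule that a heterozygous proband site is resolved when the surrogate is homozygous there. The function name `phase_with_surrogate` and the toy genotypes are hypothetical.

```python
from typing import List, Optional, Tuple

def phase_with_surrogate(
    proband: List[int], surrogate: List[int]
) -> List[Optional[Tuple[int, int]]]:
    """Phase a proband against one surrogate parent (illustrative only).

    Genotypes are allele counts (0, 1, 2). Returns, per SNP, a pair
    (allele on the shared haplotype, allele on the other haplotype),
    or None where the phase stays unresolved.
    """
    phased: List[Optional[Tuple[int, int]]] = []
    for g_p, g_s in zip(proband, surrogate):
        if g_p in (0, 2):
            # Homozygous proband: phase is trivial, but opposite
            # homozygotes contradict the assumed sharing.
            if g_s == 2 - g_p:
                raise ValueError("opposite homozygotes: no shared haplotype here")
            allele = g_p // 2
            phased.append((allele, allele))
        elif g_s in (0, 2):
            # Heterozygous proband, homozygous surrogate: the shared
            # haplotype must carry the surrogate's only allele.
            shared = g_s // 2
            phased.append((shared, 1 - shared))
        else:
            # Both heterozygous: this surrogate alone cannot resolve it.
            phased.append(None)
    return phased

# Toy genotypes, invented for illustration.
result = phase_with_surrogate([1, 1, 0, 2, 1], [0, 2, 1, 2, 1])
print(result)
```

Sites left as `None` would need a second surrogate that is homozygous there, which is one reason a large set of surrogate relatives is useful.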
Fig. 2
Shown as a function of sample size, both in absolute number and as a fraction of the living population of Iceland (316,000): (a) the fraction of typed individuals with at least one surrogate parent and (b) the fraction of individuals in the largest connected cluster of the haplotype-sharing graph. A person with one or more surrogate parents can be at least partially phased. Individuals in the main cluster have a large number of surrogate relatives, and often every SNP can be phased.
Fig. 3
Applying long-range phasing to determine a recombination event. The results from phasing a 10 Mb region including the MHC were used, although only the 10 SNPs around the recombination event are highlighted. By phasing M using relatives R1 to R4, the recombination event in C3 could be deduced from the trio F, M and C3 alone, without data from C1 and C2 or from the parents of M. Having R2 and R4 could actually be better than having the two parents of M. A SNP informative for recombination in the children has to be heterozygous in M; here both SNP5 and SNP6 are. To phase M at such a SNP, one of her parents (if typed) or surrogate relatives needs to be homozygous. In this case, R2 and R4 are each homozygous for both SNP5 and SNP6, so either one would be sufficient to deduce the precise location of the recombination. By contrast, R1 is homozygous at SNP6 but heterozygous at SNP5. With R1 only, we could deduce that a recombination in C3 occurred between SNP3 (the closest marker on the left that is heterozygous for M and homozygous for R1) and SNP6, but some resolution would be lost. The same could happen if one or both parents of M were typed. Surrogate relatives who are not surrogate parents of M can also help. For example, the uncertain phase of SNP5 in R1 can be resolved by surrogate parents of R1 who share the other haplotype. Surrogate parents of R1 are surrogate relatives of M with Erdős distance 2.
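The loss of resolution described in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the published method. It assumes M has already been phased wherever possible, and that for each SNP we know which of M's two haplotypes (labelled "A" or "B" here) the child carries, with `None` marking SNPs that are uninformative because M is homozygous or her phase is unresolved. The function name `recombination_interval` and the toy inputs are hypothetical, chosen only to mirror the SNP3/SNP5/SNP6 situation in the caption.

```python
from typing import Dict, Optional, Tuple

def recombination_interval(
    origin: Dict[int, Optional[str]]
) -> Optional[Tuple[int, int]]:
    """Return the pair of flanking informative SNPs around the first
    switch of parental haplotype in the child, or None if no switch
    is observed.

    origin maps SNP index -> "A" or "B" (the maternal haplotype the
    child carries there), or None when the SNP is uninformative.
    """
    informative = [(snp, hap) for snp, hap in sorted(origin.items())
                   if hap is not None]
    for (left, hap_left), (right, hap_right) in zip(informative, informative[1:]):
        if hap_left != hap_right:
            return (left, right)
    return None

# M phased at SNP5 and SNP6 (a surrogate homozygous at both SNPs).
fully_phased = {3: "A", 5: "A", 6: "B"}
# M unphased at SNP5 (the only surrogate is heterozygous there).
partly_phased = {3: "A", 5: None, 6: "B"}

print(recombination_interval(fully_phased))
print(recombination_interval(partly_phased))
```

With SNP5 uninformative, the nearest informative marker on the left is SNP3, so the inferred interval widens from SNP5 to SNP6 to SNP3 to SNP6. The sketch reports only the first switch and ignores genotyping error.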
Fig. 4
The inheritance of a chromosome associated with a deletion. P1, P2 and R1 to R3 are typed. Long-range phasing revealed that they all share a haplotype with over 1000 SNPs, although only P1 and P2 carry the deletion. Displayed are alleles of every third of the first 100 SNPs on chromosome 15, including 17 of the 51 SNPs deleted. It can be inferred from the family structure that the shared region was transmitted to P1 and P2 through GGM and GF. Note: with only two typed SNPs (one shown) on the left of the deletion, the first two SNPs might only be IBS and not IBD between R1 to R3 and P1 and P2. It could not be ruled out that a recombination event had taken place close by at one of the intermediary meioses, particularly since it is known that a recombination often accompanies a deletion event.
Fig. 5
Imputing haplotypes into an untyped proband P. One of his children (C1) and 10 of his grandchildren (GC1 to GC10) are chip-typed (in blue). A region on 15q25 with 1001 typed SNPs centred at rs1051730 was investigated. All typed individuals were phased, although only three haplotypes, HA, HB and HC, are highlighted. Haplotype HA could be imputed into P because C1 and GC10, descendants of P through different mates, share HA IBD, satisfying Conditions 1 and 3. R2 shares HB IBD with GC3 and GC4, satisfying Conditions 1 and 2, which allows us to impute HB into P. However, as an exception to Conditions 2 and 3, HB can actually be imputed into P in an alternative way that does not require R2 and only employs the data from the descendants. Given that GC3 and GC4 share HB, it must be carried by either P or M1. The same holds for HC, since it is shared by C1 and GC6. Given that GC4 and GC6 are related to P and M1 in the same way, HB and HC cannot both originate from M1. Since C1 has both HA and HC, and HA is established to be from P, HC must be from M1. This highlights that there can be extra information beyond what can be deduced from the pairwise sharing of relatives. Because R1 is related to P on his father's side and R2 is a relative on his mother's side, we can deduce that HA is the paternal haplotype of P and HB is the maternal haplotype, information useful for an imprinting model. While GC5, GC7, GC8 and GC9 do not play a role in the imputation of P here, they do contribute to the imputation of P for other regions of the genome. If C1 were not genotyped, GC1 and GC2 could be used to impute C1 and P.
