Comparative Study
Am J Hum Genet. 2006 Mar;78(3):437-50.
doi: 10.1086/500808. Epub 2006 Jan 26.

A comparison of phasing algorithms for trios and unrelated individuals

Jonathan Marchini et al. Am J Hum Genet. 2006 Mar.

Abstract

Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they have focused principally on inference from unrelated individuals, and comparisons between methods have been rather limited. Here, we describe the extension of five leading algorithms for phase inference for handling father-mother-child trios. We performed a comprehensive assessment of the methods applied to both trios and to unrelated individuals, with a focus on genomic-scale problems, using both simulated data and data from the HapMap project. The most accurate algorithm was PHASE (v2.1). For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this work had comparable but slightly worse error rates. The error rates for trios are similar to the levels of genotyping error and missing data expected. We thus conclude that all the methods considered will provide highly accurate estimates of haplotypes when applied to trio data sets. Running times differ substantially between methods. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million-SNP HapMap data set. Finally, we evaluated methods of estimating the value of r² between a pair of SNPs and concluded that all methods estimated r² well when the estimated value was ≥0.8.

Figures

Figure 1
The method of constructing new data sets with artificially induced ambiguous sites from real trio data. The example in the figure consists of a father-mother-child trio at four SNPs. The genotypes at all sites are such that the haplotypes of each individual can be inferred exactly. A new “alternative universe” child can be created by swapping the transmission status of the haplotypes in one of the parents. In this example, both children inherit the “1010” haplotype from the father but inherit different haplotypes from the mother; the real child inherits the “1000” haplotype, and the new child inherits the “0101” haplotype. When the trio consisting of the father, the mother, and the new child is considered, we see that the transmission status of the fourth SNP is now not known unambiguously if we consider just the genotypes at the site. The performance of phasing algorithms can be assessed for these data sets by their ability to reconstruct the correct phase at these sites.
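The construction described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes haplotypes are written as strings of 0/1 alleles, that there are no missing genotypes, and that a site is ambiguous exactly when the father, mother, and child are all heterozygous there. The function names are hypothetical, and the father's untransmitted haplotype "1001" is an assumed value, since the caption gives only the transmitted "1010".

```python
def genotype(hap_a, hap_b):
    """Per-SNP genotype as a count of '1' alleles (0, 1, or 2)."""
    return [int(a) + int(b) for a, b in zip(hap_a, hap_b)]


def ambiguous_sites(father_geno, mother_geno, child_geno):
    """0-based SNP indices where all three trio members are heterozygous."""
    return [
        i
        for i, (f, m, c) in enumerate(zip(father_geno, mother_geno, child_geno))
        if f == m == c == 1
    ]


def alternative_child(father_haps, mother_haps, swap="mother"):
    """Return (paternal, maternal) haplotypes of the new child.

    Each parent is a (transmitted, untransmitted) pair; `swap` names the
    parent whose transmission status is exchanged.
    """
    paternal = father_haps[1] if swap == "father" else father_haps[0]
    maternal = mother_haps[1] if swap == "mother" else mother_haps[0]
    return paternal, maternal


# Haplotypes as (transmitted, untransmitted).
# "1001" is an assumed value; the caption does not state it.
father = ("1010", "1001")
mother = ("1000", "0101")

real_child = (father[0], mother[0])
new_child = alternative_child(father, mother, swap="mother")

f_geno, m_geno = genotype(*father), genotype(*mother)
real_ambiguous = ambiguous_sites(f_geno, m_geno, genotype(*real_child))
new_ambiguous = ambiguous_sites(f_geno, m_geno, genotype(*new_child))
print(real_ambiguous, new_ambiguous)  # prints: [] [3]
```

With these inputs the real trio has no ambiguous site, while the trio with the new child is ambiguous only at the fourth SNP (index 3), matching the example in the caption.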
Figure 2
True (X-axis) and estimated (Y-axis) r² for the PHASE (left column), pairwise (center column), and GC (right column) methods, with (rows 1 and 3) and without (rows 2 and 4) missing data. Rows 1 and 2 show the differences for a trio data set, whereas rows 3 and 4 show the differences for the unrelated individuals data set. The data set was chosen at random from the 50 data sets analyzed.
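For reference, r² between two biallelic SNPs can be computed from a set of phased haplotypes using the standard definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), with D = p_AB − p_A p_B. The sketch below assumes haplotypes are strings of 0/1 alleles and uses a hypothetical function name. It does not reproduce the pairwise or GC estimators compared in the figure; it only shows the quantity that would be evaluated on the true and on the inferred haplotypes.

```python
def r_squared(haplotypes, i, j):
    """r^2 between SNPs i and j from phased 0/1 haplotype strings."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    # Frequencies of the '1' allele at each SNP and of the '11' haplotype.
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


# Two SNPs in complete LD, then two SNPs in linkage equilibrium.
complete_ld = r_squared(["11", "11", "00", "00"], 0, 1)
no_ld = r_squared(["11", "10", "01", "00"], 0, 1)
```

Comparing the value obtained from true haplotypes with the value obtained from inferred haplotypes gives one point on a true-versus-estimated plot of the kind the caption describes.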

