2005 Jan 4;102(1):158-62.
doi: 10.1073/pnas.0404730102. Epub 2004 Dec 22.

GERBIL: Genotype resolution and block identification using likelihood

Gad Kimmel et al. Proc Natl Acad Sci U S A.

Abstract

The abundance of genotype data generated by individual and international efforts carries the promise of revolutionizing disease studies and the association of phenotypes with individual polymorphisms. A key challenge is providing an accurate resolution (phasing) of the genotypes into haplotypes. We present here results on a method for genotype phasing in the presence of recombination. Our analysis is based on a stochastic model for recombination-poor regions ("blocks"), in which haplotypes are generated from a small number of core haplotypes, allowing for mutations, rare recombinations, and errors. We formulate genotype resolution and block partitioning as a maximum-likelihood problem and solve it by an expectation-maximization algorithm. The algorithm was implemented in a software package called GERBIL (genotype resolution and block identification using likelihood), which is efficient and simple to use. We tested GERBIL on four large-scale sets of genotypes. It outperformed two state-of-the-art phasing algorithms. The PHASE algorithm was slightly more accurate than GERBIL when allowed to run with default parameters, but required two orders of magnitude more time. When using comparable running times, GERBIL was consistently more accurate. For data sets with hundreds of genotypes, the time required by PHASE becomes prohibitive. We conclude that GERBIL has a clear advantage for studies that include many hundreds of genotypes and, in particular, for large-scale disease studies.
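
The stochastic block model summarized in the abstract can be illustrated with a minimal sketch. It follows the three steps in the Fig. 1 caption, with alpha holding the common-haplotype probabilities and theta the per-site allele probabilities. The function name, the 0/1 allele coding, the reading of theta[i][j] as the probability of allele 1, and the 0/1/2 genotype coding are all assumptions made for illustration. This is not the GERBIL implementation, and it leaves out block partitioning and the expectation-maximization fit.

```python
import random

def sample_genotypes(alpha, theta, n_individuals, rng):
    """Draw genotypes for one block (hypothetical helper, not GERBIL code).

    alpha[i]    -- probability of choosing common haplotype i
    theta[i][j] -- assumed: probability that haplotype i carries allele 1 at SNP j
    """
    n_snps = len(theta[0])
    genotypes = []
    for _ in range(n_individuals):
        # Step 1: choose a pair of common haplotypes according to alpha.
        pair = rng.choices(range(len(alpha)), weights=alpha, k=2)
        # Step 2: determine the allele at each site according to theta.
        haplotypes = [
            [1 if rng.random() < theta[i][j] else 0 for j in range(n_snps)]
            for i in pair
        ]
        # Step 3: merge the two haplotypes into an unphased genotype.
        # Assumed coding: 0 or 2 = homozygous, 1 = heterozygous.
        genotypes.append([a + b for a, b in zip(*haplotypes)])
    return genotypes
```

With a single common haplotype and deterministic allele probabilities, for example `sample_genotypes([1.0], [[1.0, 0.0, 1.0, 0.0]], 2, random.Random(0))`, every sampled genotype is `[2, 0, 2, 0]`, because both haplotypes of each individual are identical copies of the core haplotype.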

Figures

Fig. 1.
An illustration of the probabilistic model. This model has three common haplotypes covering four SNPs. In the first step, pairs of the common haplotypes are chosen according to their probabilities α_i. In this example, the pairs (1,2) and (1,3) are chosen. In the second step, the alleles at each site of the haplotypes are determined according to the probabilities θ_{i,j}. In the third step, each genotype is formed by a confluence of the two haplotypes created in the previous step.
Fig. 2.
Phasing accuracy on the Yoruba genotypes. The x axis shows the number of SNPs in each of the 52 data sets. The y axis shows the switch error rate of GERBIL (×) and PHASE (○) on each data set.
Fig. 3.
Running times on the Yoruba genotypes. The x axis shows the number of SNPs in each data set. The y axis shows the base-10 logarithm of the running time (in seconds) of GERBIL (×) and PHASE (○) on each data set.
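
Fig. 2 reports accuracy as a switch error rate, which this page does not define. The sketch below assumes the usual definition: among an individual's heterozygous sites, the fraction of consecutive pairs whose relative phase differs between the true and the inferred haplotypes. The function name and the 0/1 allele coding are assumptions for illustration, and the paper's exact computation may differ, for example in how individuals are aggregated within a data set.

```python
def switch_error_rate(true_pair, inferred_pair):
    """Switch error rate for one individual (hypothetical helper).

    true_pair, inferred_pair -- (hap_a, hap_b), each a list of 0/1 alleles.
    Assumes the inferred pair is consistent with the true genotype.
    """
    t1, t2 = true_pair
    p1, _ = inferred_pair
    # Only heterozygous sites carry phase information.
    het_sites = [j for j in range(len(t1)) if t1[j] != t2[j]]
    if len(het_sites) < 2:
        return 0.0  # no consecutive pair of heterozygous sites to compare
    # At each heterozygous site, record whether inferred haplotype 1
    # matches true haplotype 1.
    matches = [p1[j] == t1[j] for j in het_sites]
    # A switch occurs whenever that match status flips between neighbours.
    switches = sum(1 for a, b in zip(matches, matches[1:]) if a != b)
    return switches / (len(het_sites) - 1)
```

Under this definition, swapping the labels of the two inferred haplotypes does not count as an error, since only the phase of each heterozygous site relative to its neighbour is compared.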
