Proc Natl Acad Sci U S A. 2003 Jun 10;100(12):7225-30.
doi: 10.1073/pnas.1237858100. Epub 2003 May 30.

Efficiency of single-nucleotide polymorphism haplotype estimation from pooled DNA

Yaning Yang et al. Proc Natl Acad Sci U S A.

Abstract

The efficiency of single-nucleotide polymorphism haplotype analysis may be increased by DNA pooling, which can dramatically reduce the number of genotyping assays. We develop a method for obtaining maximum likelihood estimates of haplotype frequencies for different pool sizes, assess the accuracy of these estimates, and show that pooling DNA samples is efficient in estimating haplotype frequencies. Although pooling K individuals increases ambiguities, at least for small pool sizes K and small numbers of loci, the uncertainty of estimation increases by less than a factor of K relative to unpooled DNA. We also derive the asymptotic variance-covariance matrix of the maximum likelihood estimates and evaluate the accuracy of the variance estimates by Monte Carlo methods. When the sample size of pools is moderately large, the asymptotic variance estimates are rather accurate. Our analysis allows for completely or partially missing genotyping information. Finally, our methods are applied to single-nucleotide polymorphisms in the angiotensinogen gene.
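
As a rough illustration of the kind of estimation described above, the following sketch runs an EM iteration for two biallelic loci. It is not the authors' implementation. It assumes that each pool of K individuals reports only the number of copies of allele 1 at each locus among its 2K chromosomes, that haplotypes are drawn independently, and that no genotype data are missing. The function names (`compatible_configs`, `log_multinomial`, `em_pooled`) are hypothetical.

```python
from math import exp, lgamma, log


def compatible_configs(a, b, m):
    """Yield haplotype counts (n00, n01, n10, n11) among m chromosomes
    that are consistent with a copies of allele 1 at the first locus
    and b copies of allele 1 at the second locus."""
    for t in range(max(0, a + b - m), min(a, b) + 1):
        # t is the (unobserved) number of 11 haplotypes in the pool.
        yield (m - a - b + t, b - t, a - t, t)


def log_multinomial(counts, freqs):
    """Log multinomial probability of haplotype counts given frequencies."""
    out = lgamma(sum(counts) + 1)
    for n, p in zip(counts, freqs):
        if n > 0:
            if p <= 0.0:
                return float("-inf")  # configuration impossible under freqs
            out += n * log(p)
        out -= lgamma(n + 1)
    return out


def em_pooled(pools, K, iters=500, tol=1e-10):
    """Estimate (p00, p01, p10, p11) from pooled allele counts.

    pools: list of (a, b) allele-1 counts per pool (assumed data layout).
    K: number of diploid individuals per pool, so 2K chromosomes.
    """
    m = 2 * K
    freqs = [0.25] * 4  # assumed starting point: uniform frequencies
    for _ in range(iters):
        expected = [0.0] * 4
        for a, b in pools:
            # E-step: posterior weight of each compatible configuration.
            configs = list(compatible_configs(a, b, m))
            logw = [log_multinomial(c, freqs) for c in configs]
            top = max(logw)
            w = [exp(x - top) for x in logw]
            total = sum(w)
            for c, wi in zip(configs, w):
                for h in range(4):
                    expected[h] += c[h] * wi / total
        # M-step: expected haplotype counts divided by chromosomes observed.
        new = [e / (m * len(pools)) for e in expected]
        converged = max(abs(x - y) for x, y in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs
```

For example, `em_pooled([(0, 0), (2, 2)], K=1)` returns `[0.5, 0.0, 0.0, 0.5]`, because both individuals are homozygous at both loci and their haplotypes are unambiguous. With larger K, the number of compatible configurations per pool grows, which is the source of the added ambiguity mentioned in the abstract.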

Figures

Fig. 1.
MSE (Left) and relative efficiencies (Right) of estimating two-locus haplotype frequencies from pools of K individuals, each relative to nonpooled DNA (K = 1) for different LD coefficients (total sample size is n = 180 individuals). Minor allele frequencies for the two loci are 0.4 and 0.5.
Fig. 2.
MSE (Left) and relative efficiencies (Right) of estimating two-locus haplotype frequencies from pools of K individuals, each relative to nonpooled DNA (K = 1) for different LD coefficients (total sample size is n = 180 individuals). Minor allele frequencies for the two loci are 0.2 and 0.3.
Fig. 3.
Relative efficiencies for different numbers, n, of individuals in estimating the three-locus haplotype frequencies from pools of K individuals each. The true haplotype frequencies are 0, 0.0815, 0, 0, 0.5245, 0.2829, 0.0051, and 0.1060.
Fig. 4.
Relative efficiencies of estimating three-locus haplotype frequencies from pseudopooling of K individuals, each relative to nonpooled DNA (K = 1); sample size is n = 60 individuals.
Fig. 5.
MSE (Left) and relative efficiencies (Right) for different rates of completely missing data in estimating the two-locus haplotype frequencies from pools of K individuals each. Minor allele frequencies are 0.4 and 0.5; LD coefficient D′ = 0.5.
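
The captions above parameterize two-locus settings by allele frequencies and the LD coefficient D′. As a small aid to reading them, the sketch below computes the standard normalized coefficient D′ = D / Dmax from four haplotype frequencies. The function name `d_prime` and the argument order are assumptions made for illustration, and the sign convention (a signed D′ rather than its absolute value) may differ from the one used in the paper.

```python
def d_prime(p00, p01, p10, p11):
    """Signed normalized LD coefficient D' for two biallelic loci.

    Arguments are haplotype frequencies; the first digit indexes the
    allele at the first locus, the second digit the second locus.
    """
    pA = p10 + p11          # frequency of allele 1 at the first locus
    pB = p01 + p11          # frequency of allele 1 at the second locus
    D = p11 - pA * pB       # raw disequilibrium coefficient
    if D >= 0:
        d_max = min(pA * (1 - pB), (1 - pA) * pB)
    else:
        d_max = min(pA * pB, (1 - pA) * (1 - pB))
    # With a monomorphic locus d_max is 0 and D' is undefined; return 0.0.
    return D / d_max if d_max > 0 else 0.0
```

For instance, hypothetical haplotype frequencies (0.4, 0.2, 0.1, 0.3) give allele-1 frequencies of 0.4 and 0.5 and a D′ of approximately 0.5, which is one configuration consistent with the parameters quoted for Fig. 5.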
