Comparative Study
Nat Genet. 2008 Oct;40(10):1245-52.
doi: 10.1038/ng.206. Epub 2008 Sep 7.

A robust statistical method for case-control association testing with copy number variation

Chris Barnes et al. Nat Genet. 2008 Oct.

Abstract

Copy number variation (CNV) is pervasive in the human genome and can play a causal role in genetic diseases. The functional impact of CNV cannot be fully captured through linkage disequilibrium with SNPs. These observations motivate the development of statistical methods for performing direct CNV association studies. We show through simulation that current tests for CNV association are prone to false-positive associations in the presence of differential errors between cases and controls, especially if quantitative CNV measurements are noisy. We present a statistical framework for performing case-control CNV association studies that applies likelihood ratio testing of quantitative CNV measurements in cases and controls. We show that our methods are robust to differential errors and noisy data and can achieve maximal theoretical power. We illustrate the power of these methods for testing for association with binary and quantitative traits, and have made this software available as the R package CNVtools.

Figures

Figure 1
Example of CNV data showing poor clustering quality and differential errors. (a) Comparison of the distribution of quantitative CNV measurements for a single CNV (W8177) in the two control groups of the WTCCC from Affymetrix 500K SNP genotyping data. (b) Comparison of the distribution of quantitative CNV measurements in array-CGH data (clone Chr15tp-11F12 on the Whole Genome TilePath array; ref. 1) between the HapMap panel and the Human Genome Diversity Panel (HGDP). (c) Distribution of quantitative CNV measurements from a paralog-ratio-test assay for the β-defensin locus in Dutch and German control cohorts.
Figure 2
Methods for performing CNV-association testing. (a) In association studies, inference of genotypes from data and association testing of genotypic data are generally treated as separate statistical problems; however, the two underlying models can be combined into a single, integrated procedure. (b) Five different case-control association methods are represented schematically on simulated copy number intensity data in case and control groups. The first three methods classify individuals into copy number classes before performing nonparametric testing. Classification is achieved by either a priori binning or assignment on the basis of maximal a posteriori probability from mixture models fitted to the underlying intensity data. The new likelihood ratio test integrates classification and association testing into a single procedure by comparing mixture model fits under nested hypotheses.
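The idea of comparing mixture model fits under nested hypotheses can be illustrated with a deliberately simplified sketch. It is not the CNVtools implementation: it assumes only two copy number classes, Gaussian components whose means and standard deviation are shared by cases and controls (so no differential errors), and a plain EM fit. The function names `fit_mixture`, `lr_test` and `simulate` are hypothetical and exist only for this illustration. The null model shares one mixing proportion across both groups, the alternative gives each group its own, and twice the log-likelihood difference is referred to a χ² distribution with 1 d.f.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and s.d. sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def fit_mixture(x, group, separate, iters=200):
    """EM fit of a two-component Gaussian mixture; returns the log-likelihood.

    group[i] is 0 (control) or 1 (case). Means and s.d. are shared by both
    groups. If separate is True each group has its own mixing proportion
    (alternative hypothesis); otherwise one proportion is shared (null).
    """
    mu = [min(x), max(x)]            # crude starting values
    sd = (max(x) - min(x)) / 4.0
    pi = [0.5, 0.5]                  # P(higher copy number class) per group
    n = len(x)
    for _ in range(iters):
        # E-step: posterior probability of the higher copy number class
        resp = []
        for xi, g in zip(x, group):
            a = (1.0 - pi[g]) * normal_pdf(xi, mu[0], sd)
            b = pi[g] * normal_pdf(xi, mu[1], sd)
            resp.append(b / (a + b))
        # M-step: update means, shared s.d. and mixing proportions
        w1 = sum(resp)
        w0 = n - w1
        mu[0] = sum((1.0 - r) * xi for r, xi in zip(resp, x)) / w0
        mu[1] = sum(r * xi for r, xi in zip(resp, x)) / w1
        var = sum((1.0 - r) * (xi - mu[0]) ** 2 + r * (xi - mu[1]) ** 2
                  for r, xi in zip(resp, x)) / n
        sd = math.sqrt(var)
        if separate:
            for g in (0, 1):
                rs = [r for r, gi in zip(resp, group) if gi == g]
                pi[g] = sum(rs) / len(rs)
        else:
            pi = [w1 / n, w1 / n]
    return sum(math.log((1.0 - pi[g]) * normal_pdf(xi, mu[0], sd)
                        + pi[g] * normal_pdf(xi, mu[1], sd))
               for xi, g in zip(x, group))


def lr_test(x, group):
    """Likelihood ratio statistic and 1-d.f. chi-square P value."""
    stat = 2.0 * (fit_mixture(x, group, True) - fit_mixture(x, group, False))
    stat = max(stat, 0.0)                       # guard against rounding noise
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival, 1 d.f.
    return stat, p_value


def simulate(n, freq, rng):
    """Simulated intensities: class means 0 and 1, s.d. 0.2 (assumed values)."""
    return [rng.gauss(1.0 if rng.random() < freq else 0.0, 0.2) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    controls = simulate(500, 0.2, rng)
    cases = simulate(500, 0.4, rng)
    stat, p_value = lr_test(controls + cases, [0] * 500 + [1] * 500)
    print(f"LR statistic = {stat:.2f}, P = {p_value:.3g}")
```

Because classification and testing happen inside one likelihood, individuals with ambiguous intensities contribute in proportion to their posterior probabilities rather than being forced into a single class. Allowing group-specific means or variances, as in a model that accommodates differential errors, would require extra parameters that this sketch omits.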
Figure 3
Modelling the dependency between copy number and disease. (a) Naïve model in which any dependency between disease phenotype and quantitative measurements of copy number is assumed to be due to differences in the distribution of copy number between cases and controls. (b) A more elaborate model that allows for other differences in measurement distribution between cases and controls due, for example, to differences in DNA qualities.
Figure 4
Sensitivity of 1-d.f. association testing methods to clustering quality and differential errors between cases and controls in simulated data. Six alternative association methods are considered: (i) Mann-Whitney testing for difference in location of CNV measurement distributions, (ii) χ² trend tests on data binned with a priori thresholds, (iii) χ² trend tests on mixture model assignment of cases and controls together (MM-C), (iv) χ² trend tests on mixture model assignment of cases and controls separately (MM-S), (v) χ² trend tests on high-confidence mixture model assignment of cases and controls separately (MM-S95) and (vi) likelihood ratio trend test. Overdispersion (λ) is estimated robustly from a linear fit to the first 90% of quantile-quantile plots from 1,000 simulated datasets. (a) Overdispersion is estimated for alternative association methods at ten different clustering qualities. Density plots for three clustering qualities are shown at the bottom. (b) Overdispersion is estimated for alternative association methods at ten different values of differential shift of means. Density plots for three values of differential shift are shown at the bottom with case and control groups in red and gray. (c) Overdispersion is shown for alternative association methods at ten different values of differential shifts in variance. Density plots for three values of differential shift are shown at the bottom with case and control groups in red and gray.
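The overdispersion estimate described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the test statistics are χ² with 1 d.f. under the null, uses plotting positions (i + 0.5)/n for the expected quantiles, and fits a line through the origin, since the caption does not say whether an intercept was included. The function names `chi2_1df_quantile` and `estimate_lambda` are hypothetical.

```python
import random
from statistics import NormalDist


def chi2_1df_quantile(p):
    """Quantile of the chi-square distribution with 1 d.f.

    If Z is standard normal then Z**2 is chi-square with 1 d.f., so the
    p-quantile equals the squared normal quantile at (1 + p) / 2.
    """
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z


def estimate_lambda(stats, fraction=0.9):
    """Slope of observed against expected quantiles over the lowest `fraction`.

    Discarding the upper tail keeps a handful of extreme statistics from
    dominating the estimate. The line is constrained through the origin
    (an assumption made for this sketch).
    """
    observed = sorted(stats)
    n = len(observed)
    keep = int(n * fraction)
    expected = [chi2_1df_quantile((i + 0.5) / n) for i in range(keep)]
    observed = observed[:keep]
    numerator = sum(e * o for e, o in zip(expected, observed))
    denominator = sum(e * e for e in expected)
    return numerator / denominator


if __name__ == "__main__":
    rng = random.Random(0)
    # 1,000 null statistics, matching the number of simulated datasets
    null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(1000)]
    print("lambda under the null:", round(estimate_lambda(null_stats), 3))
```

A well-calibrated test gives λ close to 1. Values clearly above 1 indicate inflated statistics and therefore an excess of false-positive associations.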
Figure 5
Statistical power of the likelihood ratio trend test. (a) Clustering quality resulting from alternative probe summary methods for 95 CNVs: linear discriminant function (LDF), principal components analysis (PCA) and arithmetic mean (mean). (b) Statistical power of the LR trend test in simulated data of varying clustering quality is shown for two minor allele frequencies (MAF) with odds ratios (OR) set to equalize maximal theoretical power at 90%. Power is estimated for 2,000 cases and 2,000 controls under two conditions: (i) a model that assumes no differential errors and (ii) a model allowing for differential errors. (c) Statistical power of the LR trend test in empirical data from 95 CNVs of varying clustering quality. Power is estimated for 2,000 cases and 2,000 controls, with odds ratios (OR) set to equalize maximal theoretical power at 90%. For ease of display, where the clustering quality (Q) of a CNV exceeds a value of 6, it has been set to 6.
Figure 6
Examples of empirical CNV associations. (a) Association with a binary disease trait, type 1 diabetes (T1D). The red shaded area represents a density plot of copy number measurement in each group. The two WTCCC control groups come from the 1958 Birth Cohort (1958BC) and the National Blood Service (NBS). The colored lines reflect the posterior probability distribution for each mixture in the fitted mixture model. The P value derives from the LR trend test comparing case and control groups. (b) The first panel shows normalized expression of gene LOC288077 against copy number measurement, with a linear regression shown in blue. The second panel shows normalized gene expression against mixture model assignment, with a linear regression shown in blue. The P values in these two plots represent the nominal P values on the regression. The third panel shows a histogram of copy number measurement and the colored lines represent the posterior probability distribution for each of the five copy number classes in the fitted mixture model used in the LR trend test.

References

    1. Redon R, et al. Global variation in copy number in the human genome. Nature. 2006;444:444–454.
    2. Tuzun E, et al. Fine-scale structural variation of the human genome. Nat. Genet. 2005;37:727–732.
    3. Lupski JR, Stankiewicz P, editors. Genomic Disorders: The Genomic Basis of Disease. Humana Press; Totowa, New Jersey: 2006.
    4. Stranger BE, et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007;315:848–853.
    5. Flint J, et al. High frequencies of alpha-thalassaemia are the result of natural selection by malaria. Nature. 1986;321:744–750.
