Comparative Study

Nat Genet. 2008 Oct;40(10):1245-52.

doi: 10.1038/ng.206. Epub 2008 Sep 7.

A robust statistical method for case-control association testing with copy number variation

Chris Barnes¹, Vincent Plagnol, Tomas Fitzgerald, Richard Redon, Jonathan Marchini, David Clayton, Matthew E Hurles

PMID: 18776912
PMCID: PMC2784596
DOI: 10.1038/ng.206

Chris Barnes et al. Nat Genet. 2008 Oct.

Affiliation

¹ Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

Abstract

Copy number variation (CNV) is pervasive in the human genome and can play a causal role in genetic diseases. The functional impact of CNV cannot be fully captured through linkage disequilibrium with SNPs. These observations motivate the development of statistical methods for performing direct CNV association studies. We show through simulation that current tests for CNV association are prone to false-positive associations in the presence of differential errors between cases and controls, especially if quantitative CNV measurements are noisy. We present a statistical framework for performing case-control CNV association studies that applies likelihood ratio testing of quantitative CNV measurements in cases and controls. We show that our methods are robust to differential errors and noisy data and can achieve maximal theoretical power. We illustrate the power of these methods for testing for association with binary and quantitative traits, and have made this software available as the R package CNVtools.

Figures

**Figure 1**
Example of CNV data showing poor clustering quality and differential errors. (a) Comparison of the distribution of quantitative CNV measurements for a single CNV (W8177) in the two control groups of the WTCCC from Affymetrix 500K SNP genotyping data. (b) Comparison of the distribution of quantitative CNV measurements in array-CGH data (clone Chr15tp-11F12 on the Whole Genome TilePath array; ref. 1) between the HapMap panel and the Human Genome Diversity Panel (HGDP). (c) Distribution of quantitative CNV measurements from a paralog-ratio-test assay for the β-defensin locus in Dutch and German control cohorts.

**Figure 2**
Methods for performing CNV-association testing. (a) In association studies, inference of genotypes from data and association testing of genotypic data are generally treated as separate statistical problems; however, the two underlying models can be combined into a single, integrated procedure. (b) Five different case-control association methods are represented schematically on simulated copy number intensity data in case and control groups. The first three methods classify individuals into copy number classes before performing nonparametric testing. Classification is achieved by either a priori binning or assignment on the basis of maximal a posteriori probability from mixture models fitted to the underlying intensity data. The new likelihood ratio test integrates classification and association testing into a single procedure by comparing mixture model fits under nested hypotheses.
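The idea of comparing mixture-model fits under nested hypotheses can be sketched in a few lines of Python. This is a minimal illustration and not the CNVtools implementation: it assumes only two copy number classes, Gaussian signal within each class, and the simpler model without differential errors (class means and variances shared by cases and controls). The function names `fit_loglik`, `lr_association_test` and `simulate_signal` are hypothetical. Under the null the class frequencies are shared by both groups; under the alternative each group gets its own frequencies, which gives a 1-d.f. test when there are two classes.

```python
import math
import random

def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def e_step(x, group, mu, sd, w):
    """Posterior class probabilities for each sample, plus the log-likelihood."""
    resp, loglik = [], 0.0
    for xi, gi in zip(x, group):
        dens = [w[gi][k] * normal_pdf(xi, mu[k], sd[k]) for k in range(2)]
        total = max(sum(dens), 1e-300)  # guard against underflow
        loglik += math.log(total)
        resp.append([d / total for d in dens])
    return resp, loglik

def fit_loglik(x, group, separate_weights, n_iter=50):
    """EM fit of a two-class Gaussian mixture; returns the final log-likelihood.

    Means and SDs are shared by both groups (assumption: no differential
    errors). Class frequencies are shared under the null
    (separate_weights=False) or estimated per group under the alternative.
    """
    lo, hi = min(x), max(x)
    mu = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    sd = [(hi - lo) / 4.0, (hi - lo) / 4.0]
    w = [[0.5, 0.5], [0.5, 0.5]]  # w[group][class]
    for _ in range(n_iter):
        resp, _ = e_step(x, group, mu, sd, w)
        for k in range(2):
            rk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / rk
            var = sum(r[k] * (xi - mu[k]) ** 2 for r, xi in zip(resp, x)) / rk
            sd[k] = max(math.sqrt(var), 1e-3)
        for g in (0, 1):
            rows = [r for r, gi in zip(resp, group)
                    if gi == g or not separate_weights]
            w[g] = [sum(r[k] for r in rows) / len(rows) for k in range(2)]
    return e_step(x, group, mu, sd, w)[1]

def lr_association_test(cases, controls):
    """Likelihood ratio statistic and chi-square (1 d.f.) P value."""
    x = list(controls) + list(cases)
    group = [0] * len(controls) + [1] * len(cases)
    ll_null = fit_loglik(x, group, separate_weights=False)
    ll_alt = fit_loglik(x, group, separate_weights=True)
    lr = max(0.0, 2.0 * (ll_alt - ll_null))
    p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-square 1 d.f. upper tail
    return lr, p_value

def simulate_signal(n, freq_high, rng):
    """Hypothetical intensity data: class means 0 and 1, SD 0.15."""
    return [rng.gauss(1.0 if rng.random() < freq_high else 0.0, 0.15)
            for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    controls = simulate_signal(300, 0.2, rng)
    cases = simulate_signal(300, 0.6, rng)
    print(lr_association_test(cases, controls))
```

Because no sample is ever hard-assigned to a copy number class, uncertainty in classification feeds directly into the test statistic. Allowing for differential errors would mean letting the means and variances differ between groups under both hypotheses, which this sketch does not do.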

**Figure 3**
Modelling the dependency between copy number and disease. (a) Naïve model in which any dependency between disease phenotype and quantitative measurements of copy number is assumed to be due to differences in the distribution of copy number between cases and controls. (b) A more elaborate model that allows for other differences in measurement distribution between cases and controls due, for example, to differences in DNA qualities.

**Figure 4**
Sensitivity of 1-d.f. association testing methods to clustering quality and differential errors between cases and controls in simulated data. Six alternative association methods are considered: (i) Mann-Whitney testing for difference in location of CNV measurement distributions, (ii) χ² trend tests on data binned with a priori thresholds, (iii) χ² trend tests on mixture model assignment of cases and controls together (MM-C), (iv) χ² trend tests on mixture model assignment of cases and controls separately (MM-S), (v) χ² trend tests on high confidence mixture model assignment of cases and controls separately (MM-S95) and (vi) likelihood ratio trend test. Overdispersion (λ) is estimated robustly from a linear fit to the first 90% of quantile-quantile plots from 1,000 simulated datasets. (a) Overdispersion is estimated for alternative association methods at ten different clustering qualities. Density plots for three clustering qualities are shown at the bottom. (b) Overdispersion is estimated for alternative association methods at ten different values of differential shift of means. Density plots for three values of differential shift are shown at the bottom with case and control groups in red and gray. (c) Overdispersion is shown for alternative association methods at ten different values of differential shifts in variance. Density plots for three values of differential shift are shown at the bottom with case and control groups in red and gray.
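The overdispersion estimate described in this caption can be illustrated with a short Python sketch. It assumes the test statistics are compared against a χ² distribution with 1 d.f., that "the first 90%" means the lowest 90% of ordered statistics, and that the linear fit is ordinary least squares with an intercept; the caption does not specify these details. The function names are hypothetical.

```python
import random
from statistics import NormalDist

def chi2_1df_quantile(p):
    """Quantile of chi-square with 1 d.f., via the squared normal quantile."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0) ** 2

def overdispersion(stats, fraction=0.9):
    """Slope of observed versus expected quantiles over the lowest
    `fraction` of the Q-Q plot (least-squares fit with intercept)."""
    ordered = sorted(stats)
    n = len(ordered)
    keep = int(fraction * n)
    expected = [chi2_1df_quantile((i + 0.5) / n) for i in range(keep)]
    observed = ordered[:keep]
    mean_e = sum(expected) / keep
    mean_o = sum(observed) / keep
    sxy = sum((e - mean_e) * (o - mean_o) for e, o in zip(expected, observed))
    sxx = sum((e - mean_e) ** 2 for e in expected)
    return sxy / sxx

if __name__ == "__main__":
    rng = random.Random(0)
    # Well-calibrated null statistics: squares of standard normal draws.
    null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(1000)]
    print(overdispersion(null_stats))
```

Excluding the upper tail keeps a handful of extreme statistics from dominating the slope. A well-calibrated test should give λ close to 1, and values above 1 indicate an excess of false positives.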

**Figure 5**
Statistical power of the likelihood ratio trend test. (a) Clustering quality resulting from alternative probe summary methods for 95 CNVs: linear discriminant function (LDF), principal components analysis (PCA) and arithmetic mean (mean). (b) Statistical power of the LR trend test in simulated data of varying clustering quality is shown for two minor allele frequencies (MAF) with odds ratios (OR) set to equalize maximal theoretical power at 90%. Power is estimated for 2,000 cases and 2,000 controls under two conditions: (i) a model that assumes no differential errors and (ii) a model allowing for differential errors. (c) Statistical power of the LR trend test in empirical data from 95 CNVs of varying clustering quality. Power is estimated for 2,000 cases and 2,000 controls, with odds ratios (OR) set to equalize maximal theoretical power at 90%. For ease of display, where the clustering quality (Q) of a CNV exceeds a value of 6, it has been set to 6.

**Figure 6**
Examples of empirical CNV associations. (a) Association with a binary disease trait, type 1 diabetes (T1D). The red shaded area represents a density plot of copy number measurement in each group. The two WTCCC control groups come from the 1958 Birth Cohort (1958BC) and the National Blood Service (NBS). The colored lines reflect the posterior probability distribution for each mixture in the fitted mixture model. The P value derives from the LR trend test comparing case and control groups. (b) The first panel shows normalized expression of gene *LOC288077* against copy number measurement, with a linear regression shown in blue. The second panel shows normalized gene expression against mixture model assignment, with a linear regression shown in blue. The P values in these two plots represent the nominal P values on the regression. The third panel shows a histogram of copy number measurement and the colored lines represent the posterior probability distribution for each of the five copy number classes in the fitted mixture model used in the LR trend test.

Cited by

The Growing Importance of CNVs: New Insights for Detection and Clinical Interpretation.
Valsesia A, Macé A, Jacquemont S, Beckmann JS, Kutalik Z. Front Genet. 2013 May 30;4:92. doi: 10.3389/fgene.2013.00092. eCollection 2013. PMID: 23750167 Free PMC article.
Genetic Association Analysis of Copy Number Variations for Meat Quality in Beef Cattle.
Wu J, Wu T, Xie X, Niu Q, Zhao Z, Zhu B, Chen Y, Zhang L, Gao X, Niu X, Gao H, Li J, Xu L. Foods. 2023 Oct 31;12(21):3986. doi: 10.3390/foods12213986. PMID: 37959106 Free PMC article.
Utilizing extended pedigree information for discovery and confirmation of copy number variable regions among Mexican Americans.
Blackburn A, Göring HH, Dean A, Carless MA, Dyer T, Kumar S, Fowler S, Curran JE, Almasy L, Mahaney M, Comuzzie A, Duggirala R, Blangero J, Lehman DM. Eur J Hum Genet. 2013 Apr;21(4):404-9. doi: 10.1038/ejhg.2012.188. Epub 2012 Aug 22. PMID: 22909773 Free PMC article.
CNVineta: a data mining tool for large case-control copy number variation datasets.
Wittig M, Helbig I, Schreiber S, Franke A. Bioinformatics. 2010 Sep 1;26(17):2208-9. doi: 10.1093/bioinformatics/btq356. Epub 2010 Jul 6. PMID: 20605930 Free PMC article.
Novel association strategy with copy number variation for identifying new risk Loci of human diseases.
Chen X, Li X, Wang P, Liu Y, Zhang Z, Zhao G, Xu H, Zhu J, Qin X, Chen S, Hu L, Kong X. PLoS One. 2010 Aug 20;5(8):e12185. doi: 10.1371/journal.pone.0012185. PMID: 20808825 Free PMC article.

References

1. Redon R, et al. Global variation in copy number in the human genome. Nature. 2006;444:444–454.
2. Tuzun E, et al. Fine-scale structural variation of the human genome. Nat. Genet. 2005;37:727–732.
3. Lupski JR, Stankiewicz P, editors. Genomic Disorders: The Genomic Basis of Disease. Humana Press; Totowa, New Jersey: 2006.
4. Stranger BE, et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007;315:848–853.
5. Flint J, et al. High frequencies of alpha-thalassaemia are the result of natural selection by malaria. Nature. 1986;321:744–750.

Grants and funding

061860/WT_/Wellcome Trust/United Kingdom
