PLoS One. 2010 Aug 20;5(8):e12185.
doi: 10.1371/journal.pone.0012185.

Novel association strategy with copy number variation for identifying new risk Loci of human diseases

Xianfeng Chen et al. PLoS One. 2010.

Abstract

Background: Copy number variations (CNVs) are important causal genetic variations for human disease; however, the lack of a statistical model has impeded the systematic testing of CNVs associated with disease in large-scale cohorts.

Methodology/principal findings: Here, we developed a novel integrated strategy to test CNV association in genome-wide case-control studies. We converted the single-nucleotide polymorphism (SNP) signal to copy number states using a well-trained hidden Markov model. We mapped the susceptible CNV loci through SNP site-specific testing to cope with the physiological complexity of CNVs. We also ensured the credibility of the associated CNVs through further window-based CNV-pattern clustering. Genome-wide data for seven diseases were used to test our strategy and, in total, we identified 36 new susceptible loci that are associated with CNVs for the seven diseases: 5 with bipolar disorder, 4 with coronary artery disease, 1 with Crohn's disease, 7 with hypertension, 9 with rheumatoid arthritis, 7 with type 1 diabetes and 3 with type 2 diabetes. Fifteen of these identified loci were validated through genotype association and physiological function reported in previous studies, which provides further confidence in our results. Notably, the genes associated with bipolar disorder converged on phosphoinositide/calcium signaling, a pathway well known to be affected in bipolar disorder, which further supports the view that CNVs have an impact on bipolar disorder.
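The conversion of a per-SNP signal into copy number states with a hidden Markov model can be illustrated with a minimal Viterbi decoding sketch. It is not the authors' trained model: the three states, the Gaussian emission means and standard deviation, and the transition and start probabilities below are all hypothetical values chosen only to show how neighboring SNPs influence the call at each site.

```python
import math

# Hypothetical three-state model: loss (1 copy), normal (2 copies), gain (3 copies).
STATES = (1, 2, 3)
# Assumed Gaussian emission means of the log2 relative intensity in each state.
EMISSION_MEAN = {1: -0.6, 2: 0.0, 3: 0.4}
EMISSION_SD = 0.2   # assumed standard deviation shared by all states
STAY_PROB = 0.98    # assumed probability that the next SNP keeps the same state
START_PROB = {1: 0.01, 2: 0.98, 3: 0.01}


def log_gaussian(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_transition(prev, cur):
    """Log probability of moving from state prev to state cur."""
    if prev == cur:
        return math.log(STAY_PROB)
    return math.log((1 - STAY_PROB) / (len(STATES) - 1))


def viterbi(intensities):
    """Return the most likely copy number state for each SNP, in chromosomal order."""
    if not intensities:
        return []
    # score[s] is the best log probability of any state path ending in state s.
    score = {
        s: math.log(START_PROB[s]) + log_gaussian(intensities[0], EMISSION_MEAN[s], EMISSION_SD)
        for s in STATES
    }
    backpointers = []
    for x in intensities[1:]:
        new_score, pointer = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            pointer[cur] = best_prev
            new_score[cur] = (
                score[best_prev]
                + log_transition(best_prev, cur)
                + log_gaussian(x, EMISSION_MEAN[cur], EMISSION_SD)
            )
        score = new_score
        backpointers.append(pointer)
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path
```

With these assumed parameters, a run of four consecutive low-intensity SNPs is decoded as a deletion, whereas a single low-intensity SNP flanked by normal ones is treated as noise, because one outlier does not outweigh the cost of two state changes.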

Conclusions/significance: Our results demonstrated the effectiveness and robustness of our CNV-association analysis and provided an alternative avenue for discovering new associated loci of human diseases.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. CNV-association strategy transforms raw signal into copy number and detects association through site-specific testing and CNV-pattern clustering.
(A) The relative intensity was the log2-transformed value of the normalized intensity sum of the SNP alleles. (B) The relative allele ratio was a normalized arctangent value of the intensity ratio of the SNP alleles. These two measurements were arranged along the chromosomal sequence as a hidden Markov model. (C) In this model (with well-trained parameters), the copy number could be calculated from the measurements at each SNP site and the neighboring copy numbers. (D) The copy numbers at a designated site for cases and controls were classified before performing the SNP site-based testing, a Chi-squared test with three null hypotheses in which deletion (labeled Loss), amplification (labeled Gain) or both (labeled Abnm) were viewed as abnormal. (E) Copy numbers in a window centered on the significant SNP site (denoted by the orange box) were subjected to complete linkage clustering. On this clustering heat map, a statistical test of the CNV pattern (named window-based testing) was used to reconfirm the significance of the association. (See details in the Materials and Methods.)
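The SNP site-based testing in panel D can be sketched as three separate tests at one site, one per definition of "abnormal". The sketch assumes a 2×2 table (abnormal versus normal, cases versus controls), a Pearson Chi-squared statistic with 1 degree of freedom, and a normal copy number of 2; the caption does not specify these details, and the names `chi2_2x2`, `HYPOTHESES` and `site_test` are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson Chi-squared statistic and P value (1 d.f.) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        # A margin is empty, so the table carries no evidence of association.
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 d.f., the Chi-squared survival function equals erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Each hypothesis defines which copy numbers count as abnormal (assumed normal = 2).
HYPOTHESES = {
    "Loss": lambda cn: cn < 2,
    "Gain": lambda cn: cn > 2,
    "Abnm": lambda cn: cn != 2,
}


def site_test(case_cn, control_cn):
    """Test one SNP site under the Loss, Gain and Abnm hypotheses.

    case_cn and control_cn are lists of integer copy numbers, one per individual.
    Returns {hypothesis: (statistic, p_value)}.
    """
    results = {}
    for name, is_abnormal in HYPOTHESES.items():
        a = sum(1 for cn in case_cn if is_abnormal(cn))
        c = sum(1 for cn in control_cn if is_abnormal(cn))
        results[name] = chi2_2x2(a, len(case_cn) - a, c, len(control_cn) - c)
    return results
```

For example, `site_test([1] * 30 + [2] * 70, [1] * 10 + [2] * 90)` flags the site under Loss and Abnm but not under Gain, since no individual carries an amplification. Evaluating the hypotheses separately lets a site be detected even when only deletions or only amplifications are enriched in cases.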
Figure 2. Thresholds for the significance of CNV-association and genome-wide distribution of the results in bipolar disorder.
(A) In the SNP site-based testing, 1000 permutations were performed and the boundary P values (Psnp) were plotted against the false discovery rate (FDR) values, with different colors indicating the different hypotheses (blue for Abnm, green for Loss and red for Gain). FDR < 0.05 (labeled with the vertical dashed line) for each hypothesis was used to select 2488 SNPs as candidates for the window-based testing. (B) In the window-based testing, 25000 permutations were performed and the resulting P values (Pwin) were plotted against the FDR values. 401 SNP sites were selected as the final results, with an FDR of 2.35×10^−3 (indicated by the vertical dashed line) to ensure that the expected number of false positives among all the results was less than 1. (C) The −log10 values of the SNP site-based P values were plotted against the position on each chromosome. The three hypotheses are plotted in different panels, and the P values of the chromosomes are shown in alternating colors for clarity. The P values that passed the SNP site-based testing are highlighted in green, and the P values that passed the window-based testing are highlighted in yellow. The genome-wide distribution results for the seven diseases are in Figure S1.
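Two ideas in this caption can be made concrete with a short sketch: a label-permutation P value, and the arithmetic behind "less than 1" false positive (401 selected sites at an FDR of 2.35×10^−3). The permutation scheme below is a generic case/control label shuffle with an add-one correction, assumed for illustration; the caption does not describe the authors' exact procedure, and all function names are hypothetical.

```python
import random


def abnormal_rate_difference(cases, controls):
    """Absolute difference in the fraction of individuals whose copy number is not 2."""
    def rate(values):
        return sum(1 for cn in values if cn != 2) / len(values)
    return abs(rate(cases) - rate(controls))


def permutation_p_value(cases, controls, statistic, n_permutations=1000, seed=0):
    """Estimate a P value by shuffling case/control labels.

    Counts how often a permuted data set yields a statistic at least as large
    as the observed one; the +1 terms keep the estimate away from zero.
    """
    rng = random.Random(seed)
    observed = statistic(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if statistic(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def expected_false_positives(n_selected, fdr):
    """Expected number of false discoveries among n_selected results at a given FDR."""
    return n_selected * fdr


# 401 sites at FDR 2.35e-3 give about 0.94 expected false positives, below 1.
print(expected_false_positives(401, 2.35e-3))
```

Note that the smallest P value this estimator can return is 1 / (n_permutations + 1), which is one reason a stricter second-stage threshold needs more permutations than the first stage.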
Figure 3. Comparison with the traditional genotype-association analysis demonstrates the advantage of our method in CNV regions.
"Gen" labels the genotypic testing (a Chi-squared test with 2 degrees of freedom) results obtained from the WTCCC paper. The −log10 values of the SNP site-based P values in our study with the three null hypotheses, in which deletion (A, labeled Loss), amplification (C, labeled Gain) and both (B, labeled Abnm) were evaluated separately, are plotted against the −log10 of the P value from the genotype-association test of WTCCC. For clarity, the genotype-association P values < 10^−5 are highlighted in green, the CNV-association P values that passed the single SNP site-based testing are in blue, and the CNV-association P values that passed the window-based testing are in red. The SNP sites that are absent from the genotype-association testing are plotted by default as zero (highlighted in brown), and the absent sites that passed the SNP site-based testing are labeled in black. The genotypic testing (Gen) and trend testing (Add, another test for the genotype tendency of disease in WTCCC [9]) for the seven diseases are compared with our CNV-association results in Figure S2. (D) Evidence that CNVs can lead to chaotic genotyping clusters in copy number variable regions. All 17000 individuals are labeled in grey, individuals with CNVs in the disease group are in red, and individuals with CNVs in controls are in green. More evidence of chaotic sample-wide intensity maps affected by CNVs can be found in Figure S3.
