A computational framework discovers new copy number variants with functional importance

doi:10.1371/journal.pone.0017539

PLoS One. 2011 Mar 29;6(3):e17539.

Samprit Banerjee¹, Derek Oldridge, Maria Poptsova, Wasay M Hussain, Dimple Chakravarty, Francesca Demichelis

PMID: 21479260
PMCID: PMC3066184
DOI: 10.1371/journal.pone.0017539

Samprit Banerjee et al. PLoS One. 2011.

Affiliation

¹ Department of Public Health, Weill Cornell Medical College, New York, New York, United States of America.

Abstract

Structural variants that cause changes in copy number constitute an important component of genomic variability. They account for 0.7% of the genomic differences between two individual genomes, of which copy number variants (CNVs) are the largest component. A recent population-based CNV study revealed the need for better characterization of CNVs, especially small ones (<500 bp). We propose a three-step computational framework (Identification of germline Changes in Copy Number, or IgC2N) to discover and genotype germline CNVs. First, we detect candidate CNV loci by combining information across multiple samples without imposing restrictions on the number of coverage markers or on the variant size. Second, we fine-tune the detection of rare variants and infer the putative copy number classes for each locus. Last, for each variant we combine the relative distance between consecutive copy number classes with genetic information in a novel attempt to estimate the reference model bias. This computational approach is applied to genome-wide data from 1250 HapMap individuals. Novel variants were discovered and characterized in terms of size, minor allele frequency, type of polymorphism (gains, losses or both), and mechanism of formation. Using data generated for a subset of individuals by a 42 million marker platform, we validated the majority of the variants; the highest validation rate (66.7%) was for variants larger than 1 kb. Finally, we queried transcriptomic data from 129 individuals, determined by RNA sequencing, as further validation and to assess the functional role of the new variants. We investigated possible enrichment for the variants' regulatory effects and found that smaller variants (<1 kb) are more likely to regulate gene transcripts than larger variants (p-value = 2.04e-08). Our results support the validity of the computational framework to detect novel variants relevant to disease susceptibility studies and provide evidence of the importance of genetic variants in regulatory network studies.
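The second step of the framework, inferring putative copy number classes per locus, can be illustrated with a small sketch. The abstract does not name the model; the Figure 1 caption describes it as a Gaussian mixture model, so the code below fits a one-dimensional mixture by EM to simulated log2 ratios and then measures the distance between the medians of consecutive classes. The function names, the initial means, the initial standard deviation, and the simulated data are all assumptions made for illustration and are not taken from IgC2N.

```python
import math
import random
import statistics


def fit_gmm_1d(values, init_means, n_iter=50, init_sd=0.1):
    """Fit a 1-D Gaussian mixture by EM, one component per initial mean."""
    k = len(init_means)
    means = list(init_means)
    sds = [init_sd] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [
                w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                for w, m, s in zip(weights, means, sds)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: update weight, mean and standard deviation per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            sds[j] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
    return weights, means, sds


def assign_classes(values, weights, means, sds):
    """Assign each value to the component with the highest weighted density."""
    def log_score(x, j):
        return math.log(weights[j]) - math.log(sds[j]) - 0.5 * ((x - means[j]) / sds[j]) ** 2
    return [max(range(len(means)), key=lambda j: log_score(x, j)) for x in values]


# Simulated locus (assumed values): 10 losses, 30 two-copy samples, 10 gains.
rng = random.Random(1)
true_levels = [-0.6] * 10 + [0.0] * 30 + [0.4] * 10
ratios = [level + rng.gauss(0, 0.05) for level in true_levels]

weights, means, sds = fit_gmm_1d(ratios, init_means=[-0.5, 0.0, 0.5])
classes = assign_classes(ratios, weights, means, sds)

# Distance between medians of consecutive classes (the "1 CN class difference").
class_medians = [
    statistics.median([r for r, c in zip(ratios, classes) if c == j]) for j in range(3)
]
gaps = [b - a for a, b in zip(class_medians, class_medians[1:])]
```

In this toy setting the gap between the loss class and the two-copy class is larger than the gap between the two-copy class and the gain class, which is the kind of asymmetry the third step uses when assigning genotypes to classes.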

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Schematic of the approach used for the Identification of germline Changes in Copy Numbers (IgC2N).**
IgC2N is a multistep approach that includes the identification of potential CNV loci along the genome (A–F), a bias correction step (H) and CN genotyping (I), leveraging the experimental data from many samples. (A–I) The log2 intensity ratio signal or the segmented signal (A) is dichotomized on a marker and sample basis (B). A genome-wide score vector S is obtained by summing the transformed signal across all samples on a marker basis (C). The null distribution of the score is obtained by permutations in order to identify the level of significant deviation of the score S, S_sig, from the baseline signal (D). The S_sig value corresponding to a pre-specified FDR threshold is applied to the data vector S (E). The intermediate output is a collection of putative polymorphic loci across the genome; no restriction on size or coverage is applied (F). A Gaussian Mixture Model (GMM) is applied to predict the CN classes (genotypes not assigned) (G). The distance between the medians of consecutive CN classes (1 CN class difference) is compared to the 1 CN class difference of all CNVs, and relative classes are inferred (H). Along with 1 CN class differences, the presence of a “0” class and the expected direction of bias are also considered to infer the genotypes of these CN classes, and the reference model bias is estimated (I).
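Panels A to F can be sketched in a few lines of code: dichotomize the signal, sum it across samples into a score vector S, build a permutation null, and keep runs of markers whose score exceeds the cutoff. This is a minimal sketch under stated assumptions, not the IgC2N implementation. The dichotomization cutoff of 0.2, the choice to shuffle marker positions within each sample, and the use of an upper quantile of the null in place of the FDR-based S_sig are all assumptions, as are the function names and the simulated data.

```python
import random


def dichotomize(log_ratios, cutoff=0.2):
    """Return 1 where |log2 ratio| exceeds the (assumed) cutoff, else 0."""
    return [[1 if abs(x) > cutoff else 0 for x in sample] for sample in log_ratios]


def score_vector(binary):
    """Sum the dichotomized signal across samples, one score per marker."""
    return [sum(column) for column in zip(*binary)]


def permutation_threshold(binary, n_perm=200, quantile=0.99, seed=0):
    """Upper quantile of marker scores after shuffling markers within each sample.

    A stand-in for the FDR-based S_sig described in the caption.
    """
    rng = random.Random(seed)
    null_scores = []
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in binary]
        null_scores.extend(score_vector(shuffled))
    null_scores.sort()
    index = min(len(null_scores) - 1, int(quantile * len(null_scores)))
    return null_scores[index]


def candidate_loci(scores, s_sig):
    """Group consecutive markers with score > s_sig into (start, end) index pairs."""
    loci = []
    start = None
    for i, s in enumerate(scores):
        if s > s_sig and start is None:
            start = i
        elif s <= s_sig and start is not None:
            loci.append((start, i - 1))
            start = None
    if start is not None:
        loci.append((start, len(scores) - 1))
    return loci


# Simulated data (assumed): 20 samples, 50 markers, a loss at markers 10-14
# carried by the first 8 samples.
rng = random.Random(0)
n_samples, n_markers = 20, 50
log_ratios = [[rng.gauss(0, 0.05) for _ in range(n_markers)] for _ in range(n_samples)]
for sample in log_ratios[:8]:
    for m in range(10, 15):
        sample[m] -= 0.6

binary = dichotomize(log_ratios)
scores = score_vector(binary)
s_sig = permutation_threshold(binary)
loci = candidate_loci(scores, s_sig)
```

Because the score pools evidence across samples at each marker, a variant carried by several samples stands out even when it spans only a handful of markers, which is consistent with the caption's note that no size or coverage restriction is applied.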

**Figure 2. Results of Power Simulation Study as function of Size and Coverage.**
*In silico* power computations for IgC2N. The panel of 8 plots is organized in rows by sample size of the datasets used for simulations and in columns by the number of markers covering a CNV and size (in kb) of a CNV. Each plot shows the average power to detect CNVs with three different frequencies, i.e. 1%, 5% and 15% for the dotted, dashed and solid lines respectively.

**Figure 3. Validation study summary.**
(A) Barplots of the validation rate (percentage) categorized with respect to marker coverage, size of the variant and its minor allele frequency. (B) The frequency distribution of CN genotypes of validated, not validated and all CNVs. (C) An example of a new variant validated by NimbleGen data: line plots of the smoothed intensity signal from the 42M NimbleGen platform for each of 40 HapMap samples, showing polymorphism for the locus IgH3.965 on Chromosome 6. A scatter plot (inset) shows the discovery signal (x-axis) against the validation signal (y-axis), color coded with respect to the IgC2N CN call on the discovery samples.

**Figure 4. Functional impact of CNVs on human transcriptome.**
(A) Proportion of functional variants with respect to variant size and type of polymorphism. Percentages are evaluated with respect to each subclass. (B) Significance of associations with respect to gene-variant distance. The *cis* analysis included 2 Mb windows. Minus log10 of the q-values is plotted against the distance between the midpoints of genes and variants. Up and down arrows depict the direction of the association. Red symbols identify data points corresponding to the new CNVs. (C) List of top-ranked associations involving new variants residing within protein coding regions. (D) Examples of new variants showing a significant effect on gene transcript levels. mRNA levels are plotted against the copy number states of new variants identified by IgC2N (box plots) and against the copy number intensity ratios (scatter plots). P-values from the regression analysis against copy number states are reported.
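The quantities plotted in panels B and D can be illustrated with a short sketch: the sign of a regression slope of mRNA level on copy number state gives the direction of an association, and q-values are derived from p-values by a multiple-testing adjustment. The caption does not state which adjustment was used, so the Benjamini-Hochberg procedure below is an assumption, and the expression values and p-values are made-up numbers for illustration only.

```python
import math


def ols_slope(x, y):
    """Least-squares slope of y on x; its sign gives the direction of association."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    qvals = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        qvals[i] = running_min
    return qvals


# Hypothetical example: copy number states and mRNA levels for four samples.
cn_states = [1, 2, 2, 3]
mrna_levels = [1.0, 2.1, 1.9, 3.0]
slope = ols_slope(cn_states, mrna_levels)
direction = "up" if slope > 0 else "down"

# Hypothetical p-values for four gene-variant pairs.
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = bh_qvalues(pvals)
neg_log10_q = [-math.log10(q) for q in qvals]
```

Plotting `neg_log10_q` against gene-variant distance, with a marker shape chosen by `direction`, would reproduce the layout of panel B for this toy input.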

Cited by

A Mild PUM1 Mutation Is Associated with Adult-Onset Ataxia, whereas Haploinsufficiency Causes Developmental Delay and Seizures.
Gennarino VA, Palmer EE, McDonell LM, Wang L, Adamski CJ, Koire A, See L, Chen CA, Schaaf CP, Rosenfeld JA, Panzer JA, Moog U, Hao S, Bye A, Kirk EP, Stankiewicz P, Breman AM, McBride A, Kandula T, Dubbs HA, Macintosh R, Cardamone M, Zhu Y, Ying K, Dias KR, Cho MT, Henderson LB, Baskin B, Morris P, Tao J, Cowley MJ, Dinger ME, Roscioli T, Caluseriu O, Suchowersky O, Sachdev RK, Lichtarge O, Tang J, Boycott KM, Holder JL Jr, Zoghbi HY. Cell. 2018 Feb 22;172(5):924-936.e11. doi: 10.1016/j.cell.2018.02.006. PMID: 29474920.
Variants at IRX4 as prostate cancer expression quantitative trait loci.
Xu X, Hussain WM, Vijai J, Offit K, Rubin MA, Demichelis F, Klein RJ. Eur J Hum Genet. 2014 Apr;22(4):558-63. doi: 10.1038/ejhg.2013.195. Epub 2013 Sep 11. PMID: 24022300.
In-silico identification and functional validation of allele-dependent AR enhancers.
Garritano S, Romanel A, Ciribilli Y, Bisio A, Gavoci A, Inga A, Demichelis F. Oncotarget. 2015 Mar 10;6(7):4816-28. doi: 10.18632/oncotarget.3019. PMID: 25693204.
NUDT21-spanning CNVs lead to neuropsychiatric disease and altered MeCP2 abundance via alternative polyadenylation.
Gennarino VA, Alcott CE, Chen CA, Chaudhury A, Gillentine MA, Rosenfeld JA, Parikh S, Wheless JW, Roeder ER, Horovitz DD, Roney EK, Smith JL, Cheung SW, Li W, Neilson JR, Schaaf CP, Zoghbi HY. Elife. 2015 Aug 27;4:e10782. doi: 10.7554/eLife.10782. PMID: 26312503.
Plasticity of the myelination genomic fabric.
Iacobas S, Thomas NM, Iacobas DA. Mol Genet Genomics. 2012 Mar;287(3):237-46. doi: 10.1007/s00438-012-0673-0. Epub 2012 Jan 13. PMID: 22246408.
