Review
Curr Opin Struct Biol. 2008 Jun;18(3):366-74.
doi: 10.1016/j.sbi.2008.02.005. Epub 2008 May 27.

The current excitement about copy-number variation: how it relates to gene duplications and protein families

Jan O Korbel et al. Curr Opin Struct Biol. 2008 Jun.

Abstract

Following recent technological advances, there has been an increasing interest in genome structural variants (SVs), in particular copy-number variants (CNVs), that is, large-scale duplications and deletions. Although not immediately evident, CNV surveys make a conceptual connection between the fields of population genetics and protein families, in particular with regard to the stability and expandability of families. The mechanisms giving rise to CNVs can be considered as fundamental processes underlying gene duplication and loss; duplicated genes are the results of 'successful' copies, fixed and maintained in the population. Conversely, many 'unsuccessful' duplicates remain in the genome as pseudogenes. Here, we survey studies on CNVs, highlighting issues related to protein families. In particular, CNVs tend to affect specific gene functional categories, such as those associated with environmental response, and are depleted in genes related to basic cellular processes. Furthermore, CNVs occur more often at the periphery of the protein interaction network. In comparison, protein families associated with successful and unsuccessful duplicates are associated with similar functional categories but are differentially placed in the interaction network. These trends likely reflect CNV formation biases and natural selection, both of which differentially influence distinct protein families.

Figures

Figure 1
Enrichment and depletion of gene functional categories (Gene Ontology (GO) annotation [67], GO biological process, level 3) among genes affected by CNVs. Significant enrichment (red shading) and depletion (blue shading) of protein-coding genes were determined using software published in [68] (Bonferroni-corrected P-value cutoff of 0.01). Genomic coordinates of CNVs (build hg18) were obtained from DGV [16] on 30th November 2007. A high-confidence list of recently successfully duplicated genes was obtained by collecting RefSeq genes spanned by segmental duplications (SDs) retrieved from http://eichlerlab.gs.washington.edu/database.html. Gene coordinates and GO annotations were obtained from Ensembl (www.ensembl.org/biomart/martview). Functional categories observed in less than 2% of genes were grouped into “other”. (A) Depletion and enrichment of GO categories among CNVs. (B) Depletion and enrichment of GO categories among successful gene duplicates. (C) Depletion and enrichment of GO categories among unsuccessful duplicates (i.e., nonprocessed pseudogenes).
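The kind of test summarized in this caption can be sketched as follows. This is a minimal illustration, not the software cited as [68]: it assumes a hypergeometric model for the number of CNV-affected genes carrying a given annotation, and applies a Bonferroni correction by multiplying each P-value by the number of categories tested. The function names, the category labels, and all counts are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k annotated genes among n drawn from N genes, K of them annotated."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def tail_pvalues(k, N, K, n):
    """Return (P(X >= k), P(X <= k)), i.e. enrichment and depletion P-values."""
    lo = max(0, n - (N - K))   # smallest possible overlap
    hi = min(K, n)             # largest possible overlap
    upper = sum(hypergeom_pmf(i, N, K, n) for i in range(k, hi + 1))
    lower = sum(hypergeom_pmf(i, N, K, n) for i in range(lo, k + 1))
    return min(1.0, upper), min(1.0, lower)

def classify_categories(counts, N, n, alpha=0.01):
    """Label each category as enriched, depleted, or not significant.

    counts maps category -> (K, k): K genes annotated in the background of
    N genes, k of them among the n genes affected by CNVs.
    """
    m = len(counts)  # number of tests, used for the Bonferroni correction
    labels = {}
    for category, (K, k) in counts.items():
        p_enriched, p_depleted = tail_pvalues(k, N, K, n)
        if min(1.0, p_enriched * m) < alpha:
            labels[category] = "enriched"
        elif min(1.0, p_depleted * m) < alpha:
            labels[category] = "depleted"
        else:
            labels[category] = "not significant"
    return labels

# Made-up counts, for illustration only (not data from the paper).
toy_counts = {
    "category_A": (50, 20),    # expected about 5 of 100, observed 20
    "category_B": (200, 2),    # expected about 20 of 100, observed 2
    "category_C": (100, 10),   # observed matches expectation
}
toy_labels = classify_categories(toy_counts, N=1000, n=100)
```

With these made-up counts, `category_A` would correspond to red shading, `category_B` to blue shading, and `category_C` to neither. Bonferroni correction is conservative; it is used here only because the caption names it.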
Figure 2
Gene duplicates and the human protein interaction network. (A) Recently successfully duplicated genes are significantly enriched at the periphery of the protein network, as evidenced from a significantly decreased average betweenness centrality with P≪0.01 (the interaction network was constructed and the P-values generated as described in [57]). (B) Unsuccessful duplicates are significantly enriched at the network center, P<0.01.
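Betweenness centrality, the measure of network position used in this caption, counts how many shortest paths between other node pairs pass through a node, so peripheral proteins score low and central ones score high. The sketch below computes it for an unweighted, undirected graph with Brandes' algorithm and then averages it over a gene set. It does not reproduce the network construction or the P-value procedure of [57]; the toy network, the node names, and the helper names are hypothetical.

```python
from collections import deque

def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph.

    adj maps each node to an iterable of its neighbours (both directions listed).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:                     # accumulate dependencies, farthest first
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

def mean_centrality(bc, genes):
    """Average betweenness over a set of genes present in the network."""
    genes = [g for g in genes if g in bc]
    return sum(bc[g] for g in genes) / len(genes)

# Hypothetical toy network: HUB-A, HUB-B, HUB-C, C-D.
toy_network = {
    "HUB": ["A", "B", "C"],
    "A": ["HUB"],
    "B": ["HUB"],
    "C": ["HUB", "D"],
    "D": ["C"],
}
toy_bc = betweenness_centrality(toy_network)
peripheral_mean = mean_centrality(toy_bc, ["A", "B", "D"])
network_mean = mean_centrality(toy_bc, toy_network)
```

In this toy network the peripheral set has a lower average betweenness than the network as a whole, which is the direction of the trend reported in panel (A). A significance estimate would need an additional step, for example comparison against randomly drawn gene sets of the same size.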
Figure 3
Disease associations of protein domains in genes affected by copy-number variation.
(A) CNVs are significantly associated with disease genes. Associations between protein domains and diseases were retrieved from OMIM (www.ncbi.nlm.nih.gov/omim). (B) Enrichment of protein domains of cancer-related genes among CNVs. Genes implicated in cancer were obtained from CGC (www.sanger.ac.uk/genetics/CGP/Census).
Figure 4
Efficient high-throughput functional genomics technologies used for identifying CNVs in a genome-wide fashion. Figure adapted from [75].

References

    1. Pennisi E. Breakthrough of the year. Human genetic variation. Science. 2007;318:1842–1843.
    2. Consortium TIH, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861.
    3. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678.
    4. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, et al. Global variation in copy number in the human genome. Nature. 2006;444:444–454.
    5. Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF, Kim PM, Palejev D, Carriero NJ, Du L, et al. Paired-end mapping reveals extensive structural variation in the human genome. Science. 2007;318:420–426.
