Cytogenet Genome Res. 2008;123(1-4):234-43.
doi: 10.1159/000184713. Epub 2009 Mar 11.

Human copy number polymorphic genes

J A Bailey et al. Cytogenet Genome Res. 2008.

Abstract

Recent large-scale genomic studies within human populations have identified numerous genomic regions as copy number variant (CNV). As these CNV regions often overlap coding regions of the genome, large lists of potentially copy number polymorphic genes have been produced that are candidates for disease association. Most of the current data regarding normal genic variation, however, has been generated using BAC or SNP microarrays, which lack precision especially with respect to exons. To address this, we assessed 2,790 candidate CNV genes defined from available studies in nine well-characterized HapMap individuals by designing a customized oligonucleotide microarray targeted specifically to exons. Using exon array comparative genomic hybridization (aCGH), we detected 255 (9%) of the candidates as true CNVs including 134 with evidence of variation over the entire gene. Individuals differed in copy number from the control by an average of 100 gene loci. Both partial- and whole-gene CNVs were strongly associated with segmental duplications (55 and 71%, respectively) as well as regions of positive selection. We confirmed 37% of the whole-gene CNVs using the fosmid end sequence pair (ESP) structural variation map for these same individuals. If we modify the end sequence pair mapping strategy to include low-sequence identity ESPs (98-99.5%) and ESPs with an everted orientation, we can capture 82% of the missed genes leading to more complete ascertainment of structural variation within duplicated genes. Our results indicate that segmental duplications are the source of the majority of full-length copy number polymorphic genes, most of the variant genes are organized as tandem duplications, and a significant fraction of these genes will represent paralogs with levels of sequence diversity beyond thresholds of allelic variation. 
In addition, these data provide a targeted set of CNV genes enriched for regions likely to be associated with human phenotypic differences due to copy number changes and present a source of copy number responsive oligonucleotide probes for future association studies.

Figures

Fig. 1.
Exon-targeted oligonucleotide array CGH design. From our identified list of candidate CNV genes and controls, we targeted an equal number of probes to each exon by including nearly equivalent amounts of sequence for probe design. For each exon, we identified two regions for probe design: 200 bp centered at the beginning and 200 bp centered at the end of the exon. For small exons (<200 bp), this amounted to 100 bp flanking either side plus the length of the exon, since these regions overlapped. For medium-size exons (200–999 bp), this amounted to 400 bp with equivalent amounts of flanking and exonic sequence. For large exons (≥1 kb), we added an additional 200 bp directly in the center of the exon to provide a measure of continuity in these larger regions. This scheme essentially increased the weight of small exons through the inclusion of flanking sequence and decreased the weight of large exons by sampling only a limited portion. The inclusion of flanking non-transcribed sequence also limited the detection of processed pseudogenes. Overall, each exon of the candidate genes was represented by 203–600 bases of sequence. These probe design regions were merged into a non-overlapping set of sequences from which NimbleGen algorithms choose appropriate oligonucleotide sequences for array synthesis.
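The region scheme in this caption can be sketched in a few lines. This is only an illustration of the arithmetic described above, not the authors' pipeline: the function names, the half-open coordinate convention, and the choice of the integer midpoint for large exons are all assumptions.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def probe_design_regions(exon_start, exon_end):
    """Return merged probe design regions for one exon.

    Coordinates are half-open [exon_start, exon_end), an assumption
    made here for simplicity.
    """
    length = exon_end - exon_start
    regions = [
        (exon_start - 100, exon_start + 100),  # 200 bp centered on exon start
        (exon_end - 100, exon_end + 100),      # 200 bp centered on exon end
    ]
    if length >= 1000:
        # Large exons get an extra 200 bp window at the exon center.
        mid = exon_start + length // 2
        regions.append((mid - 100, mid + 100))
    return merge_intervals(regions)


def total_design_length(exon_start, exon_end):
    """Total bases of sequence offered for probe design for one exon."""
    return sum(end - start for start, end in probe_design_regions(exon_start, exon_end))
```

Under these assumptions, a 3 bp exon yields 203 bases, a 500 bp exon yields 400, and a 2 kb exon yields 600, which matches the 203–600 range quoted in the caption.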
Fig. 2.
Examples of detected CNV transcripts. The observed relative signal intensities and results of the chaining algorithm are depicted for (a) the complete deletion of the RhD blood group antigen gene (RHD) and (b) the partial-gene CNV of the lipoprotein Lp(a) precursor (LPA). For each gene, the panel shows the gene structure (blue), the regions used for probe selection, and the relative signal intensities of the probes for each individual assayed. Individual probe signals with absolute relative deviations >1.0 SD are colored green for gain and red for loss. For RHD, an expanded area shows the probes for exon 6 in detail. The results of our detection algorithm are depicted by a red or green line indicating a region of loss or gain. In the case of RHD, these represent detection of gains and losses of the entire transcript. For LPA, the detected regions demonstrate partial transcript loss relative to the control. The region identified in LPA represents a series of variously sized tandem deletions and duplications based on a 2-exon module containing Kringle domains. Vertical scales represent the natural log of the normalized relative hybridization intensities.
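The per-probe coloring rule can be illustrated with a short sketch. It assumes that the deviation is measured as the natural log of the test/control intensity ratio, and that the standard deviation is supplied by the caller; how the SD was estimated and how the chaining algorithm joins probes are not described here, so neither is modeled. The function name and labels are hypothetical.

```python
import math


def classify_probes(test_intensities, control_intensities, sd):
    """Label each probe as 'gain', 'loss', or 'neutral'.

    Assumption: a probe is flagged when its natural-log intensity
    ratio deviates from zero by more than 1.0 SD, where `sd` is
    provided by the caller.
    """
    calls = []
    for test, control in zip(test_intensities, control_intensities):
        log_ratio = math.log(test / control)  # natural log, as on the figure's vertical scale
        if log_ratio > 1.0 * sd:
            calls.append("gain")   # shown in green in the figure
        elif log_ratio < -1.0 * sd:
            calls.append("loss")   # shown in red in the figure
        else:
            calls.append("neutral")
    return calls
```

For example, with `sd=0.5`, a probe at twice the control intensity (log ratio about 0.69) would be labeled a gain and one at half the control intensity a loss.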
Fig. 3.
Whole-gene CNVs compared to fosmid ESP analysis. (a) Validation rates categorized by the percent identity of the most similar duplicon within each whole-gene CNV region. There is a significant decrease in the validated fraction for regions containing duplicons with >99% identity. (b) Venn diagram showing the association of the 83 whole-gene CNVs with best-placed fosmid ESPs of low similarity (98–99.5%), suggesting more divergent unrepresented CNV paralogs, and/or with best-placed everted ESPs (>99.5%), suggesting highly similar tandem duplications. Interestingly, regions containing everted and low-similarity ESPs overlap, suggesting a more complex nature for these CNV genes. The inset depicts the basis for the formation of everted fosmid ESPs, where a clone that traverses the boundary of a tandem duplication can only map to the single copy contained within the reference genome (Cooper et al., 2008).
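A minimal sketch of how an end sequence pair might be sorted into these categories follows. It is an assumption-laden illustration rather than the authors' mapping strategy: the function name, the strand encoding, and the category labels are hypothetical, and insert-size checks are omitted. Only the identity cut-offs (98–99.5% for low similarity, >99.5% otherwise) come from the caption.

```python
def classify_esp(pos_a, strand_a, pos_b, strand_b, identity):
    """Classify a fosmid end sequence pair by orientation and identity.

    pos_*    : mapped coordinate of each end read on the reference
    strand_* : '+' or '-'
    identity : percent identity of the best placement
    """
    # Order the two ends by reference coordinate.
    (_, left_strand), (_, right_strand) = sorted(
        [(pos_a, strand_a), (pos_b, strand_b)]
    )
    if left_strand == right_strand:
        orientation = "inverted"   # both ends on the same strand
    elif left_strand == "+":
        orientation = "inward"     # ends face each other, as expected for a clone
    else:
        orientation = "everted"    # ends face away, as for a clone spanning a tandem duplication boundary

    if identity > 99.5:
        similarity = "high"
    elif identity >= 98.0:
        similarity = "low"         # the 98-99.5% band described in the caption
    else:
        similarity = "below_threshold"
    return orientation, similarity
```

A clone whose plus-strand end maps downstream of its minus-strand end, for instance `classify_esp(41000, "+", 1000, "-", 99.8)`, comes out as everted with high similarity.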

References

    1. Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 2001;11:1005–1017.
    2. Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, et al. Recent segmental duplications in the human genome. Science. 2002;297:1003–1007.
    3. Barber JC, Reed CJ, Dahoun SP, Joyce CA. Amplification of a pseudogene cassette underlies euchromatic variation of 16p at the cytogenetic level. Hum Genet. 1999;104:211–218.
    4. Beissbarth T, Speed TP. GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004;20:1464–1465.
    5. Benjamini Y, Hochberg Y. More powerful procedures for multiple significance testing. Stat Med. 1990;9:811–818.