Nucleic Acids Res. 2008 Nov;36(19):e126.
doi: 10.1093/nar/gkn556. Epub 2008 Sep 10.

Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms

Sharon J Diskin et al. Nucleic Acids Res. 2008 Nov.

Abstract

Whole-genome microarrays with large-insert clones designed to determine DNA copy number often show variation in hybridization intensity that is related to the genomic position of the clones. We found these 'genomic waves' to be present in Illumina and Affymetrix SNP genotyping arrays, confirming that they are not platform-specific. The causes of genomic waves are not well understood, and they may prevent accurate inference of copy number variations (CNVs). By measuring DNA concentration for 1444 samples and by genotyping the same sample multiple times with varying DNA quantity, we demonstrated that DNA quantity correlates with the magnitude of waves. We further showed that wavy signal patterns correlate best with GC content, among multiple genomic features considered. To measure the magnitude of waves, we proposed a GC-wave factor (GCWF) measure, which is a reliable predictor of DNA quantity (correlation coefficient = 0.994 based on samples with serial dilution). Finally, we developed a computational approach by fitting regression models with GC content included as a predictor variable, and we show that this approach improves the accuracy of CNV detection. With the wide application of whole-genome SNP genotyping techniques, our wave adjustment method will be important for taking full advantage of genotyped samples for CNV analysis.
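
The regression idea in the abstract can be illustrated with a minimal sketch. It assumes a simple one-predictor linear model (LRR regressed on local GC percentage) and takes the residuals as the adjusted signal. The abstract does not specify the exact model form or any additional covariates, so the function names `fit_gc_model` and `adjust_lrr` and the single-predictor choice are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean


def fit_gc_model(lrr, gc):
    """Ordinary least squares fit of lrr = intercept + slope * gc.

    lrr: per-marker log R ratio values
    gc:  GC percentage around each marker (same length as lrr)
    Returns (intercept, slope).
    """
    gc_mean, lrr_mean = mean(gc), mean(lrr)
    sxx = sum((g - gc_mean) ** 2 for g in gc)
    if sxx == 0:
        raise ValueError("GC content is constant; slope is undefined")
    sxy = sum((g - gc_mean) * (y - lrr_mean) for g, y in zip(gc, lrr))
    slope = sxy / sxx
    intercept = lrr_mean - slope * gc_mean
    return intercept, slope


def adjust_lrr(lrr, gc):
    """Return residuals of the GC fit (assumed form of the adjusted signal)."""
    intercept, slope = fit_gc_model(lrr, gc)
    return [y - (intercept + slope * g) for y, g in zip(lrr, gc)]


if __name__ == "__main__":
    # Toy data: LRR that rises linearly with GC, i.e. a pure "wave" component.
    gc = [35.0, 40.0, 45.0, 50.0]
    lrr = [-0.1, 0.0, 0.1, 0.2]
    print(fit_gc_model(lrr, gc))
    print(adjust_lrr(lrr, gc))
```

In this toy case the signal is fully explained by GC content, so the residuals are approximately zero. In real data the residuals would retain whatever variation GC content does not explain, which is the signal passed on to CNV calling.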

Figures

Figure 1.
Genomic wave is not platform-specific. An illustration of four representative examples showing the LRR values for chromosome 11 in the Affymetrix 250K NSP (from Affymetrix Mapping 500K array sets) and genome-wide 6.0 arrays, as well as the Illumina HumanHap550 and HumanHap1M arrays. Four different DNA samples were genotyped by these four arrays. In all cases, we observe similar shapes of wavy patterns with identical or opposite peaks and troughs. This indicates that genomic wave is an intrinsic property of the human genome, regardless of the technical platforms.
Figure 2.
Genomic wave is correlated with GC content. Analysis of the correlation between GC percentage and median values of LRR in nonoverlapping windows across the genome, using 10 kb, 100 kb and 1 Mb window sizes, respectively. (A) A sample genotyped by the Illumina HumanHap550 array is chosen and the signal pattern on chromosome 11 is shown. (B–D) We observe increasingly higher correlations between these two measures with larger window sizes. The correlation coefficients for panels (B)–(D) are 0.70, 0.85 and 0.96, respectively. (E) A sample genotyped by the Illumina HumanHap550 array whose peaks and troughs are opposite to those in (A) is chosen and the signal pattern on chromosome 11 is shown. (F–G) We observe increasingly higher correlations between these two measures with larger window sizes.
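
The windowed comparison described in this caption can be sketched as follows: markers are binned into nonoverlapping windows by position, the median LRR is taken per window, and the medians are correlated with per-window GC percentage. The helper names and the input layout (a list of positions, a list of LRR values, and a dictionary of GC percentage keyed by window index) are assumptions made for illustration only.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median


def window_medians(positions, values, window_bp):
    """Median of values in each nonoverlapping window, keyed by window index."""
    bins = defaultdict(list)
    for pos, val in zip(positions, values):
        bins[pos // window_bp].append(val)
    return {idx: median(vals) for idx, vals in bins.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lrr_gc_correlation(positions, lrr, gc_by_window, window_bp):
    """Correlate per-window median LRR with per-window GC percentage."""
    med = window_medians(positions, lrr, window_bp)
    shared = sorted(set(med) & set(gc_by_window))  # windows present in both
    return pearson([gc_by_window[i] for i in shared], [med[i] for i in shared])


if __name__ == "__main__":
    # Toy data with a 1000 bp window; real analyses would use 10 kb to 1 Mb.
    positions = [100, 200, 1100, 1200, 2100, 2900]
    lrr = [0.1, 0.3, 0.0, 0.2, -0.3, -0.1]
    gc_by_window = {0: 50.0, 1: 45.0, 2: 30.0}
    print(lrr_gc_correlation(positions, lrr, gc_by_window, 1000))
```

Larger windows average over more markers, so marker-level noise in the medians shrinks. That is consistent with the rising correlations reported for panels (B) to (D).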
Figure 3.
The signal intensity is highly correlated with GC content in sliding windows in chromosome 11. The median LRR values for 1 Mb sliding windows in two samples (one with negative wave and one with positive wave) are plotted against several genomic features, including GC content, segmental duplication, gene content, exon content, simple repeats and conserved region.
Figure 4.
Genomic wave is correlated with DNA quantity, but not quality. (A) Plot of GCWF measure against 260/280 ratio for DNA of 1444 neuroblastoma patients (Pearson correlation coefficient = 0.0918). (B) Gel electrophoresis assay for 750 ng DNA from 18 genotyped samples (six without waves, six with positive waves and six with negative waves) shows no evidence of DNA degradation, which would appear as smears rather than clear bands (see Supplementary Figure 2 for examples). The largest size marker for the 100-bp ladder lane is 2072 bp, while the largest size marker for the 1-kb ladder lane is 12 kb. (C) Plot of GCWF against total quantity of DNA for 1444 samples (Pearson correlation coefficient = 0.4371). Samples with an initial estimated concentration > 100 ng/µl were diluted to 75 ng/µl, explaining the clustering at 1125 ng. (D) Plot of DNA quantity versus GCWF for serial dilutions of DNA from a single sample.
Figure 5.
A comparison of signal intensities along chromosome 11 for five duplicated samples with different DNA quantities genotyped by the Illumina HumanHap550 arrays. The LRR signal intensity shows wavy patterns when the DNA quantity deviates from the recommended quantity (750 ng). The directionality of the waves is reversed when increasing amounts of DNA are used for genotyping.
Figure 6.
DNA quantity and waves impact CNV calling. CNV calling results from the cnvPartition algorithm, and the PennCNV algorithm without and with signal adjustment, as implemented in the BeadStudio software, on five duplicated DNA samples with different quantities. A 5-SNP threshold was used in PennCNV so that the number of calls was comparable to cnvPartition. The color and the thickness of the bars represent the copy number and the size of the CNV calls, respectively. DNA quantity and signal intensity waves severely affect the accuracy of CNV calling; however, after signal adjustment, higher specificity and higher concordance rate in CNV calls are achieved.
