Comparative Study
BMC Genomics. 2008 Oct 21;9:497.
doi: 10.1186/1471-2164-9-497.

Experimental analysis of oligonucleotide microarray design criteria to detect deletions by comparative genomic hybridization

Stephane Flibotte et al. BMC Genomics.

Abstract

Background: Microarray comparative genomic hybridization (CGH) is currently one of the most powerful techniques to measure DNA copy number in large genomes. In humans, microarray CGH is widely used to assess copy number variants in healthy individuals and copy number aberrations associated with various diseases, syndromes and disease susceptibility. In model organisms such as Caenorhabditis elegans (C. elegans), the technique has been applied to detect mutations, primarily deletions, in strains of interest. Although various constraints on oligonucleotide properties have been suggested to minimize non-specific hybridization and improve data quality, few of them have been validated experimentally for CGH. For genomic regions where strict design filters would limit coverage, it would also be useful to quantify the expected loss in data quality associated with relaxed design criteria.

Results: We have quantified the effects of filtering on various oligonucleotide properties by measuring the resolving power for detecting deletions in the human and C. elegans genomes using NimbleGen microarrays. To achieve the same statistical confidence, a deletion in human DNA samples typically has to affect approximately twice as many oligonucleotides as a deletion in C. elegans. Surprisingly, the ability to detect deletions depends strongly on the oligonucleotide 15-mer count, defined as the sum of the genomic frequencies of all the constituent 15-mers within the oligonucleotide. A similarity level above 80% to non-target sequences over the length of the probe produces significant cross-hybridization. We recommend a fairly large melting temperature window of up to 10 °C, the elimination of repeat sequences, the elimination of homopolymers longer than 5 nucleotides, and a threshold of -1 kcal/mol on the oligonucleotide self-folding energy. We observed very little difference in data quality when varying the oligonucleotide length between 50 and 70 nucleotides, or even when using an isothermal design strategy.
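The 15-mer count defined above can be computed from a table of genomic k-mer frequencies. The following is a minimal Python sketch, not the authors' pipeline: the function names `kmer_table` and `kmer_count` are hypothetical, and counting k-mers on both strands of the genome is an assumption of this sketch.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmer_table(genome: str, k: int = 15, both_strands: bool = True) -> Counter:
    """Count every k-mer in the genome; including the reverse strand is an assumption."""
    strands = [genome.upper()]
    if both_strands:
        strands.append(reverse_complement(genome))
    table = Counter()
    for s in strands:
        table.update(s[i:i + k] for i in range(len(s) - k + 1))
    return table


def kmer_count(probe: str, table: Counter, k: int = 15) -> int:
    """Sum of the genomic frequencies of all k-mers contained in the probe."""
    probe = probe.upper()
    return sum(table[probe[i:i + k]] for i in range(len(probe) - k + 1))


# Toy example with k = 3 so the arithmetic is easy to follow.
table = kmer_table("AAAAAC", k=3, both_strands=False)
print(kmer_count("AAAC", table, k=3))  # AAA occurs 3 times, AAC once -> 4
```

For a whole genome with k = 15, an in-memory dictionary like this becomes memory-hungry, so a dedicated k-mer counting tool would likely be preferable; the dictionary is used here only to make the definition concrete.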

Conclusion: We have experimentally determined the effects of varying several key oligonucleotide microarray design criteria on the detection of deletions in C. elegans and humans with NimbleGen's CGH technology. Our oligonucleotide design recommendations should be applicable to CGH analysis in most species.


Figures

Figure 1
Resolving power curves for detection of one-copy deletions with 50-mer oligonucleotides. The open circles show the results of resolving power calculations for one-copy deletions, that is, the logarithm of the expected p-value for a deletion as a function of the number of probes affected by the copy-number aberration. The solid (dashed) lines are linear regressions of the resolving power calculations in C. elegans (human); their slopes are given in the legends. Red data points and lines correspond to calculations using all oligonucleotides on the arrays without further filtering, while green data points and lines correspond to resolving power calculations after selecting oligonucleotides with our standard filters, as described in the Methods section.
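The slope of a resolving power curve can be obtained by ordinary least squares on the (number of affected probes, log p-value) pairs; when the fitted intercepts are close to zero, the ratio of two slopes approximates the relative number of probes needed to reach the same confidence. The sketch below assumes the log10 p-values have already been computed (the p-value calculation itself is described in the paper's Methods and is not reproduced here); the data points are invented for illustration, and `fit_line` and `probes_needed` are hypothetical helpers.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def probes_needed(slope: float, intercept: float, target_log_p: float) -> float:
    """Number of affected probes at which the fitted line reaches target_log_p."""
    return (target_log_p - intercept) / slope


# Invented resolving power points: log10(p) versus number of affected probes.
n_probes = [1, 2, 3, 4, 5]
worm = [-1.2 * n for n in n_probes]   # steeper curve
human = [-0.6 * n for n in n_probes]  # half the slope

for label, ys in (("C. elegans", worm), ("human", human)):
    slope, intercept = fit_line(n_probes, ys)
    print(label, round(slope, 3), round(probes_needed(slope, intercept, -6.0), 1))
```

With these made-up numbers, the human line has half the slope of the C. elegans line and therefore needs twice as many affected probes (10 versus 5) to reach log10 p = -6, which is the kind of comparison summarized in the abstract.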
Figure 2
Individual effect of standard oligonucleotide filters on resolving power. The negative (or absolute value) of the slope of the resolving power curve is shown for individual constraints on oligonucleotides when detecting one-copy deletions with 50-mer probes. For each filter, the green bars correspond to resolving power calculations performed exclusively with oligonucleotides accepted by the filter, while the red bars correspond to calculations using only the oligonucleotides rejected by the filter. Solid colours are associated with C. elegans and hashed areas with the human data set. The filters from left to right are: elimination of non-unique 20-mers, elimination of homopolymers longer than 5 nucleotides, selection of a 10 °C range in melting temperature, elimination of oligonucleotides with self-folding energy smaller than -1 kcal/mol, elimination of oligonucleotides mapping to more than one genomic region with more than 70% similarity, elimination of oligonucleotides with a 15-mer count above the median value, and finally, the simultaneous application of all of those filters. More details on these standard filters can be found in the Methods section.
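The standard filters listed above can be combined into a single selection step. In this hypothetical sketch, melting temperature, self-folding energy, 20-mer uniqueness and best off-target similarity are assumed to be precomputed by other tools (they are not computed here), and the 10 °C melting temperature window is assumed to be centred on the median Tm of the candidate set; the paper's Methods may define the window differently. The `Probe` schema and function names are illustrative only.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Probe:
    # All fields except seq are assumed to be precomputed elsewhere (hypothetical schema).
    seq: str
    tm: float                 # melting temperature, °C
    fold_dg: float            # self-folding free energy, kcal/mol
    kmer15: int               # 15-mer count
    unique20: bool            # True if none of the probe's 20-mers is repeated in the genome
    max_offtarget_sim: float  # best similarity (0-1) to any non-target genomic region


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated nucleotide."""
    best = run = 0
    prev = ""
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def standard_filter(probes: List[Probe], tm_window: float = 10.0) -> List[Probe]:
    """Keep probes passing all filters simultaneously (the last bar in the figure)."""
    tm_center = median(p.tm for p in probes)
    kmer_cutoff = median(p.kmer15 for p in probes)
    return [
        p for p in probes
        if p.unique20
        and longest_homopolymer(p.seq) <= 5          # no homopolymer longer than 5 nt
        and abs(p.tm - tm_center) <= tm_window / 2   # 10 °C Tm window (centring assumed)
        and p.fold_dg >= -1.0                        # drop self-folding energy < -1 kcal/mol
        and p.max_offtarget_sim <= 0.70              # drop >70% similarity elsewhere
        and p.kmer15 <= kmer_cutoff                  # drop 15-mer count above the median
    ]
```

Each individual green/red bar pair in the figure corresponds to applying just one of these conditions and splitting the probes into accepted and rejected sets.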
Figure 3
Effects of varying some oligonucleotide constraints on resolving power. The negative (or absolute value) of the slope in the resolving power curve is shown for individual constraints on oligonucleotides when detecting one-copy deletions with 50-mer probes. The green (red) bars correspond to C. elegans (human) data. The individual oligonucleotide constraints that have been varied consist of (A) the self-folding energy, (B) the length of the longest homopolymer, (C) the 15-mer count, and (D) the melting temperature.
Figure 4
Effects of varying the oligonucleotide length. The negative (or absolute value) of the slope of the resolving power curve is shown as a function of oligonucleotide length for C. elegans (solid colour bars) and human (hashed bars). (A) Effects of varying the oligonucleotide length between 50 and 70 nucleotides before (red bars) and after (green bars) applying our standard filters; see Methods for details. For the so-called isothermal design, the length of each oligonucleotide was allowed to vary between 50 and 70 nucleotides in an attempt to minimize the width of the melting temperature distribution. (B) Effects of varying the oligonucleotide length between 50 and 70 nucleotides at a fixed melting temperature of 72 °C without applying additional constraints on oligonucleotides.
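An isothermal design of the kind described in panel A can be sketched by letting each probe's length range from 50 to 70 nucleotides and picking the length whose melting temperature is closest to a common target. As a stand-in for whatever Tm model the authors used, this sketch uses the basic GC-content approximation Tm = 64.9 + 41 × (nGC − 16.4) / N, which is much cruder than nearest-neighbour models; the 72 °C default target is borrowed from panel B and is an assumption for panel A. `basic_tm` and `isothermal_probe` are hypothetical names.

```python
from typing import Optional


def basic_tm(seq: str) -> float:
    """Crude Tm estimate in °C from GC content (not a nearest-neighbour model)."""
    seq = seq.upper()
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(seq)


def isothermal_probe(region: str, start: int, target_tm: float = 72.0,
                     min_len: int = 50, max_len: int = 70) -> Optional[str]:
    """Return the probe starting at `start` whose length (min_len..max_len)
    gives a Tm closest to target_tm, or None if the region is too short."""
    best_seq, best_diff = None, float("inf")
    for length in range(min_len, max_len + 1):
        if start + length > len(region):
            break
        candidate = region[start:start + length]
        diff = abs(basic_tm(candidate) - target_tm)
        if diff < best_diff:
            best_seq, best_diff = candidate, diff
    return best_seq


# GC-rich sequence: Tm is above target and rises with length, so the shortest probe is kept.
print(len(isothermal_probe("GC" * 40, 0)))  # 50
# AT-rich sequence: Tm is below target and rises with length, so the longest probe is kept.
print(len(isothermal_probe("A" * 80, 0)))   # 70
```

Under this approximation a 50-mer with 50% GC content has an estimated Tm of about 72 °C, so mid-GC regions would mostly keep probes near the short end of the range.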
Figure 5
Stretch of perfect identity in the middle of 50-mer oligonucleotides in C. elegans. Boxplots of the difference in fluorescence intensity (log2 scale) between the original and perturbed 50-mer oligonucleotides. For the green boxplots, the perturbation consisted of randomizing the left and right sides of the original oligonucleotide while keeping a stretch in the middle intact. The red boxplot corresponds to randomization over the full length of the oligonucleotide. In all cases, the perturbed oligonucleotide has the same GC content as the original oligonucleotide, in an attempt to keep the melting temperature constant.
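One way to generate perturbed probes like these is to redraw the flanking bases at random while holding each flank's G+C count fixed, so that the overall GC content (and, to a first approximation, the melting temperature) is unchanged. This is a sketch under that assumption, not the authors' procedure: the helpers are hypothetical, the probe string is taken to be written 5' to 3' (so the left flank is the 5' side), and nothing here prevents a random base from matching the original by chance.

```python
import math
import random
from typing import Optional


def random_same_gc(seq: str, rng: random.Random) -> str:
    """Random sequence with the same length and the same number of G/C bases as seq."""
    n = len(seq)
    n_gc = sum(base in "GC" for base in seq.upper())
    gc_positions = set(rng.sample(range(n), n_gc))
    return "".join(rng.choice("GC") if i in gc_positions else rng.choice("AT")
                   for i in range(n))


def perturb_keep_stretch(probe: str, stretch_len: int, position: str = "middle",
                         rng: Optional[random.Random] = None) -> str:
    """Keep `stretch_len` bases of perfect identity at the 5' end, 3' end or middle,
    and randomize the rest with GC content preserved flank by flank."""
    if rng is None:
        rng = random.Random()
    n = len(probe)
    starts = {"5prime": 0, "middle": (n - stretch_len) // 2, "3prime": n - stretch_len}
    start = starts[position]  # KeyError for an unknown position
    left = probe[:start]
    kept = probe[start:start + stretch_len]
    right = probe[start + stretch_len:]
    return random_same_gc(left, rng) + kept + random_same_gc(right, rng)


def log2_difference(intensity_original: float, intensity_perturbed: float) -> float:
    """Difference in fluorescence intensity on the log2 scale (original minus perturbed)."""
    return math.log2(intensity_original) - math.log2(intensity_perturbed)
```

Setting `stretch_len=0` gives a full-length randomization in this sketch, analogous to the red boxplot, and the `position` argument covers the 5', 3' and middle placements compared in the next figure.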
Figure 6
Effect of the position of a stretch of perfect identity within 50-mer oligonucleotides. LOESS regression of the difference in fluorescence intensity (log2 scale) between the original and perturbed 50-mer oligonucleotides as a function of the length of the stretch of perfect identity. Solid (dashed) lines correspond to C. elegans (human) data. The stretch of perfect identity is on the left (5') side (green lines), on the right (3') side (blue lines) or in the middle (red lines) of the 50-mer oligonucleotide. With NimbleGen's manufacturing process, the oligonucleotides are synthesized from 3' to 5'; therefore, the left (5') side protrudes and floats freely in the solution, while the right (3') side is closer to the slide.
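The LOESS curves here are local regressions of the log2 intensity difference on the length of the stretch of perfect identity. A minimal local-linear version with tricube weights, in the spirit of LOWESS but without robustness iterations, is sketched below; the span `frac = 0.5` is an assumption, since the figure legend does not state the smoothing parameters used, and `loess` is a hypothetical helper rather than a specific library's API.

```python
import math
from typing import List, Sequence


def loess(x: Sequence[float], y: Sequence[float], frac: float = 0.5) -> List[float]:
    """Local linear regression with tricube weights, evaluated at each x (no robustness step)."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours defining the local bandwidth
    fitted = []
    for x0 in x:
        dists = [abs(xi - x0) for xi in x]
        h = sorted(dists)[k - 1]  # distance to the k-th nearest point
        if h > 0:
            w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        else:
            # Many tied x values (e.g. replicate probes): use only points exactly at x0.
            w = [1.0 if d == 0 else 0.0 for d in dists]
        sw = sum(w)  # > 0, because x0 itself always gets weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a local weighted mean
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

To mirror the three curves in the figure, one would fit the data separately for stretches at the 5' end, at the 3' end and in the middle of the probe.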
Figure 7
Stretch of perfect identity in the middle of oligonucleotides of various lengths. LOESS regression of the difference in fluorescence intensity (in log2 scale) between the original and perturbed oligonucleotides as a function of the length of the stretch of perfect identity. Solid (dashed) lines correspond to C. elegans (human) data. Results are shown for oligonucleotides of length 50 (green lines), 60 (red lines) and 70 (blue lines).
Figure 8
Introduction of random mismatches in 50-mer oligonucleotides in C. elegans. Boxplots of the difference in fluorescence intensity (log2 scale) between the original and perturbed 50-mer oligonucleotides. For the green boxplots, the perturbation consisted of introducing mismatches at random locations. The red boxplot corresponds to randomization over the full length of the oligonucleotide. In all cases, the perturbed oligonucleotide has the same GC content as the original oligonucleotide, in an attempt to keep the melting temperature constant.
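A simple way to introduce mismatches without changing GC content is to swap bases within the same class (G with C, A with T) at randomly chosen positions: every swap creates a mismatch against the target, and the G+C count stays the same. This is one possible scheme assumed for illustration, not necessarily the one used in the study; `introduce_mismatches` is a hypothetical helper.

```python
import random

# Swapping within the strong (G/C) or weak (A/T) class keeps the GC count constant.
SAME_CLASS_SWAP = {"G": "C", "C": "G", "A": "T", "T": "A"}


def introduce_mismatches(probe: str, n_mismatches: int, rng: random.Random) -> str:
    """Return a copy of probe with n_mismatches distinct positions altered, GC content preserved."""
    bases = list(probe.upper())
    for i in rng.sample(range(len(bases)), n_mismatches):
        bases[i] = SAME_CLASS_SWAP[bases[i]]
    return "".join(bases)


rng = random.Random(42)
original = "ACGT" * 12 + "AC"  # a 50-mer with 50% GC
mutated = introduce_mismatches(original, 5, rng)
print(sum(a != b for a, b in zip(original, mutated)))  # 5
```

Because positions are drawn without replacement, the number of differing bases always equals the requested number of mismatches.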
Figure 9
Introduction of random mismatches in oligonucleotides of various lengths. LOESS regression of the difference in fluorescence intensity (in log2 scale) between the original and perturbed oligonucleotides as a function of the number of mismatches introduced in the oligonucleotide. Solid (dashed) lines correspond to C. elegans (human) data. Results are shown for oligonucleotides of length 50 (green lines), 60 (red lines) and 70 (blue lines).
