BMC Genomics. 2010 May 20;11:315.
doi: 10.1186/1471-2164-11-315.

A simple optimization can improve the performance of single feature polymorphism detection by Affymetrix expression arrays

Youko Horiuchi et al. BMC Genomics. 2010.

Abstract

Background: High-density oligonucleotide arrays are effective tools for genotyping numerous loci simultaneously. In small-genome species (genome size below approximately 300 Mb), whole-genome DNA hybridization to expression arrays has been used for various applications. In large-genome species, transcript hybridization to expression arrays has been used for genotyping. Although rice is a fully sequenced model plant of medium genome size (approximately 400 Mb), there are few examples of rice oligonucleotide arrays being used as a genotyping tool.

Results: We compared the single feature polymorphism (SFP) detection performance of whole-genome and transcript hybridizations on the Affymetrix GeneChip Rice Genome Array, using two rice cultivars with full genome sequences: the japonica cultivar Nipponbare and the indica cultivar 93-11. Both genomes were surveyed for all probe target sequences. Only single-copy 25-mer probes that completely matched the Nipponbare genome were extracted, and SFPs between them and the 93-11 sequences were predicted. We investigated optimum conditions for SFP detection in both whole-genome and transcript hybridization using differences between perfect match (PM) and mismatch (MM) probe intensities of non-polymorphic targets, assuming that these differences are representative of those between mismatched and perfectly matched targets. Several statistical methods of SFP detection by whole-genome hybridization were compared under the optimized conditions. Causes of false positives and false negatives in SFP detection in both types of hybridization were investigated.
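The probe filtering and SFP prediction step described above can be illustrated with a minimal sketch. It assumes exact string matching on both strands and toy sequences; the function names (`count_occurrences`, `classify_probe`) are hypothetical and are not taken from the paper, whose actual pipeline is not described in the abstract.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def count_occurrences(genome, probe):
    """Count exact, possibly overlapping, matches of probe on both strands."""
    def count(haystack, needle):
        n = 0
        i = haystack.find(needle)
        while i != -1:
            n += 1
            i = haystack.find(needle, i + 1)
        return n

    rc = reverse_complement(probe)
    total = count(genome, probe)
    if rc != probe:  # avoid double counting palindromic probes
        total += count(genome, rc)
    return total


def classify_probe(probe, reference, other):
    """Hypothetical classification of one probe.

    'excluded'        : probe is not single copy in the reference genome
    'non-polymorphic' : probe also matches the other genome exactly
    'predicted SFP'   : probe has no exact match in the other genome
    """
    if count_occurrences(reference, probe) != 1:
        return "excluded"
    if count_occurrences(other, probe) >= 1:
        return "non-polymorphic"
    return "predicted SFP"


# Toy data: an 8-mer stands in for a 25-mer probe.
probe = "GATTACCA"
reference = "TTTTGATTACCACCCC"
variant = "TTTTGATTGCCACCCC"  # one substitution inside the probe target
print(classify_probe(probe, reference, variant))
```

A real analysis would use an indexed aligner rather than `str.find`, and would need to decide how to treat insertions, deletions, and multiple hits in the second genome.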

Conclusions: The optimizations allowed a more than 20% increase in true SFP detection in whole-genome hybridization and a large improvement of SFP detection performance in transcript hybridization. Significance Analysis of Microarrays (SAM) applied to log-transformed raw intensities of PM probes gave the best performance in whole-genome hybridization, detecting 22,936 true SFPs with 23.58% false positives. For transcript hybridization, stable SFP detection was achieved for highly expressed genes, and about 3,500 SFPs were detected at a high sensitivity (> 50%) in both shoot and young panicle transcripts. The high SFP detection performance of both whole-genome and transcript hybridizations indicates that microarrays of a complex genome (e.g., that of Oryza sativa) can be effectively utilized for whole-genome genotyping, both for mutant mapping and for analysis of quantitative traits such as gene expression levels.
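To make the idea of testing log-transformed PM intensities concrete, the following is a minimal sketch of a SAM-style relative difference for a single probe, d = (mean_a - mean_b) / (s + s0), where s is the pooled standard error. The function name and the fixed `s0` argument are assumptions for illustration. SAM as published also estimates s0 from the data and uses permutations to choose the delta threshold; neither step is shown here, and this is not the authors' implementation.

```python
import math


def sam_like_statistic(pm_a, pm_b, s0=0.0):
    """SAM-style relative difference on log10-transformed PM intensities.

    pm_a, pm_b : raw PM intensities of one probe in two genotypes
                 (at least two replicates each, all values > 0).
    s0         : small positive constant added to the standard error;
                 passed in directly here (an assumption of this sketch).
    """
    log_a = [math.log10(x) for x in pm_a]
    log_b = [math.log10(x) for x in pm_b]
    n_a, n_b = len(log_a), len(log_b)
    mean_a = sum(log_a) / n_a
    mean_b = sum(log_b) / n_b
    # Pooled sum of squared deviations across both groups.
    ss = sum((x - mean_a) ** 2 for x in log_a) + sum((x - mean_b) ** 2 for x in log_b)
    # Pooled standard error of the difference in means.
    s = math.sqrt((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2))
    return (mean_a - mean_b) / (s + s0)


# Hypothetical replicate intensities for one probe in two cultivars.
d = sam_like_statistic([900.0, 1100.0, 1000.0], [250.0, 300.0, 280.0], s0=0.05)
print(round(d, 2))
```

With `s0 = 0` the value reduces to the ordinary two-sample t-statistic; a positive `s0` damps the statistic for probes whose variance happens to be very small.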

Figures

Figure 1
Effects of applied DNA amounts on signal intensity of PM and MM. Probe numbers in 0.1 log10-intensity windows were plotted with a step width of 0.05 log10-intensity. A comparison is shown between the probe number distributions obtained using different quantities of gDNA products.
Figure 2
Improvement of SFP detection performance at the optimized condition. SFP detection performances were compared by ROC curves. Sensitivity (the ratio of the number of correctly called SFPs to the expected number of SFPs) was plotted against the false-positive rate (the ratio of the number of falsely called SFPs to the total called number of SFPs) by changing the thresholds of analysis.
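The two axes defined in this caption can be computed as in the sketch below. It follows the caption's own definitions, so "FPR" here is the fraction of called SFPs that are false, not the fraction of non-polymorphic probes that are called. The function name and the probe identifiers are hypothetical.

```python
def sfp_performance(called, expected):
    """Return (sensitivity, fpr) using the definitions in the Figure 2 caption.

    sensitivity = correctly called SFPs / expected (predicted) SFPs
    fpr         = falsely called SFPs / total called SFPs
    """
    called, expected = set(called), set(expected)
    true_calls = called & expected
    false_calls = called - expected
    sensitivity = len(true_calls) / len(expected) if expected else 0.0
    fpr = len(false_calls) / len(called) if called else 0.0
    return sensitivity, fpr


# Hypothetical probe IDs: four probes called, six SFPs expected.
called = {"p1", "p2", "p3", "p4"}
expected = {"p1", "p2", "p3", "p5", "p6", "p7"}
print(sfp_performance(called, expected))  # (0.5, 0.25)
```

Repeating the calculation over a range of thresholds yields one (FPR, sensitivity) point per threshold, which is how a curve of this kind is traced.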
Figure 3
Comparison of SFP detection performances using different statistical tests and background corrections. (a) Performances of the three statistical tests for called SFPs: two based on the classical Student's t-test (SAM, black; ANOVA, red) and a newly developed method for SFP detection (SNEP, blue). (b) The effects of six different signal corrections on SAM analysis: MAS5 (red), RMA (green), GCRMA (affinity model, magenta; full model, yellow), global scaling (dark blue), and quantile normalization (light blue). The definitions of "Sensitivity" and "FPR" are the same as in Figure 2.
Figure 4
Distribution of the number of probes in the Nipponbare genome. Distribution of the number of unique probes (a), predicted SFPs (b), correctly and falsely called SFPs (c), and sensitivity of SFP detection (d) in a 1-Mb window with a step width of 0.1 Mb. SFPs were called by SAM at a threshold of delta = 0.378. TRUE and FALSE are the same as in Table 2. Red lines indicate averages through the genome.
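The windowing used for this figure (1-Mb windows advanced in 0.1-Mb steps) can be sketched as follows. The function name, the half-open window convention, and the choice to drop a trailing partial window are assumptions; the caption does not specify them.

```python
from bisect import bisect_left


def sliding_window_counts(positions, chrom_length, window=1_000_000, step=100_000):
    """Count features in overlapping windows along one chromosome.

    positions    : feature coordinates in bp (e.g. probe or SFP positions)
    chrom_length : chromosome length in bp
    Returns a list of (window_start, count) for half-open windows
    [start, start + window); a trailing partial window is not reported.
    """
    pos = sorted(positions)
    results = []
    last_start = max(chrom_length - window, 0)
    for start in range(0, last_start + 1, step):
        lo = bisect_left(pos, start)
        hi = bisect_left(pos, start + window)
        results.append((start, hi - lo))
    return results


# Toy example with small numbers so the windows are easy to check by hand.
print(sliding_window_counts([5, 15, 25, 95], chrom_length=100, window=50, step=25))
# [(0, 3), (25, 1), (50, 1)]
```

Dividing the per-window count of correctly called SFPs by the per-window count of predicted SFPs would give a windowed sensitivity of the kind shown in panel (d).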
Figure 5
Effect of signal intensity on the intensity difference between PM and MM probes for completely matched transcripts. Four replicates of shoot or young panicle data were analyzed by ANOVA, and the frequency of probe pairs with significantly different intensities (p < 0.05) between PM and MM was plotted against averaged PM signal intensities within 0.1. The dashed line indicates the cut-off signal value (2.5) for mRNA analysis.
Figure 6
Comparison of SFP detection performance of transcript and whole-genome hybridizations. SFP detection performances of shoot (blue) and young panicle (red) transcript hybridizations by SNEP, and of whole-genome hybridization (black) by SAM are represented by ROC curves. Definition of "Sensitivity" and "FPR" is the same as in Figure 2.
Figure 7
Effects of probe binding affinity on SFP detection by whole-genome hybridization. Frequencies of unique probes (black), correctly called SFP probes (red), and false-negative SFP probes (blue) in 0.5 kcal/mol windows are plotted against their binding affinities.
Figure 8
Distribution of false-positive SFPs in a probe set by whole-genome hybridization. Falsely called SFPs were added up at every number in a set. Red and black lines show expected and observed values, respectively. Expected values were estimated by a binomial distribution of the false-positive rate.
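A binomial expectation of this kind can be sketched as below. The sketch assumes that every probe set contains the same number of probes and that each probe is falsely called independently with the same probability `p`; the function names and the example values are hypothetical and are not the paper's numbers.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k false calls among n independent probes."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_set_counts(n_sets, probes_per_set, p):
    """Expected number of probe sets containing k = 0..probes_per_set false calls."""
    return [n_sets * binomial_pmf(k, probes_per_set, p) for k in range(probes_per_set + 1)]


# Hypothetical values: 1,000 probe sets of 11 probes, 2% false-call probability per probe.
expected = expected_set_counts(1000, 11, 0.02)
for k, value in enumerate(expected[:4]):
    print(k, round(value, 1))
```

If the observed counts exceed these expectations at high k, false calls are clustering within particular probe sets more than independent errors would predict.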
