Comparative Study
J Mol Diagn. 2005 Aug;7(3):337-45.
doi: 10.1016/s1525-1578(10)60562-4.

Biological validation of differentially expressed genes in chronic lymphocytic leukemia identified by applying multiple statistical methods to oligonucleotide microarrays

Lynne V Abruzzo et al. J Mol Diagn. 2005 Aug.

Abstract

Oligonucleotide microarrays are a powerful tool for profiling the expression levels of thousands of genes. Different statistical methods for identifying differentially expressed genes can yield different results. To our knowledge, no experimental test has been performed to decide which method best identifies genes that are truly differentially expressed. We applied three statistical methods (dChip, t-test on log-transformed data, and Wilcoxon test) to identify differentially expressed genes in previously untreated patients with chronic lymphocytic leukemia (CLL). We used a training set of Affymetrix Hu133A microarray data from 11 patients with unmutated immunoglobulin (Ig) heavy chain variable region (VH) genes and 8 patients with mutated Ig VH genes. Differential expression was validated using semiquantitative real-time polymerase chain reaction assays and by validating models to predict the somatic mutation status of an independent test set of nine CLL samples. The methods identified 144 genes that were differentially expressed between cases of CLL with unmutated compared with mutated Ig VH genes. Eighty genes were identified by Wilcoxon test, 60 by t-test, and 65 by dChip, but only 11 were identified by all three methods. Greater agreement was found between the t-test and the Wilcoxon test. Differential expression was validated by semiquantitative real-time polymerase chain reaction assays for 83% of individual genes, regardless of the statistical method. However, the Wilcoxon test gave the most accurate predictions on new samples, and dChip, the least accurate. We found that all three methods were equally good for finding differentially expressed genes, but they found different genes. The genes selected by the nonparametric Wilcoxon test are the most robust for predicting the status of new cases. A comprehensive list of all differentially expressed genes can only be obtained by combining the results of multiple statistical tests.
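
The comparison of per-gene tests described in the abstract can be sketched as follows. This is a minimal illustration on simulated data, not the authors' pipeline: the group sizes (11 unmutated, 8 mutated) come from the abstract, but the simulated intensities, the outlier rate, the effect size, the choice of a Welch (unequal-variance) t statistic, the normal approximation for the rank-sum statistic, and the helper names `welch_t`, `rank_sum_z`, `top_k`, and `compare_methods` are all assumptions made for the example. dChip is not reproduced here.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic for two samples (unequal variances allowed)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_sum_z(a, b):
    """Wilcoxon rank-sum statistic of sample a as a normal-approximation z score.

    Tied values receive average ranks; the variance is not tie-corrected.
    """
    pooled = sorted((v, i) for i, v in enumerate(a + b))
    ranks = [0.0] * len(pooled)
    pos = 0
    while pos < len(pooled):
        end = pos
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        avg = (pos + end) / 2 + 1          # average 1-based rank of the tie group
        for k in range(pos, end + 1):
            ranks[pooled[k][1]] = avg
        pos = end + 1
    n1, n2 = len(a), len(b)
    w = sum(ranks[:n1])                    # rank sum of sample a
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mu) / sigma


def top_k(scores, k):
    """Indices of the k genes with the largest absolute score."""
    order = sorted(range(len(scores)), key=lambda g: abs(scores[g]), reverse=True)
    return set(order[:k])


def raw_intensity(rng, log2_mean):
    """Simulated raw intensity; about 5% of values are inflated 16-fold (assumed)."""
    value = 2.0 ** rng.gauss(log2_mean, 1.0)
    return value * 16 if rng.random() < 0.05 else value


def compare_methods(n_genes=300, n_shifted=30, k=30, seed=1):
    """Score every simulated gene with both tests and return the two top-k sets."""
    rng = random.Random(seed)
    t_scores, w_scores = [], []
    for g in range(n_genes):
        shift = 1.5 if g < n_shifted else 0.0   # made-up log2 fold change
        # Log-transform the raw values before testing, as in the abstract.
        unmutated = [math.log2(raw_intensity(rng, 8.0 + shift)) for _ in range(11)]
        mutated = [math.log2(raw_intensity(rng, 8.0)) for _ in range(8)]
        t_scores.append(welch_t(unmutated, mutated))
        w_scores.append(rank_sum_z(unmutated, mutated))
    return top_k(t_scores, k), top_k(w_scores, k)


if __name__ == "__main__":
    by_t, by_w = compare_methods()
    print("selected by both:", len(by_t & by_w))
    print("t-test only:", len(by_t - by_w))
    print("Wilcoxon only:", len(by_w - by_t))
```

Because the rank-sum statistic depends only on the ordering of the values, a single extreme array shifts it far less than it shifts the t statistic, which is one plausible reason the two lists differ. As a consistency check on the reported counts, and assuming the 144 genes are the union of the three lists, inclusion-exclusion gives 80 + 60 + 65 − 144 + 11 = 72, so the three pairwise overlaps sum to 72 genes, with the 11 genes found by all three methods counted once in each pair.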


Figures

Figure 1
Histogram of the number of times a probe set was called present in 19 microarray experiments. About one-fourth of the genes were never present, and about one-fourth were present in all samples.
Figure 2
Analysis of the P values arising from 16,733 t-tests as a β-uniform mixture. Top left: Histogram of the observed P values, with overlaid curves representing the division into uniform and β contributions. Top right: Relationship between cutoff for P values and the false discovery rate. Bottom left: Relation between cutoff for P values and the posterior probability of differential expression. Bottom right: Receiver operating characteristics curve associated with selecting different P value cutoffs.
Figure 3
Analysis of the Wilcoxon rank-sum statistics of 16,733 probe sets using an empirical Bayes method. Top: Histogram of the empirically observed distribution of rank-sum statistics, with an overlaid curve representing the theoretical distribution. Bottom: Posterior probability that an observed rank sum represents a differentially expressed gene.
Figure 4
Venn diagram showing the level of agreement between three different statistical methods for selecting differentially expressed genes from the same data set.
Figure 5
Results of two-way clustering of 28 samples using the genes found to be differentially expressed using three different statistical methods. The samples include 8 mutated samples (blue), 11 unmutated samples (orange), and 9 samples whose status was unknown (gray). Each row contains standardized log expression values for one gene.
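
The relationship between the P value cutoff and the false discovery rate described for Figure 2 can be sketched under a β-uniform mixture (BUM) model. The caption does not give the formulas, so the sketch assumes the usual BUM parameterization, in which the P value density is f(p) = λ + (1 − λ)·a·p^(a−1) with 0 < λ < 1 and 0 < a < 1, and the false discovery rate at cutoff τ is bounded using the density at p = 1. The parameter values in the usage lines are invented for illustration and are not estimates from the paper's data; the function names are hypothetical.

```python
def bum_cdf(p, lam, a):
    """Cumulative distribution of P values under the assumed BUM model."""
    return lam * p + (1 - lam) * p ** a


def bum_fdr(tau, lam, a):
    """Estimated upper bound on the false discovery rate when calling p <= tau significant."""
    pi_ub = lam + (1 - lam) * a        # largest possible share of null (uniform) P values
    return pi_ub * tau / bum_cdf(tau, lam, a)


def bum_cutoff(alpha, lam, a):
    """Largest P value cutoff whose estimated FDR does not exceed alpha."""
    pi_ub = lam + (1 - lam) * a
    if alpha >= pi_ub:
        return 1.0                     # every cutoff already satisfies the target
    return ((pi_ub - alpha * lam) / (alpha * (1 - lam))) ** (1 / (a - 1))


if __name__ == "__main__":
    lam, a = 0.8, 0.5                  # illustrative values only
    tau = bum_cutoff(0.10, lam, a)
    print("cutoff for 10% FDR:", tau)
    print("FDR at that cutoff:", bum_fdr(tau, lam, a))
```

Under this model the estimated false discovery rate rises steadily as the cutoff is relaxed, so choosing a target rate fixes a single P value threshold.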
