Comparative Study
BMC Bioinformatics. 2006 Aug 22:7:388.
doi: 10.1186/1471-2105-7-388.

Selecting normalization genes for small diagnostic microarrays

Jochen Jaeger et al. BMC Bioinformatics. 2006.

Abstract

Background: Normalization of gene expression microarrays carrying thousands of genes is based on assumptions that do not hold for diagnostic microarrays carrying only a few genes. Thus, applying standard microarray normalization strategies to diagnostic microarrays causes new normalization problems.

Results: In this paper we point out the differences between normalizing large microarrays and normalizing small diagnostic microarrays. We suggest including additional normalization genes on the small diagnostic microarrays and propose two strategies for selecting them from genome-wide microarray studies. The first is a data-driven univariate selection of normalization genes. The second is multivariate and based on finding a balanced diagnostic signature. Finally, we compare both methods to standard normalization protocols known from large microarrays.

Conclusion: Not including additional genes for normalization on small microarrays leads to a loss of diagnostic information. Using housekeeping genes from the literature for normalization fails for certain datasets. While a data-driven selection of additional normalization genes works well, the best results were obtained using a balanced signature.

Figures

Figure 1
Normalization effect on diagnostic microarrays. The global signal normalization effect resulting from standard normalization protocols applied to diagnostic microarrays. Shown are the changes in expression difference when switching from a large microarray to a diagnostic microarray. The top genes are those with the maximal expression difference for TEL-AML versus BCR-ABL and E2A-PBX1. Note that expression differences on the log scale reflect fold changes.
Figure 2
Characteristics of simulated data. The left plot shows the genewise population differences contrasted with the mean differences in the simulated data. Population differences $\mu_i^A - \mu_i^B$ were set for each gene by randomly drawing from $N(0,1)$. The simulated differences stem from drawing data from a multivariate distribution with these population means. The right plot shows boxplots of all 3000 genes for all 50 samples of the simulated training set (the test set is very similar and not shown).
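The simulation described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the caption only states that population differences are drawn from N(0,1) and that data come from a multivariate distribution with those means. The independent unit-variance genes, the symmetric split of each difference around zero, the 25/25 class split of the 50 samples, and the function names `simulate` and `mean_differences` are all assumptions made here.

```python
import random
import statistics


def simulate(n_genes=3000, n_per_class=25, seed=0):
    """Draw a two-class expression data set (rows = samples, columns = genes)."""
    rng = random.Random(seed)
    # Population difference mu_i^A - mu_i^B for each gene, drawn from N(0, 1).
    pop_diff = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
    # Assumption: genes are independent with unit variance, and each
    # difference is split symmetrically between the two class means.
    class_a = [[rng.gauss(d / 2, 1.0) for d in pop_diff] for _ in range(n_per_class)]
    class_b = [[rng.gauss(-d / 2, 1.0) for d in pop_diff] for _ in range(n_per_class)]
    return pop_diff, class_a, class_b


def mean_differences(class_a, class_b):
    """Observed per-gene difference of class means (the simulated differences)."""
    n_genes = len(class_a[0])
    return [
        statistics.fmean(s[g] for s in class_a) - statistics.fmean(s[g] for s in class_b)
        for g in range(n_genes)
    ]


if __name__ == "__main__":
    pop_diff, class_a, class_b = simulate()
    observed = mean_differences(class_a, class_b)
    print(len(observed), "genes simulated")
```

Plotting `pop_diff` against `observed` would give a scatter comparable in spirit to the left panel, under the assumptions listed above.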
Figure 3
Recovery of original effect using different normalization methods. Effects of different normalization methods for diagnostic microarrays, evaluated on simulated data. "+" depicts expression differences in the test data of the signature genes after normalization with all 3000 genes; this is the effect that normalization methods for diagnostic microarrays should also recover. "o" corresponds to using the standard protocol on the diagnostic microarray; here, all the signal is lost. "r" corresponds to normalizing the diagnostic microarray with 10 random genes, which already recovers the signal partially. The right plot is a close-up of the left plot that additionally shows the performance of the proposed normalization schemes. "+" and "r" are the same as in the left plot. Additionally, normalization using lowest variance "v", smallest difference "d", smallest coefficient of variation "c", and balanced signatures "b" is shown. For better visibility, the symbols "b" and "d" are shifted slightly sideways so that they do not overlap.
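A toy sketch of the univariate criteria named in this caption (lowest variance, smallest difference, smallest coefficient of variation) may help make the comparison concrete. It is an illustration under stated assumptions rather than the published procedure: the "standard protocol" is modeled here as centering each sample on the mean of all genes present on the array, the coefficient of variation uses the absolute pooled mean, and every function name and the four-gene toy data set are hypothetical.

```python
import statistics


def gene_scores(class_a, class_b, criterion):
    """Score each gene; lower scores mark better normalization candidates."""
    scores = []
    for g in range(len(class_a[0])):
        a = [s[g] for s in class_a]
        b = [s[g] for s in class_b]
        pooled = a + b
        if criterion == "variance":
            score = statistics.variance(pooled)
        elif criterion == "difference":
            score = abs(statistics.fmean(a) - statistics.fmean(b))
        elif criterion == "cv":
            # Assumption: CV taken relative to the absolute pooled mean.
            m = abs(statistics.fmean(pooled))
            score = statistics.stdev(pooled) / m if m > 0 else float("inf")
        else:
            raise ValueError(f"unknown criterion: {criterion}")
        scores.append(score)
    return scores


def select_normalization_genes(class_a, class_b, k, criterion="difference"):
    """Return the indices of the k genes with the lowest score."""
    scores = gene_scores(class_a, class_b, criterion)
    return sorted(range(len(scores)), key=scores.__getitem__)[:k]


def normalize(sample, reference_idx):
    """Subtract the mean of the reference genes from every gene (log scale)."""
    offset = statistics.fmean(sample[i] for i in reference_idx)
    return [x - offset for x in sample]


def class_difference(class_a, class_b, g):
    """Difference of class means for gene g."""
    return statistics.fmean(s[g] for s in class_a) - statistics.fmean(s[g] for s in class_b)


# Hypothetical toy data: genes 0 and 1 form the signature, genes 2 and 3 are stable.
toy_a = [[3.0, 3.1, 1.0, 5.0], [2.9, 3.0, 1.1, 5.1]]
toy_b = [[1.0, 1.1, 1.0, 5.1], [1.1, 1.0, 1.1, 5.0]]
signature = [0, 1]

selected = select_normalization_genes(toy_a, toy_b, k=2, criterion="difference")

# Assumed standard protocol: center on the signature genes alone.
std_a = [normalize(s, signature) for s in toy_a]
std_b = [normalize(s, signature) for s in toy_b]
# Alternative: center on the selected normalization genes.
ref_a = [normalize(s, selected) for s in toy_a]
ref_b = [normalize(s, selected) for s in toy_b]

print("selected normalization genes:", selected)
print("gene 0 difference, standard protocol:", class_difference(std_a, std_b, 0))
print("gene 0 difference, selected genes:", class_difference(ref_a, ref_b, 0))
```

In this toy case, centering on the signature genes alone nearly cancels the class difference of gene 0, while centering on the selected stable genes preserves it, which mirrors the contrast between "o" and the proposed schemes in the caption.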
Figure 4
Loss of effect for different normalization methods. Sum of squared errors, relative to the real underlying expression differences, for the proposed normalization methods and the standard protocol, averaged over 30 runs of the simulated data. "Small CV" denotes the normalization method using the smallest coefficient of variation, and "small effect" denotes the normalization method using small differences of average expression.
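The error measure in this caption can be written down in a few lines. The sketch below assumes that each run yields one vector of recovered expression differences and one vector of reference differences for the same signature genes; the function names and the input layout are illustrative choices, not taken from the paper.

```python
def sum_squared_error(recovered, reference):
    """Sum of squared deviations between recovered and reference differences."""
    if len(recovered) != len(reference):
        raise ValueError("vectors must have the same length")
    return sum((r - t) ** 2 for r, t in zip(recovered, reference))


def mean_sse_over_runs(runs):
    """Average the sum of squared errors over (recovered, reference) pairs."""
    errors = [sum_squared_error(recovered, reference) for recovered, reference in runs]
    if not errors:
        raise ValueError("at least one run is required")
    return sum(errors) / len(errors)
```

With 30 simulated runs per normalization method, `mean_sse_over_runs` would produce one bar height per method under this reading of the caption.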
Figure 5
Classification accuracy using different normalization methods. Cross-validation results for the predictive performance of the same diagnostic signature used with different normalization strategies for diagnostic microarrays. The left plot shows classification accuracies for distinguishing TEL-AML1 from other groups in leukemia (ps = pn = 5). The right plot shows classification accuracies for distinguishing normal from adenocarcinomas in lung (ps = pn = 3). The boxplots are sorted by increasing median accuracy; where medians are equal, the mean was used for sorting.
