BMC Bioinformatics. 2002;3:4.
doi: 10.1186/1471-2105-3-4. Epub 2002 Jan 31.

Sources of variability and effect of experimental approach on expression profiling data interpretation

Marina Bakay et al. BMC Bioinformatics. 2002.

Abstract

Background: We present a systematic study of the sources of variability in expression profiling data, using 56 RNAs isolated from human muscle biopsies (34 Affymetrix MuscleChip arrays) and 36 murine cell culture and tissue RNAs (42 Affymetrix U74Av2 arrays).

Results: We studied muscle biopsies from 28 human subjects as well as murine myogenic cell cultures, muscle, and spleens. Human MuscleChip arrays (4,601 probe sets) and murine U74Av2 Affymetrix microarrays were used for expression profiling. RNAs were profiled both singly and as mixed groups. Variables studied included tissue heterogeneity, cRNA probe production, patient diagnosis, and GeneChip hybridizations. We found that the greatest source of variability was often between different regions of the same patient's muscle biopsy, reflecting variation in cell-type content even in a relatively homogeneous tissue such as muscle. Inter-patient variation was also very high (SNP noise). Experimental variation (RNA, cDNA, cRNA, or GeneChip) was minor. Pre-profile mixing of patient cRNA samples effectively normalized both intra- and inter-patient sources of variation while retaining a high degree of specificity of the individual profiles (86% of statistically significant differences detected by absolute analysis, and 85% by a four-pairwise-comparison survival method).

Conclusions: Using unsupervised cluster analysis and correlation coefficients of 92 RNA samples on 76 oligonucleotide microarrays, we found that experimental error was not a significant source of unwanted variability in expression profiling experiments. The major sources of variability arose from the use of small tissue biopsies, particularly in humans, where there is substantial inter-patient variability (SNP noise).

Figures

Figure 1
Unsupervised hierarchical clustering of 42 murine U74Av2 Affymetrix arrays. Unsupervised hierarchical clustering of 42 murine U74Av2 Affymetrix arrays shows that probe synthesis and hybridization are not major sources of experimental variability. Expression profiles shown are from three experimental groups: one using cultured murine myogenic cells (VSM samples), one using mouse spleens (KNagaraju samples), and one using skeletal muscle from normal and mdx mouse strains (FBooth samples). For the KNagaraju samples, the same spleen RNA was split prior to cDNA synthesis to create duplicate cDNA-cRNA-profile results; these duplicates show a very high correlation coefficient and a close relationship by unsupervised clustering (low branches on the dendrogram). The VSM cultured samples were each derived from different culture plates, and the FBooth samples from different murine muscles. The duplicate murine muscle samples are more closely related (high correlation coefficient) than the parallel cultures (VSM). Additional variables, such as male versus female (KNagaraju samples) and time after TSA treatment (VSM samples), are indicated but are not relevant to this manuscript; they will be discussed in more detail elsewhere.
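The correlation-based clustering used in these figures can be illustrated with a small sketch. The captions do not state the distance metric or linkage rule, so this example assumes a distance of 1 - Pearson r between arrays and single-linkage agglomeration (single linkage is the usual meaning of "nearest neighbor" clustering). The function names and toy data are hypothetical, and this is not the software used in the study.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def nearest_neighbor_clustering(profiles):
    """Agglomerative single-linkage ("nearest neighbor") clustering.

    profiles: dict mapping array name -> list of expression values
    (same probe-set order in every list).
    Returns the merge history as (members_a, members_b, height) tuples,
    where height = 1 - r is the dendrogram branch-point height.
    """
    names = list(profiles)
    # Distance between two arrays: 1 - Pearson r (assumed metric).
    dist = {frozenset(pair): 1.0 - pearson(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}
    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Single linkage: cluster distance = distance of the closest members.
            d = min(dist[frozenset((i, j))] for i in a for j in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((sorted(a), sorted(b), d))
    return merges


# Hypothetical toy data: a split-RNA duplicate pair and two parallel cultures.
profiles = {
    "spleen_dup1": [10, 200, 35, 800, 5],
    "spleen_dup2": [12, 190, 40, 780, 6],
    "culture_1": [300, 20, 500, 40, 90],
    "culture_2": [250, 60, 420, 100, 30],
}
for left, right, height in nearest_neighbor_clustering(profiles):
    print(left, right, round(height, 4))
```

In this toy run the split-RNA duplicates join first (lowest branch), mirroring the pattern the caption describes for technical replicates.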
Figure 2
Different regions of the same tissue specimen can give highly similar or highly discordant expression profiles. Shown are scatter plots comparing the expression profiles of two different regions of the same muscle biopsy. Panel A (patient 6) shows very high concordance between the two biopsy regions, whereas Panel B (patient 4) shows very poor correlation between the two biopsy fragments. Only "present" calls are shown (~2,000 of the 4,600 probe sets studied). The solid lines indicate two-, three-, ten- and thirty-fold difference thresholds.
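The fold-change thresholds drawn on these scatter plots can be expressed as a simple count of probe sets beyond each line. This is a minimal sketch, assuming probe sets are kept only when called "present" in both regions and that low signals are floored at a hypothetical value of 1.0 so every ratio is defined; neither choice is stated in the caption, and the function name is illustrative.

```python
def fold_change_counts(region_a, region_b, present_a, present_b,
                       thresholds=(2, 3, 10, 30), floor=1.0):
    """Count probe sets whose fold change between two biopsy regions exceeds each threshold.

    Only probe sets called "present" in both regions are kept (assumption).
    Signals below `floor` are raised to it so the ratio is always defined
    (hypothetical choice, not specified in the source).
    """
    counts = {t: 0 for t in thresholds}
    n_present = 0
    for a, b, pa, pb in zip(region_a, region_b, present_a, present_b):
        if not (pa and pb):
            continue
        n_present += 1
        a, b = max(a, floor), max(b, floor)
        fold = max(a, b) / min(a, b)  # symmetric fold change, always >= 1
        for t in thresholds:
            if fold > t:
                counts[t] += 1
    return n_present, counts


# Hypothetical signals for five probe sets in two fragments of one biopsy.
region_a = [100, 100, 100, 100, 50]
region_b = [150, 250, 1200, 4000, 60]
present_a = [True, True, True, True, True]
present_b = [True, True, True, True, False]
print(fold_change_counts(region_a, region_b, present_a, present_b))
# → (4, {2: 3, 3: 2, 10: 2, 30: 1})
```

A concordant pair of fragments (as in Panel A) would leave nearly all counts at zero; a discordant pair (as in Panel B) would populate the higher thresholds.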
Figure 3
Unsupervised hierarchical clustering of 24 human MuscleChip Affymetrix arrays. A dendrogram from nearest-neighbor analysis of 24 MuscleChip expression profiles shows that intra-patient tissue heterogeneity can be a greater source of experimental variability than inter-patient or age-dependent variation. The height of each branch point reflects the degree of relatedness of the different profiles. The two profiles for each patient (or for the mixed controls) are from different regions of the same muscle biopsy.
Figure 4
Unsupervised hierarchical clustering of mixed and individual profiles. Shown is a dendrogram from nearest-neighbor analysis of 34 MuscleChip expression profiles, including both individual and mixed samples. Mixed samples cluster as very closely related, even though different regions of the component biopsies were used to generate the duplicates. Importantly, the mixed DMD profiles cluster more closely with mixed normal controls than with individual DMD patient profiles. These data suggest that intra-patient (tissue heterogeneity) and inter-patient (SNP noise) variation can be significant sources of experimental variability.
Figure 5
Comparison of individual profiles to mixed profiles by t-test statistics. Shown in green are differentially expressed genes for 2 mixed DMD 5–6 y profiles versus 6 mixed controls. Shown in red are differentially expressed genes for 10 individual DMD 5–6 y profiles versus 6 mixed controls. The p-value thresholds used to generate the gene lists are indicated. The p-value for mixed profiles is held at p < 0.05, as the low sample number (2 versus 2) precludes obtaining more significant values. This analysis suggests that t-test statistics on a small number of mixed samples are relatively sensitive, but not highly specific.
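A per-gene two-sample t-test of the kind summarized in this figure might look like the sketch below. The caption does not say which t-test variant was used, so this assumes Student's test with pooled variance and a two-sided p-value. To stay dependency-free, the p-value comes from the regularized incomplete beta function (evaluated with the standard continued-fraction method), using the identity p = I_x(df/2, 1/2) with x = df/(df + t²). All function names and data are hypothetical.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def student_t_test(x, y):
    """Two-sample Student's t-test (pooled variance); returns (t, two-sided p)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    df = nx + ny - 2
    se = math.sqrt(ss / df * (1.0 / nx + 1.0 / ny))
    if se == 0.0:
        # Zero variance in both groups: identical means give p = 1, otherwise p = 0.
        return (0.0, 1.0) if mx == my else (math.copysign(math.inf, mx - my), 0.0)
    t = (mx - my) / se
    p = betainc_reg(df / 2.0, 0.5, df / (df + t * t))
    return t, p


def differential_genes(group_a, group_b, alpha=0.05):
    """Genes whose t-test p-value falls below alpha (dicts: gene -> list of signals)."""
    return sorted(g for g in group_a
                  if student_t_test(group_a[g], group_b[g])[1] < alpha)


# Hypothetical signals: individual DMD profiles versus mixed controls.
dmd = {"geneA": [820, 760, 900, 845, 790], "geneB": [110, 95, 130, 105, 120]}
ctrl = {"geneA": [300, 340, 310], "geneB": [115, 100, 125]}
print(differential_genes(dmd, ctrl))
```

With only two mixed profiles per group, df = 2, which is why the caption notes that very small p-values cannot be reached for mixed samples.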
Figure 6
Use of the >2-fold survival method provides relatively specific but insensitive detection of significant gene changes. Shown in green is a representation of the number of genes surviving four pair-wise comparisons to two mixed control profiles, retaining only those genes showing fold changes > 2-fold in all four pair-wise comparisons. Shown in red are differentially expressed genes for 10 individual DMD 5–6 y profiles versus 6 mixed controls at the indicated p-value thresholds. This fold-change survival method shows good specificity at p < 0.05 for individual profiles (85%); however, it is relatively insensitive.
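The four-pairwise-comparison survival filter can be sketched as follows: with two mixed DMD profiles and two mixed control profiles there are 2 × 2 = 4 case-versus-control ratios per gene, and a gene survives only if every ratio exceeds 2-fold. The requirement that all four changes point in the same direction, the signal floor of 1.0, and the function name are assumptions made for this illustration.

```python
from itertools import product


def survival_genes(case_profiles, control_profiles, min_fold=2.0, floor=1.0):
    """Genes whose fold change exceeds min_fold in every case-vs-control pairing.

    case_profiles, control_profiles: lists of dicts mapping gene -> signal
    (e.g. two mixed DMD profiles and two mixed control profiles).
    Returns a dict gene -> "up" or "down" (same direction in all pairings, assumed).
    """
    all_profiles = case_profiles + control_profiles
    genes = set.intersection(*(set(p) for p in all_profiles))
    survivors = {}
    for g in sorted(genes):
        directions = set()
        ok = True
        for case, ctrl in product(case_profiles, control_profiles):
            ratio = max(case[g], floor) / max(ctrl[g], floor)
            if ratio > min_fold:
                directions.add("up")
            elif ratio < 1.0 / min_fold:
                directions.add("down")
            else:
                ok = False  # this pairing fails the fold-change cut
                break
        if ok and len(directions) == 1:
            survivors[g] = directions.pop()
    return survivors


# Hypothetical mixed profiles: two DMD pools and two control pools.
dmd_mixed = [{"A": 900, "B": 100, "C": 50}, {"A": 1000, "B": 120, "C": 400}]
control_mixed = [{"A": 300, "B": 500, "C": 100}, {"A": 350, "B": 450, "C": 90}]
print(survival_genes(dmd_mixed, control_mixed))  # → {'A': 'up', 'B': 'down'}
```

Requiring all four pairings to pass is what makes such a filter stringent: a single borderline ratio (gene C above) removes the gene.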
Figure 7
Direct comparison of t-test and four-pairwise survival methods. Shown in green is the list of 417 differentially expressed genes surviving four pair-wise comparisons to mixed controls. Shown in red are differentially expressed genes for 2 mixed DMD 5–6 y profiles versus 6 mixed controls showing p < 0.05 by t-test. Panel A shows the survival method to be considerably more stringent than the t-test: most gene expression changes detected by the mixed-sample survival method are included among the changes found by t-test analysis (of both mixed and individual profiles). Panel B compiles the previous figures, showing that the >2-fold survival method using only four mixed profiles (two DMD, two control) is highly specific but likely insensitive compared to t-test methods.
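The specificity percentages quoted in the abstract and in Figure 6 (86% and 85%) describe how well gene lists from mixed profiles agree with those from individual profiles. One plausible way to compute such an overlap is sketched below. The definitions used here, "specificity" as the share of mixed-profile hits confirmed by the individual-profile analysis and "recovery" as a rough proxy for sensitivity, are assumptions rather than the paper's stated formulas, and the gene lists are hypothetical.

```python
def overlap_percentages(mixed_hits, individual_hits):
    """Compare gene lists from mixed-profile and individual-profile analyses.

    Returns (specificity, recovery) as percentages, using assumed definitions:
      specificity = share of mixed-profile hits also found in individual profiles
      recovery    = share of individual-profile hits also found with mixed profiles
    """
    mixed, individual = set(mixed_hits), set(individual_hits)
    shared = mixed & individual
    specificity = 100.0 * len(shared) / len(mixed) if mixed else 0.0
    recovery = 100.0 * len(shared) / len(individual) if individual else 0.0
    return specificity, recovery


# Hypothetical gene lists.
mixed_list = {"A", "B", "C", "D"}
individual_list = {"A", "B", "C", "E", "F", "G"}
print(overlap_percentages(mixed_list, individual_list))  # → (75.0, 50.0)
```

Under these definitions, a stringent method such as the survival filter tends to score high on specificity but low on recovery, which is the trade-off Panel B summarizes.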
