Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2010 Mar;8(1):57-71.
doi: 10.1016/S1672-0229(10)60006-X.

Quality assessment of transcriptome data using intrinsic statistical properties

Affiliations

Quality assessment of transcriptome data using intrinsic statistical properties

Guillaume Brysbaert et al. Genomics Proteomics Bioinformatics. 2010 Mar.

Abstract

In view of potential application to biomedical diagnosis, tight transcriptome data quality control is compulsory. Usually, quality control is achieved using labeling and hybridization controls added at different stages throughout the processing of the biologic RNA samples. These control measures, however, only reflect the performance of the individual technical manipulations during the entire process and have no bearing as to the continued integrity of the RNA sample itself. Here we demonstrate that intrinsic statistical properties of the resulting transcriptome data signal and signal-variance distributions and their invariance can be identified independently of the animal species studied and the labeling protocol used. From these invariant properties we have developed a data model, the parameters of which can be estimated from individual experiments and used to compute relative quality measures based on similarity with large reference datasets. These quality measures add supplementary, non-redundant information to standard quality control estimates based on spike-in and hybridization controls, and are exploitable in data analysis. A software application for analyzing datasets as well as a reference dataset for AB1700 arrays are provided. They should allow AB1700 users to easily integrate this method into their analysis pipeline, and might instigate similar developments for other transcriptome platforms.

PubMed Disclaimer

Figures

Figure 1
Figure 1
A. Heatmaps of signal-normalized logarithmic coefficient of variance (V) against logarithmic signal (S) for one actual dataset example from each of the three species for which AB1700 arrays are available. The color gradient from blue to black to red indicates the density of the data points. The distribution is characteristic of the AB1700 data. B. Similar heatmaps showing the distribution of ln(V) vs. ln(S) for two monkey species hybridized to the human AB1700 arrays. C. A heatmap of the same distribution for random data generated from our AB1700 signal and signal-variance model. Note that the distribution closely resembles those from actual data.
Figure 2
Figure 2
Superposition of the two S.I. value histograms for the 750× and 500× reference files. Note that the first (empty) bin is of different size than the others.
Figure 3
Figure 3
Superposition of the S.I. value histograms for the two data series GSE3155 (black) and GSE6806 (red) as calculated using the 500× reference dataset. Note that the first (empty) bin is of different size than the others. The auto-referenced set of 500 arrays making up the 500× is shown as backdrop (white bars).
Figure 4
Figure 4
A. Ln(V) versus ln(S) heatmaps for two experiments from the GSE6806 series. The S.I. values based on the 500× reference set are indicated. B. A Venn diagram depicting the number of probes considered statistically significantly (P<0.05) upregulated when comparing the Dicer knockdown (Dicer-KO) versus the wild-type (Dicer-WT) case either with or without the GSM157090 experiment.
Figure 5
Figure 5
A. S.I. histogram of the individual experiments of series GSE10503 using the 500× reference file. The inlet shows the same series analyzed using the 300× reference file. The two outlier experiments are displayed with a red border. B. A principal component analysis in correspondence space of the GSE10503 series. The two outlier experiments as identified in (A) are indicated using the same color code as in (A). Every biological condition is displayed using its own coloring. C. A Venn diagram depicting the number of probes considered statistically significantly (P<0.01) regulated when comparing the Hdac3-null versus the Hdac3-control experiments either with or without the two outliers identified in (A).

Similar articles

Cited by

References

    1. Canales R.D. Evaluation of DNA microarray results with quantitative gene expression platforms. Nat. Biotechnol. 2006;24:1115–1122. - PubMed
    1. Stafford P., Brun M. Three methods for optimization of cross-laboratory and cross-platform microarray expression data. Nucleic Acids Res. 2007;35:e72. - PMC - PubMed
    1. Gollub J. The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res. 2003;31:94–96. - PMC - PubMed
    1. MAQC Consortium The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 2006;24:1151–1161. - PMC - PubMed
    1. Patterson T.A. Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nat. Biotechnol. 2006;24:1140–1150. - PubMed

Publication types