Sequence biases in large scale gene expression profiling data
- PMID: 16840527
- PMCID: PMC1524917
- DOI: 10.1093/nar/gkl404
Abstract
We present the results of a simple statistical assay that measures the G+C content sensitivity bias of gene expression experiments without requiring a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (LongSAGE), LongSAGELite, 'Classic' Massively Parallel Signature Sequencing (MPSS) and 'Signature' MPSS. We demonstrate that the methods have systematic and random errors that lead to different G+C content sensitivities. The relationship between this experimental error and the G+C content of the probe set or tag that identifies each gene influences whether the gene is detected and, if detected, the level of gene expression measured. LongSAGE has the least bias, while Signature MPSS shows a strong bias towards G+C-rich tags, and Affymetrix data show different biases depending on the data processing method (MAS 5.0, RMA or GC-RMA). The bias in the Affymetrix data primarily affects genes expressed at lower levels. Despite the larger sampling of the MPSS library, SAGE identifies significantly more genes (60% more RefSeq genes in a single comparison).
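The abstract does not spell out the assay itself, so the following is only an illustrative sketch of the underlying idea: compute the G+C fraction of each tag or probe sequence, then compare how often tags are detected across G+C content bins. The function names, the binning scheme and the toy tags are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def gc_fraction(seq: str) -> float:
    """Return the fraction of G or C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def detection_rate_by_gc(tags, detected, n_bins=5):
    """Return the share of tags detected within each G+C content bin.

    tags:     iterable of candidate tag sequences (hypothetical input)
    detected: set of tag sequences actually observed in the experiment
    n_bins:   number of equal-width G+C bins over [0, 1] (an assumption)
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for tag in tags:
        # Map the G+C fraction in [0, 1] to a bin index in [0, n_bins - 1].
        b = min(int(gc_fraction(tag) * n_bins), n_bins - 1)
        totals[b] += 1
        hits[b] += tag in detected
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Toy example: four made-up tags, two of which were "detected".
rates = detection_rate_by_gc(
    ["AAAA", "GGGG", "GGCC", "ATGC"], {"GGGG", "GGCC"}, n_bins=2
)
```

In this toy setup, a detection rate that rises with the bin index would hint at a bias towards G+C-rich tags, whereas a flat profile would suggest little G+C sensitivity. A real analysis would also need to account for expression level and sampling depth.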
