BMC Genomics. 2010 Jan 20;11:50.
doi: 10.1186/1471-2164-11-50.

Integrating multiple genome annotation databases improves the interpretation of microarray gene expression data

Jun Yin et al. BMC Genomics. 2010.

Abstract

Background: The Affymetrix GeneChip is a widely used gene expression profiling platform. Since the chips were originally designed, the genome databases and gene definitions have been considerably updated. Thus, more accurate interpretation of microarray data requires parallel updating of the specificity of GeneChip probes. We propose a new probe remapping protocol, using the zebrafish GeneChips as an example, by removing nonspecific probes, and grouping the probes into transcript level probe sets using an integrated zebrafish genome annotation. This genome annotation is based on combining transcript information from multiple databases. This new remapping protocol, especially the new genome annotation, is shown here to be an important factor in improving the interpretation of gene expression microarray data.

Results: Transcript data from the RefSeq, GenBank and Ensembl databases were downloaded from the UCSC genome browser, and integrated to generate a combined zebrafish genome annotation. Affymetrix probes were filtered and remapped according to the new annotation. The influence of transcript collection and gene definition methods was tested using two microarray data sets. Compared to remapping using a single database, this new remapping protocol results in up to 20% more probes being retained in the remapping, leading to approximately 1,000 more genes being detected. The differentially expressed gene lists are consequently increased by up to 30%. We are also able to detect up to three times more alternative splicing events. A small number of the bioinformatics predictions were confirmed using real-time PCR validation.
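The filtering and remapping step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `remap_probes`, the input layout, the rule that a probe counts as nonspecific when it matches transcripts from more than one gene, and the `min_probes` threshold are all assumptions made for illustration, and the IDs are toy values.

```python
from collections import defaultdict


def remap_probes(probe_hits, transcript_gene, min_probes=3):
    """Group probes into transcript-level probe sets (illustrative only).

    probe_hits:      {probe_id: set of transcript IDs the probe matches}
    transcript_gene: {transcript_id: gene_id} from the chosen annotation
    min_probes:      assumed minimum probe set size (hypothetical threshold)
    """
    probe_sets = defaultdict(list)
    for probe in sorted(probe_hits):
        transcripts = probe_hits[probe]
        genes = {transcript_gene[t] for t in transcripts}
        # Drop probes that match nothing or match more than one gene.
        if len(genes) != 1:
            continue
        gene = next(iter(genes))
        # Probes that hit the same set of transcripts form one probe set.
        probe_sets[(gene, frozenset(transcripts))].append(probe)
    # Discard probe sets with too few probes to summarize reliably.
    return {k: v for k, v in probe_sets.items() if len(v) >= min_probes}


# Toy example: T1 and T2 belong to gene G1, T3 belongs to gene G2.
transcript_gene = {"T1": "G1", "T2": "G1", "T3": "G2"}
probe_hits = {
    "p1": {"T1", "T2"},
    "p2": {"T1", "T2"},
    "p3": {"T1", "T2"},
    "p4": {"T1", "T3"},  # nonspecific: matches two genes
    "p5": set(),         # matches no transcript in this annotation
    "p6": {"T3"},        # specific, but its probe set is too small
}
probe_sets = remap_probes(probe_hits, transcript_gene)
```

In this sketch, a richer annotation gives probes such as `p5` a transcript to match, which is one way more probes could be retained after remapping.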

Conclusions: By combining gene definitions from multiple databases, it is possible to greatly increase the numbers of genes and splice variants that can be detected in microarray gene expression experiments.
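One way to picture the combination of gene definitions from several databases is the sketch below. It assumes each database supplies transcripts as a chromosome, strand and exon coordinates, and it merges transcripts with identical structure while recording which databases support them. The function names `integrate_annotations` and `missing_from`, the data layout and all coordinates are hypothetical and are not taken from the paper.

```python
def integrate_annotations(sources):
    """Merge transcripts from several databases (illustrative only).

    sources: {db_name: {transcript_id: (chrom, strand, exons)}}
             where exons is a tuple of (start, end) pairs.
    Returns {structure: {"ids": [...], "sources": {...}}}, with one entry
    per distinct transcript structure.
    """
    merged = {}
    for db, transcripts in sources.items():
        for tid, structure in transcripts.items():
            entry = merged.setdefault(structure, {"ids": [], "sources": set()})
            entry["ids"].append(tid)
            entry["sources"].add(db)
    return merged


def missing_from(merged, db):
    """Return the transcript structures that the given database lacks."""
    return [s for s, entry in merged.items() if db not in entry["sources"]]


# Toy input with made-up coordinates and IDs.
shared = ("chr1", "+", ((100, 200), (300, 400)))
variant = ("chr1", "+", ((100, 200), (500, 600)))
sources = {
    "RefSeq": {"refseq_tx1": shared},
    "GenBank": {"genbank_tx1": shared, "genbank_tx2": variant},
    "Ensembl": {},
}
merged = integrate_annotations(sources)
not_in_ensembl = missing_from(merged, "Ensembl")
```

Here both structures are absent from the toy Ensembl set, so probes matching only those transcripts would be lost if that single database were used for remapping.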


Figures

Figure 1
Work flow showing an outline of the probe remapping procedure.
Figure 2
Comparison of gene lists generated by remapped probe sets using the UCSC and Ensembl databases. A, C: number of differentially expressed genes. Numbers of up-regulated genes are in red and numbers of down-regulated genes are in green. B, D: number of alternatively spliced genes. A, B: 36 versus 52 hpf whole zebrafish embryos data set. C, D: 3 versus 5 dpf zebrafish eyes data set.
Figure 3
Real-time PCR validation of differentially expressed genes in the 3 versus 5 dpf zebrafish eyes data set. These genes can only be detected by remapping using the multiple source database, UCSC. (A) Signal intensities from microarray data. (B) Real-time PCR results depicted as relative abundance compared to the lowest abundance sample. *: p-value < 0.05. **: p-value < 0.01.
Figure 4
Schematic view of cry1b showing the advantage of integrating multiple databases. Dr.7371.1.S2_at is the original Affymetrix probe set targeting cry1b, as shown in the blue square. dre00066_1 is the remapped probe set, as shown in the red square. Affymetrix probes match the RefSeq transcript NM_131790 and the GenBank transcript BC044558, but no Ensembl transcript is matched by this probe set. dre00066_1_L0 and dre00066_1_R0 are primers used in the real-time PCR.
Figure 5
Schematic view of tpm3 showing the advantage of integrating multiple databases in revealing alternative splicing patterns. (A) Schematic view of tpm3. dre03301_1 and dre03301_2 (in the red square) are remapped probe sets with probes from Dr.18559.1.S1_at and Dr.20297.1.S1_at (in the blue square), respectively. (B) Log-base 2 signals of the probes from the remapped probe sets, with 3 dpf and 5 dpf gene expression in black and red dots, respectively. (C) The real-time PCR results depicted as relative quantification compared to the lowest abundance sample.

