Genome Biol. 2007;8(6):R112.
doi: 10.1186/gb-2007-8-6-r112.

Celsius: a community resource for Affymetrix microarray data

Allen Day et al. Genome Biol. 2007.

Abstract

Celsius is a data warehousing system to aggregate Affymetrix CEL files and associated metadata. It provides mechanisms for importing, storing, querying, and exporting large volumes of primary and pre-processed microarray data. Celsius contains ten billion assay measurements and affiliated metadata. It is the largest publicly available source of Affymetrix microarray data, and through sheer volume it allows a sophisticated, broad view of transcription that has not previously been possible.

Figures

Figure 1
Summary of data sources present in Celsius. Data have been imported from several sources, 11 of which are shown. Numerals indicate the number of files within each source. Circle overlap is proportional to CEL overlap between data sources. AEX, EBI ArrayExpress [49]; AFFX, Affymetrix [50]; GEO, NCBI Gene Expression Omnibus [51]; GNF, Genomics Institute of the Novartis Research Foundation [52]; LBL, Lawrence Livermore National Laboratory; MIT, Broad Institute [53]; NNMC, NIH Neuroscience Microarray Consortium [54]; PEPR, Public Expression Profiling Resource [55]; UCLA, University of California, Los Angeles DNA Microarray Core Facility [56]; UPENN, University of Pennsylvania Microarray Core Facility [57].
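The overlaps drawn in Figure 1 amount to counting the CELs that two sources have in common. A minimal sketch, assuming each source is represented by the set of checksums of the CELs it contributed (checksums are how Celsius detects duplicate CELs during import, per Figure 4); the function name `pairwise_overlap` and the toy data are hypothetical.

```python
from itertools import combinations


def pairwise_overlap(cels_by_source):
    """Count the CEL checksums shared by every pair of sources.

    cels_by_source maps a source name (e.g. "GEO") to a set of CEL checksums.
    Pairs are emitted in alphabetical order of source name.
    """
    return {
        (a, b): len(cels_by_source[a] & cels_by_source[b])
        for a, b in combinations(sorted(cels_by_source), 2)
    }


# Toy example with made-up checksums.
sources = {"GEO": {"c1", "c2", "c3"}, "AEX": {"c2", "c3"}, "PEPR": {"c9"}}
print(pairwise_overlap(sources))
```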
Figure 2
Tally of CELs by organism as of January 2007. SNP, single nucleotide polymorphism.
Figure 3
Monthly tally of CEL file import into Celsius from February 2006 to January 2007. AEX, EBI ArrayExpress; GEO, NCBI Gene Expression Omnibus; NNMC, NIH Neuroscience Microarray Consortium.
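A tally like the one in Figure 3 is a count of import events grouped by calendar month and source repository. A minimal sketch, assuming each import is recorded as a `(date, source)` pair; the record layout and the name `monthly_tally` are hypothetical, not Celsius's schema.

```python
from collections import Counter
from datetime import date


def monthly_tally(imports):
    """Count imported CELs per (calendar month 'YYYY-MM', source repository)."""
    return Counter((day.strftime("%Y-%m"), source) for day, source in imports)


# Toy import log.
records = [(date(2006, 2, 3), "GEO"), (date(2006, 2, 20), "GEO"), (date(2006, 3, 1), "AEX")]
print(monthly_tally(records))
```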
Figure 4
Process for importing microarray data from other repositories. Potentially novel CELs are checksummed and associated with a serial number database identifier (SNID), the Celsius database accession identifier. Metadata from the source repository (sample accession, dataset accession), as well as metadata from the CEL (checksum, array type), are archived to a relational database. If a CEL not currently present in Celsius is detected, a SNID is assigned and the CEL is compressed and archived. Quantification is then performed and the resulting data are stored in a relational database. GEO, NCBI Gene Expression Omnibus; SN, University of California, Los Angeles DNA Microarray Core Facility.
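The deduplication step of this import process can be sketched with a checksum-keyed lookup: a CEL whose checksum is already known only gains another source record, while a novel CEL receives a new SNID and is compressed into the archive. This is a minimal in-memory sketch under several assumptions: MD5 as the checksum, gzip as the compressor, dictionaries standing in for the relational database, and a hypothetical `SNID00000001`-style identifier format. Array-type parsing from the CEL header and the quantification step are not sketched.

```python
import gzip
import hashlib
from dataclasses import dataclass, field


@dataclass
class CelArchive:
    """In-memory stand-in for the relational metadata store and CEL archive."""
    snid_by_checksum: dict = field(default_factory=dict)  # checksum -> SNID
    source_records: list = field(default_factory=list)    # (SNID, repository, sample acc., dataset acc.)
    compressed_cels: dict = field(default_factory=dict)   # SNID -> gzip-compressed CEL bytes
    next_serial: int = 1

    def import_cel(self, cel_bytes, repository, sample_acc, dataset_acc):
        """Archive one CEL and return (snid, is_new)."""
        checksum = hashlib.md5(cel_bytes).hexdigest()  # assumption: MD5 checksum
        snid = self.snid_by_checksum.get(checksum)
        is_new = snid is None
        if is_new:
            snid = f"SNID{self.next_serial:08d}"  # hypothetical identifier format
            self.next_serial += 1
            self.snid_by_checksum[checksum] = snid
            self.compressed_cels[snid] = gzip.compress(cel_bytes)
            # Quantification of the novel CEL would be queued here (not sketched).
        # Source-repository metadata is recorded for every CEL, duplicate or not.
        self.source_records.append((snid, repository, sample_acc, dataset_acc))
        return snid, is_new


archive = CelArchive()
print(archive.import_cel(b"CEL-A", "GEO", "GSM1", "GSE1"))    # ('SNID00000001', True)
print(archive.import_cel(b"CEL-A", "AEX", "E-1", "E-X-1"))    # ('SNID00000001', False)
```

Keying on the checksum rather than on repository accessions is what lets the same CEL, deposited in both GEO and ArrayExpress, map to a single SNID.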
Figure 5
Human HG-U133A CELs are automatically classified by the sex of the tissue or cell line of origin. Orange points are manually curated as male and are also correctly classified as male. Red points are manually curated as male but are incorrectly classified as female. Wheat points are classified as male but have no manually curated sex annotation. These three types of points are also denoted by different shapes: triangle, filled triangle, and circle, respectively. All points are classified by assigning them to two clusters in a five-dimensional probeset space, two dimensions of which are shown. x-axis, 221728_x_at, XIST; y-axis, 201909_at, RPS4Y1.
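The caption does not name the clustering algorithm, so the sketch below substitutes plain two-cluster k-means (Lloyd's algorithm) as an assumed stand-in. Each sample is a five-dimensional vector of probeset intensities; the feature order is hypothetical (index 0 for XIST 221728_x_at, index 1 for RPS4Y1 201909_at, the other three placeholders), and the cluster with the higher mean Y-linked (RPS4Y1) signal is labeled male.

```python
import math


def kmeans_two(points, y_idx, max_iter=100):
    """Lloyd's k-means with k = 2; returns (labels, centroids)."""
    # Deterministic start: samples with the lowest and highest Y-linked signal.
    centroids = [list(min(points, key=lambda p: p[y_idx])),
                 list(max(points, key=lambda p: p[y_idx]))]
    labels = None
    for _ in range(max_iter):
        new_labels = [min((0, 1), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def classify_sex(points, y_idx):
    """Label the cluster with the higher mean Y-linked signal 'male'."""
    labels, centroids = kmeans_two(points, y_idx)
    male = 0 if centroids[0][y_idx] > centroids[1][y_idx] else 1
    return ["male" if lab == male else "female" for lab in labels]


# Toy intensities: (XIST, RPS4Y1, three placeholder probesets).
samples = [(12.0, 3.0, 5.0, 5.0, 5.0), (11.5, 3.2, 5.0, 5.0, 5.0),
           (3.0, 11.0, 5.0, 5.0, 5.0), (2.8, 12.0, 5.0, 5.0, 5.0)]
print(classify_sex(samples, y_idx=1))  # ['female', 'female', 'male', 'male']
```

Comparing the predicted labels against manually curated sex annotations is what distinguishes the orange (agreeing) and red (disagreeing) points in the figure.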
Figure 6
Annotation coverage and depth for the Human HG-U133 platforms. (a) Filled wedges indicate the fraction of CELs for which annotation is present. The red and yellow wedges of the left-most pie indicate the fractions of diseased and normal samples, respectively. The wedge of the right-most pie indicates the fraction of CELs for which any annotation from the preceding columns has been given (excluding sex). (b) Human HG-U133A samples grouped by tumor type and normal. Annotation was manually assigned after literature review. Many integumental system tumors are breast tumors. (c) Human HG-U133A samples grouped by tissue of origin. Annotation was manually assigned after literature review.
Figure 7
A gene network constructed from the 3600 most variable human probesets. Shown are the hierarchical clustering tree and the heat map of the topological overlap matrix for the 3600 HG-U133A probesets with the largest coefficients of variation, measured across 1078 HG-U133A serial number database identifiers (SNIDs) annotated as pathologically normal. The color breaks in the annotation bar above the heat map mark EASE-based annotation groups of probesets, and tick marks mark the individual modules of highly interconnected probesets before they were merged into a single annotation group. Colors, left to right, are defined as follows: red, transcription; black, response to biotic stimulus; turquoise, ectoderm development; magenta, regulation of metabolism; blue, nervous system development; green, muscle contraction; dark orchid, digestion; chocolate, organic acid metabolism; brown, acute-phase response; dark khaki, complement activation; orange, pregnancy; yellow, sexual reproduction; midnight blue, mitotic cell cycle; deep sky blue, skeletal development; tan, phosphate transport.
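The two computational steps named in this caption, selecting probesets by coefficient of variation and building a topological overlap matrix (TOM), can be sketched as follows. The caption does not define the adjacency, so this sketch assumes a WGCNA-style soft-threshold adjacency a_ij = |cor(i, j)|^beta with a hypothetical beta = 6, and the standard overlap formula TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij sums shared neighbour weights and k_i is connectivity. It also assumes positive (non-log) intensities so that the mean in the coefficient of variation is nonzero. Hierarchical clustering on 1 - TOM and the EASE annotation are not sketched.

```python
import math
from statistics import mean, pstdev


def top_cv_probesets(expr, n):
    """Return the n probesets with the largest coefficient of variation (sd / mean)."""
    return sorted(expr, key=lambda p: pstdev(expr[p]) / mean(expr[p]), reverse=True)[:n]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def topological_overlap(expr, probes, beta=6):
    """TOM[i][j] = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with a_ii = 0."""
    n = len(probes)
    # Assumed soft-threshold adjacency: a_ij = |cor(i, j)| ** beta.
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = a[j][i] = abs(pearson(expr[probes[i]], expr[probes[j]])) ** beta
    k = [sum(row) for row in a]  # connectivity of each probeset
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(a[i][u] * a[u][j] for u in range(n))  # shared neighbours
                tom[i][j] = (shared + a[i][j]) / (min(k[i], k[j]) + 1 - a[i][j])
    return tom


# Toy data: 4 hypothetical probesets measured across 4 samples.
expr = {"p1": [1, 2, 3, 4], "p2": [2, 4, 6, 9], "p3": [10, 10, 10, 11], "p4": [5, 5, 5, 5]}
selected = top_cv_probesets(expr, 3)
print(selected)  # ['p2', 'p1', 'p3']
print(topological_overlap(expr, selected))
```

With 3600 probesets the pure-Python triple loop would be slow; a matrix library would compute l_ij as the product of the adjacency matrix with itself.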

References

    1. Barrett T, Suzek T, Troup D, Wilhite S, Ngau W, Ledoux P, Rudnev D, Lash A, Fujibuchi W, Edgar R. NCBI GEO: mining millions of expression profiles-database and tools. Nucleic Acids Res. 2005;33:D562–D566.
    2. Sarkans U, Parkinson H, Lara G, Oezcimen A, Sharma A, Abeygunawardena N, Contrino S, Holloway E, Rocca-Serra P, Mukherjee G, et al. The ArrayExpress gene expression database: a software engineering and implementation perspective. Bioinformatics. 2005;21:1495–1501.
    3. Rhodes D, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan A. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci USA. 2004;101:9309–9314.
    4. Stuart J, Segal E, Koller D, Kim S. A gene-coexpression network for global discovery of conserved genetic modules. Science. 2003;302:249–255.
    5. Chen J, Zhao P, Massaro D, Clerch L, Almon R, DuBois D, Jusko W, Hoffman E. The PEPR GeneChip data warehouse, and implementation of a dynamic time series query tool (SGQT) with graphical interface. Nucleic Acids Res. 2004;32:D578–D581.
