BMC Bioinformatics. 2014;15(Suppl 1):S6.
doi: 10.1186/1471-2105-15-S1-S6. Epub 2014 Jan 10.

CorrelaGenes: a new tool for the interpretation of the human transcriptome

Paolo Cremaschi et al. BMC Bioinformatics. 2014.

Abstract

Background: The amount of gene expression data available in public repositories has grown exponentially in recent years, and new data mining tools are now needed to transform these data into information that is easily accessible to biologists.

Results: By exploiting expression data publicly available in the Gene Expression Omnibus (GEO) database, we developed a new bioinformatics tool aimed at identifying genes whose expression appeared to be simultaneously altered in different experimental conditions, suggesting co-regulation or coordinated action in the same biological process. To accomplish this task, we used the 978 human GEO Curated DataSets and manually selected 2,109 pair-wise comparisons based on their biological rationale. The lists of differentially expressed genes obtained from the selected comparisons were stored in a PostgreSQL database and used as the data source for the CorrelaGenes tool. Our application uses a customized Association Rule Mining (ARM) algorithm to identify sets of genes whose expression profiles are correlated with a gene of interest. The significance of the correlation is measured by coupling the Lift, a well-known standard ARM index, with the χ² p value. The manual curation of the comparisons and the developed algorithm constitute a new approach in the field of gene expression profiling studies. A simulation performed on 100 randomly selected target genes allowed us to evaluate the efficiency of the procedure and to obtain preliminary data demonstrating the consistency of the results.
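The two indexes can be illustrated with a minimal Python sketch. It assumes that each selected comparison acts as one ARM transaction whose items are the differentially expressed genes, and it builds the standard 2×2 co-occurrence table for a target gene and a candidate gene. The function name `arm_scores`, the input layout, and the use of an uncorrected Pearson χ² test are assumptions for illustration; the abstract does not describe how CorrelaGenes customizes ARM or computes its test.

```python
import math

def arm_scores(target, gene, comparisons):
    """Return (lift, chi2, p_value) for the rule target -> gene.

    `comparisons` maps a comparison ID to the set of genes found differentially
    expressed in it; each comparison is treated as one ARM transaction.
    """
    n = len(comparisons)
    a = b = c = 0
    for genes in comparisons.values():
        has_t, has_g = target in genes, gene in genes
        if has_t and has_g:
            a += 1  # both modulated
        elif has_t:
            b += 1  # only the target modulated
        elif has_g:
            c += 1  # only the candidate modulated
    d = n - a - b - c  # neither modulated
    # Lift = P(T and G) / (P(T) * P(G)); 1.0 means independence.
    lift = a * n / ((a + b) * (a + c)) if (a + b) and (a + c) else 0.0
    # Pearson chi-square for the 2x2 table, no continuity correction.
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Upper tail of a chi-square with 1 degree of freedom: erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return lift, chi2, p_value

# Hypothetical toy data: gene symbols T, G1, G2 are placeholders.
comparisons = {"c1": {"T", "G1"}, "c2": {"T", "G1"}, "c3": {"G2"}, "c4": {"G2"}}
print(arm_scores("T", "G1", comparisons))  # lift 2.0, chi2 4.0, p ≈ 0.0455
```

A Lift above 1 means the two genes are modulated together more often than expected under independence, while the χ² p value indicates how unlikely the observed departure from independence would be by chance; reading them together guards against high Lift values supported by very few comparisons.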

Conclusions: The preliminary results of the simulation showed how CorrelaGenes could contribute to the characterization of molecular pathways and biological processes by integrating data obtained from other applications and available in public repositories.

Figures

Figure 1
Schematic representation of the CorrelaGenes workflow.
Figure 2
Schematic representation of GDS2516 processing. (A) Design of GDS2516 and experimental factor definition (F1 to F7). (B) Contrast matrix containing all group-versus-group comparisons; the 21 pair-wise comparisons are shown in light blue. (C) Comparisons manually selected by the experts; the 5 selected comparisons are shown in dark blue.
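The size of a full contrast matrix follows from the number of groups: k groups give k(k-1)/2 unordered group-versus-group comparisons, so 7 groups give 21. The Python sketch below assumes that F1 to F7 each label one group of the GDS2516 design; the five "selected" contrasts are hypothetical placeholders for the experts' manual choice, which the caption does not enumerate.

```python
from itertools import combinations

def all_pairwise_contrasts(groups):
    """All unordered group-versus-group comparisons (the full contrast matrix)."""
    return list(combinations(groups, 2))

# Assumption: F1..F7 each label one group of the dataset design.
groups = [f"F{i}" for i in range(1, 8)]
contrasts = all_pairwise_contrasts(groups)
print(len(contrasts))  # 21 = 7 * 6 / 2

# Hypothetical stand-in for the manual expert selection (not the real 5).
selected = [("F1", f"F{i}") for i in range(2, 7)]
assert all(c in contrasts for c in selected)
print(len(selected))  # 5
```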
Figure 3
Gene modulation in CorrelaGenes. Histogram of the number of genes by the number of comparisons in which each gene was found modulated.
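The data behind such a histogram can be derived directly from the stored lists of differentially expressed genes. This sketch assumes, hypothetically, that the lists are available as a mapping from comparison ID to a set of gene symbols (the actual PostgreSQL schema is not described here); it counts in how many comparisons each gene appears and then tallies genes by that count.

```python
from collections import Counter

def modulation_histogram(comparisons):
    """Return {k: number of genes modulated in exactly k comparisons}.

    `comparisons` maps comparison ID -> set of differentially expressed genes.
    """
    per_gene = Counter(g for genes in comparisons.values() for g in genes)
    return dict(sorted(Counter(per_gene.values()).items()))

# Hypothetical toy data with placeholder gene symbols.
comparisons = {"c1": {"A", "B"}, "c2": {"A"}, "c3": {"A", "C"}}
print(modulation_histogram(comparisons))  # {1: 2, 3: 1}
```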
Figure 4
Homepage of the CorrelaGenes web interface.
Figure 5
Impact of the ARM indexes on the number of genes in the output lists. (A) Box plot of the number of genes at different χ² p value thresholds. (B) Box plot of the number of genes at different Lift thresholds.
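A sketch of how output-list sizes can be tabulated across thresholds, assuming each candidate gene already has a (Lift, χ² p value) pair for a given target gene. The helper name `count_passing`, the toy scores, and the inclusive comparisons (p ≤ threshold, Lift ≥ threshold) are assumptions. Repeating the count for each simulated target gene would yield one value per gene and threshold, which is the kind of distribution a box plot summarizes.

```python
def count_passing(scores, max_p=None, min_lift=None):
    """Count candidate genes that pass the optional thresholds.

    `scores` maps gene -> (lift, p_value). Thresholds are inclusive here,
    which is an assumption rather than documented CorrelaGenes behaviour.
    """
    return sum(
        1
        for lift, p_value in scores.values()
        if (max_p is None or p_value <= max_p)
        and (min_lift is None or lift >= min_lift)
    )

# Hypothetical scores for one target gene.
scores = {"G1": (2.5, 0.001), "G2": (1.2, 0.04), "G3": (3.0, 0.2)}
for max_p in (0.05, 0.01, 0.001):
    print("p <=", max_p, "->", count_passing(scores, max_p=max_p))
for min_lift in (1.0, 2.0, 3.0):
    print("Lift >=", min_lift, "->", count_passing(scores, min_lift=min_lift))
```

Stricter thresholds can only shrink the list, so the counts are non-increasing along each sweep.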
Figure 6
Analysis of the PRPF19 gene lists. Trend of the DAVID Enrichment Scores (ES) across different χ² p value thresholds with (A) % co-pres = 40 and (B) % co-pres = 40 and Lift = 2 (the list of GO terms with the related Benjamini p values is available in Additional File 7).
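The caption's "% co-pres" is not defined in this abstract. One plausible reading, assumed here, is the percentage of comparisons that modulate the target gene in which the candidate gene is also modulated, i.e. the ARM confidence of the rule target → gene expressed as a percentage. The sketch below computes that quantity and applies the 40% cut-off; the helper name and toy data are hypothetical.

```python
def co_presence_pct(target, gene, comparisons):
    """Hypothetical % co-pres: percentage of the comparisons that modulate
    `target` in which `gene` is also modulated (ARM confidence x 100)."""
    with_target = [genes for genes in comparisons.values() if target in genes]
    if not with_target:
        return 0.0
    shared = sum(1 for genes in with_target if gene in genes)
    return 100.0 * shared / len(with_target)

# Hypothetical toy data with placeholder gene symbols.
comparisons = {
    "c1": {"T", "G1"}, "c2": {"T", "G1"}, "c3": {"T"},
    "c4": {"T", "G2"}, "c5": {"G2"},
}
kept = [g for g in ("G1", "G2") if co_presence_pct("T", g, comparisons) >= 40]
print(kept)  # ['G1']
```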
Figure 7
Analysis of up- and down-regulation in the PRPF19 gene lists. Trend of the DAVID Enrichment Scores (ES) across different χ² p value thresholds with % co-pres = 40 and Lift = 2, distinguishing between up- and down-regulated target and related genes (the list of GO terms with the related Benjamini p values is available in Additional File 7).
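Separating up- and down-regulation requires keeping the direction of change in each differentially expressed gene list. As a hypothetical sketch, assuming each comparison is stored as a mapping from gene symbol to "up" or "down", the co-occurrences of a target and a candidate gene can be tallied per direction pair, so that concordant (up/up, down/down) and discordant changes can be analyzed separately. This is an illustration, not the authors' documented procedure.

```python
from collections import Counter

def direction_pairs(target, gene, comparisons):
    """Tally (target direction, gene direction) over comparisons modulating both.

    `comparisons` maps comparison ID -> {gene symbol: "up" or "down"}.
    """
    pairs = Counter()
    for calls in comparisons.values():
        if target in calls and gene in calls:
            pairs[(calls[target], calls[gene])] += 1
    return pairs

# Hypothetical toy data with placeholder gene symbols.
comparisons = {
    "c1": {"T": "up", "G1": "up"},
    "c2": {"T": "up", "G1": "up"},
    "c3": {"T": "down", "G1": "up"},
    "c4": {"G1": "down"},
}
pairs = direction_pairs("T", "G1", comparisons)
print(pairs[("up", "up")], pairs[("down", "up")], pairs[("down", "down")])  # 2 1 0
```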
