Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data
- PMID: 22873695
- PMCID: PMC3462729
- DOI: 10.1186/1471-2105-13-193
Abstract
Background: Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed.
Results: We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size.
Conclusions: Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.
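The abstract does not say how the gene set consistency metrics are defined. As a hedged illustration only, the sketch below assumes one plausible metric: the mean pairwise Pearson correlation of expression profiles among the genes in a set. The function names, the toy data, and the choice of metric are all assumptions made for this example and are not taken from the paper.

```python
import math
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(expression, gene_set):
    """Assumed consistency score: average correlation over all gene pairs.

    expression maps gene id -> list of expression values (one per array).
    gene_set is an iterable of gene ids that all appear in expression.
    """
    pairs = list(combinations(gene_set, 2))
    if not pairs:
        raise ValueError("a gene set needs at least two genes")
    return mean(pearson(expression[g], expression[h]) for g, h in pairs)


# Toy data (hypothetical): five genes share a common signal, five do not.
random.seed(0)
n_arrays = 50
shared = [random.gauss(0, 1) for _ in range(n_arrays)]
expression = {}
for i in range(5):
    expression[f"coherent_{i}"] = [s + 0.3 * random.gauss(0, 1) for s in shared]
    expression[f"unrelated_{i}"] = [random.gauss(0, 1) for _ in range(n_arrays)]

coherent_set = [f"coherent_{i}" for i in range(5)]
unrelated_set = [f"unrelated_{i}" for i in range(5)]

coherent_score = mean_pairwise_correlation(expression, coherent_set)
unrelated_score = mean_pairwise_correlation(expression, unrelated_set)
print(f"coherent set:  {coherent_score:.3f}")
print(f"unrelated set: {unrelated_score:.3f}")
```

Under this assumed metric, a set whose genes are co-expressed across arrays scores close to 1, while a set of unrelated genes scores near 0. Comparing such scores between sets of similar size drawn from different resources is one way the kind of comparison described in the Results could be approached.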