BMC Bioinformatics. 2012 Aug 8;13:193.
doi: 10.1186/1471-2105-13-193.

Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data


Nathan L Tintle et al. BMC Bioinformatics.

Abstract

Background: Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed.

Results: We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size.

Conclusions: Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.


Figures

Figure 1
Gene set consistency by gene set size across the eight gene set sources.
a. Assessing gene set consistency using s_mean,diff. Smaller values of s_mean,diff indicate more consistent sources.
b. Assessing gene set consistency using s_mean,exp. Smaller values of s_mean,exp indicate more consistent sources.
c. Assessing gene set consistency using corr_mean. Larger values of corr_mean indicate more consistent sources.
d. Assessing gene set consistency using PC1. Larger values of PC1 indicate more consistent sources.
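
This record does not define the metrics named in the Figure 1 caption, so the following is only an illustrative sketch of how two of them might be computed for a single gene set. It assumes, without confirmation from the source, that the mean-correlation metric is the average pairwise Pearson correlation among the genes in a set, and that PC1 is the share of variance captured by the first principal component of the gene-by-gene correlation matrix. The function names and the toy expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_pairwise_correlation(profiles):
    """Assumed reading of the mean-correlation metric:
    average Pearson correlation over all gene pairs in the set."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def pc1_variance_share(profiles, iterations=200):
    """Assumed reading of the PC1 metric: largest eigenvalue of the
    gene-by-gene correlation matrix divided by its trace (the number
    of genes). The eigenvalue is estimated by power iteration."""
    k = len(profiles)
    corr = [[pearson(a, b) for b in profiles] for a in profiles]
    # Non-uniform start vector; power iteration can still fail in the rare
    # case where the start is orthogonal to the leading eigenvector.
    v = [float(i + 1) for i in range(k)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient of the unit vector v estimates the top eigenvalue.
    top = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    return top / k


# Toy gene sets: rows are genes, columns are arrays (hypothetical values).
coherent_set = [[1, 2, 3, 4], [2, 4, 6, 8], [0, 1, 2, 3]]
mixed_set = [[1, 2, 3, 4], [4, 1, 3, 2], [2, 2, 5, 1]]

for name, gene_set in [("coherent", coherent_set), ("mixed", mixed_set)]:
    print(name, mean_pairwise_correlation(gene_set), pc1_variance_share(gene_set))
```

Under these assumed definitions, a set whose genes rise and fall together scores close to 1 on both measures, which matches the caption's statement that larger values indicate more consistent sources.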


