Comparative Study
BMC Bioinformatics. 2003 Apr 3;4:12.
doi: 10.1186/1471-2105-4-12. Epub 2003 Apr 3.

Significance analysis of lexical bias in microarray data

Charles C Kim et al. BMC Bioinformatics.

Abstract

Background: Genes that are determined to be significantly differentially regulated in microarray analyses often appear to have functional commonalities, such as being components of the same biochemical pathway. This results in certain words being under- or overrepresented in the list of genes. Distinguishing between biologically meaningful trends and artifacts of annotation and analysis procedures is of the utmost importance, as only true biological trends are of interest for further experimentation. A number of sophisticated methods for identification of significant lexical trends are currently available, but these methods are generally too cumbersome for practical use by most microarray users.

Results: We have developed a tool, LACK, for calculating the statistical significance of apparent lexical bias in microarray datasets. The frequency of a user-specified list of search terms in a list of genes which are differentially regulated is assessed for statistical significance by comparison to randomly generated datasets. The simplicity of the input files and user interface targets the average microarray user who wishes to have a statistical measure of apparent lexical trends in analyzed datasets without the need for bioinformatics skills. The software is available as Perl source or a Windows executable.
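The comparison to randomly generated datasets described above can be illustrated with a minimal sketch. This is not LACK's actual implementation (which is distributed as Perl source); the function names, the case-insensitive substring matching rule, and the number of random draws are all assumptions made for illustration. It draws random gene lists of the same size as the differentially regulated list and asks how often they contain at least as many search-term matches as the observed list.

```python
import random


def count_matches(annotations, terms):
    """Count annotations containing at least one search term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return sum(any(t in a.lower() for t in lowered) for a in annotations)


def empirical_p_value(all_annotations, selected_annotations, terms,
                      n_random=1000, seed=0):
    """Estimate how often a random gene list matches the terms at least as
    often as the selected list does (hypothetical helper, not LACK's API)."""
    rng = random.Random(seed)
    observed = count_matches(selected_annotations, terms)
    list_size = len(selected_annotations)
    at_least_as_many = 0
    for _ in range(n_random):
        # Draw a random list of the same size, without replacement.
        sample = rng.sample(all_annotations, list_size)
        if count_matches(sample, terms) >= observed:
            at_least_as_many += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_many + 1) / (n_random + 1)


# Toy data only: 5 annotated "SPI-2" genes among 100 array elements.
background = ["SPI-2 effector"] * 5 + ["ribosomal protein"] * 95
selected = ["SPI-2 effector"] * 5 + ["ribosomal protein"] * 5
observed, p_value = empirical_p_value(background, selected, ["spi-2"])
print(observed, p_value)
```

A small estimated p-value suggests the search terms are overrepresented in the selected list relative to chance; testing for underrepresentation would flip the comparison to `<=`.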

Conclusion: We have used LACK in our laboratory to generate biological hypotheses based on our microarray data. We demonstrate the program's utility using an example in which we confirm significant upregulation of the SPI-2 pathogenicity island of Salmonella enterica serovar Typhimurium by the cation chelator dipyridyl.

Figures

Figure 1
Binomial distribution of SPI-2 genes in a dataset. The total filtered dataset consisted of 4290 unique elements. An SGL of 256 genes was generated using SAM and analyzed for 34 members of SPI-2. The arrow indicates the number of matches in the SGL, with P(x > 8) = 0.004. The binomial analysis required 5 seconds; Poisson analysis of the same datasets required 7 seconds. A 21,450 element dataset created by replicating the 4290 element dataset 5 times required 8 seconds for binomial analysis. The files used for this analysis are available at the LACK website or as supplementary data.
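
As a rough illustration of the kind of binomial tail probability the caption refers to, the sketch below computes P(X ≥ k) for a textbook binomial model using the counts quoted in the caption (4290 elements, 34 SPI-2 members, a 256-gene list). The model choice, the function name, and the per-draw probability 34/4290 are assumptions; the record does not describe how LACK derives its distribution, so the result is not expected to reproduce the reported value.

```python
from math import comb


def binomial_tail(n, p, k):
    """Return P(X >= k) for X ~ Binomial(n, p), via the complement of the
    lower tail (hypothetical helper for illustration)."""
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - lower)


# Counts taken from the figure caption; the model itself is an assumption.
total_elements = 4290
spi2_members = 34
list_size = 256
p_match = spi2_members / total_elements  # chance a random element is SPI-2

# "x > 8" in the caption corresponds to 9 or more matches.
print(binomial_tail(list_size, p_match, 9))
```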
