2011 Nov 24;12 Suppl 12(Suppl 12):S2.
doi: 10.1186/1471-2105-12-S12-S2.

Systematic identification of functional modules and cis-regulatory elements in Arabidopsis thaliana

Jianhua Ruan et al. BMC Bioinformatics.

Abstract

Background: Several large-scale gene co-expression networks have been constructed successfully for predicting gene functional modules and cis-regulatory elements in Arabidopsis (Arabidopsis thaliana). However, these networks are usually constructed and analyzed in an ad hoc manner. In this study, we propose a completely parameter-free and systematic method for constructing gene co-expression networks and predicting functional modules as well as cis-regulatory elements.

Results: Our novel method consists of an automated network construction algorithm, a parameter-free procedure to predict functional modules, and a strategy for finding known cis-regulatory elements that is suitable for consensus scanning without prior knowledge of the allowed extent of degeneracy of the motif. We apply the method to a large collection of gene expression microarray data in Arabidopsis. We estimate that our co-expression network has an accuracy of ~94% and has topological properties similar to those of other biological networks, such as being scale-free and having a high clustering coefficient. Remarkably, among the ~300 predicted modules containing at least 20 genes, 88% have at least one significantly enriched function, including a few extremely significant ones (ribosome, p < 1E-300; photosynthetic membrane, p < 1.3E-137; proteasome complex, p < 5.9E-126). In addition, we are able to predict cis-regulatory elements for 66.7% of the modules, and the association between the enriched cis-regulatory elements and the enriched functional terms can often be confirmed by the literature. Overall, our results are much more significant than those reported by several previous studies on similar data sets. Finally, we use the co-expression network to dissect the promoters of 19 Arabidopsis genes involved in the metabolism and signaling of the important plant hormone gibberellin, and obtain promising results that reveal interesting insights into the biosynthesis and signaling of gibberellin.
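The enrichment p-values quoted above measure how surprising it is for a module to contain so many genes sharing one annotation. The abstract does not say which test the authors used; the sketch below assumes a one-sided hypergeometric test, a common choice for Gene Ontology enrichment. The function name and all numbers are hypothetical illustrations, not values from the paper.

```python
from math import comb


def enrichment_p_value(population, annotated, module_size, hits):
    """One-sided hypergeometric tail P(X >= hits).

    population  -- total number of genes considered (assumed background)
    annotated   -- genes in the background carrying the functional term
    module_size -- number of genes in the module
    hits        -- module genes carrying the functional term
    """
    upper = min(annotated, module_size)
    total = comb(population, module_size)
    # Sum the exact integer counts first, then divide once, so that very
    # small tail probabilities are not lost to intermediate rounding.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, module_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy example (made-up numbers): 10 of 50 module genes share a term that
# annotates 200 of 20000 background genes.
p = enrichment_p_value(20000, 200, 50, 10)
print(f"p = {p:.3g}")
```

In practice such p-values would also need correction for the many terms and modules tested; that step is omitted here.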

Conclusions: The results show that our method is highly effective in finding functional modules from real microarray data. Its application to Arabidopsis leads to the discovery of the largest number of annotated Arabidopsis functional modules in the literature. Given the high statistical significance of functional enrichment and the agreement between cis-regulatory and functional annotations, we believe our Arabidopsis gene modules can be used to predict the functions of unknown genes in Arabidopsis and to understand the regulatory mechanisms of many genes.

Figures

Figure 1
Network properties. (a) Degree distribution of the co-expression network constructed from real data or randomized data. (b) Module size distribution.
Figure 2
Gene co-expression subnetwork of Arabidopsis. Subnetwork contains genes in the top 40 functional modules with the highest statistical significance of enrichment of Gene Ontology terms.
Figure 3
Arabidopsis cis-regulatory network. A circle represents a gene module. A triangle represents a motif. The size of a node is proportional to its module size or the number of modules it regulates.
Figure 4
Cis-regulatory network of Arabidopsis gibberellin metabolism and signaling genes. Yellow and green nodes represent genes and cis-regulatory elements, respectively. The width of an edge is proportional to the significance of enrichment, measured by the negative logarithm of the p-value. The number after the dot following the motif name represents the number of mismatches allowed in order to obtain maximum statistical significance.
Figure 5
Illustration of three co-expression network construction methods.
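The Figure 4 caption refers to the number of mismatches allowed when scanning promoters for a known motif. As a rough illustration of what consensus scanning with a mismatch allowance can look like, the sketch below counts how many promoter sequences contain at least one window within a given Hamming distance of a consensus string. It is not the authors' procedure: it scans one strand only, ignores degenerate IUPAC codes, and uses made-up sequences; the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_promoters_with_motif(promoters, consensus, max_mismatches):
    """Count sequences with at least one window within max_mismatches
    of the consensus (forward strand only)."""
    k = len(consensus)
    count = 0
    for seq in promoters:
        seq = seq.upper()
        # Slide a window of the consensus length along the sequence.
        if any(
            hamming(seq[i:i + k], consensus) <= max_mismatches
            for i in range(len(seq) - k + 1)
        ):
            count += 1
    return count


# Toy promoters (invented) scanned for the G-box core CACGTG.
promoters = ["TTCACGTGAA", "TTCACGTTAA", "AAAAAAAAAA"]
for m in (0, 1):
    hits = count_promoters_with_motif(promoters, "CACGTG", m)
    print(f"mismatches allowed = {m}: {hits} promoter(s) matched")
```

Repeating such a count for several mismatch allowances, and testing each count for enrichment against a background set, is one way to pick the allowance that gives the strongest signal without fixing the motif's degeneracy in advance.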
