PLoS One. 2010 Sep 28;5(9):e12776.
doi: 10.1371/journal.pone.0012776.

Inferring regulatory networks from expression data using tree-based methods

Vân Anh Huynh-Thu et al. PLoS One.

Abstract

One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high-throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (the target gene) is predicted from the expression patterns of all the other genes (the input genes), using the tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms in deciphering the genetic regulatory network of Escherichia coli. GENIE3 makes no assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
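The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the learner is a simplified Extra-Trees-style ensemble written with the standard library only, and the function names (`grow_tree`, `genie3`), the choice of K as the rounded square root of p - 1, the `min_size` stopping rule, the scaling of each target to unit variance, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import pvariance


def grow_tree(X, y, idx, features, k, rng, importance, min_size=5):
    """Grow one randomized regression tree on the samples in idx.

    At each node, k candidate input genes are drawn at random, each with a
    random threshold, and the split with the largest variance reduction is
    kept. That reduction is added to the chosen gene's importance score.
    """
    ys = [y[i] for i in idx]
    if len(idx) < min_size or max(ys) == min(ys):
        return
    best = None
    for f in rng.sample(features, k):
        vals = [X[i][f] for i in idx]
        lo, hi = min(vals), max(vals)
        if lo == hi:
            continue
        t = rng.uniform(lo, hi)
        left = [i for i in idx if X[i][f] < t]
        right = [i for i in idx if X[i][f] >= t]
        if not left or not right:
            continue
        gain = (len(idx) * pvariance(ys)
                - len(left) * pvariance([y[i] for i in left])
                - len(right) * pvariance([y[i] for i in right]))
        if best is None or gain > best[0]:
            best = (gain, f, left, right)
    if best is None:
        return
    gain, f, left, right = best
    importance[f] += gain
    grow_tree(X, y, left, features, k, rng, importance, min_size)
    grow_tree(X, y, right, features, k, rng, importance, min_size)


def genie3(expr, n_trees=50, seed=0):
    """Return (regulator, target, weight) links sorted by decreasing weight.

    expr is a list of samples, each a list of p expression values.
    """
    rng = random.Random(seed)
    n, p = len(expr), len(expr[0])
    k = max(1, round((p - 1) ** 0.5))  # assumed setting for this sketch
    links = []
    for target in range(p):  # one regression problem per target gene
        y = [row[target] for row in expr]
        sd = pvariance(y) ** 0.5
        if sd > 0:
            # Assumption: unit-variance targets keep weights comparable.
            y = [v / sd for v in y]
        inputs = [g for g in range(p) if g != target]
        importance = {g: 0.0 for g in inputs}
        for _ in range(n_trees):
            grow_tree(expr, y, list(range(n)), inputs, k, rng, importance)
        for g in inputs:
            # Average over trees and samples to get a per-link weight.
            links.append((g, target, importance[g] / (n_trees * n)))
    links.sort(key=lambda link: link[2], reverse=True)
    return links


# Toy data (hypothetical): gene 1 depends on gene 0; genes 2 and 3 are noise.
demo_rng = random.Random(1)
data = []
for _ in range(60):
    g0 = demo_rng.uniform(-1, 1)
    data.append([g0, 3 * g0 + demo_rng.gauss(0, 0.1),
                 demo_rng.uniform(-1, 1), demo_rng.uniform(-1, 1)])

links = genie3(data, n_trees=20)
print(links[:3])
```

The output is a single ranked list of directed links, so a network is obtained by keeping the links above a chosen rank or weight threshold. Swapping in a library implementation of Random Forests or Extra-Trees would change only the inner loop that fills `importance`.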

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. GENIE3 procedure.
For each gene j = 1, ..., p, a learning sample LS^j is generated with expression levels of j as output values and expression levels of all other genes as input values. A function f_j is learned from LS^j and a local ranking of all genes except j is computed. The p local rankings are then aggregated to get a global ranking of all regulatory links.
Figure 2. Detailed results on DREAM4 NET3.
Ranking of the regulators for all genes. Each row corresponds to a gene. Dots in each row represent the positions in the Random Forests ranking of the regulators of this gene. Genes are ordered on the y-axis according to their number of regulators in the gold standard network; those having the same number of regulators are grouped inside a horizontal block (from no regulator at the top to 6 regulators at the bottom). Inside each block, genes are ordered according to the median rank of their regulators. The ranking of interactions was obtained with Random Forests and [formula image].
Figure 3. Rank of regulators as a function of the in-degree of the target.
The in-degree of a target is its number of regulators. The dot corresponding to in-degree n is the median rank of regulators that regulate a gene with in-degree n, over the five networks. The rank is presented here as a percentage, such that the first and last regulators of each ranking have a rank equal to 100% and 0% respectively. The ranking of interactions was obtained with Random Forests and [formula image].
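The percentage rank used in this caption can be made concrete with a short sketch. The caption fixes only the two endpoints (first = 100%, last = 0%); the linear interpolation between them, the function name `percent_rank`, and the example positions are assumptions made for illustration.

```python
from statistics import median


def percent_rank(position, n_candidates):
    """Map a 1-based position in a ranking to a percentage.

    Position 1 maps to 100% and position n_candidates maps to 0%.
    Linear interpolation in between is an assumption of this sketch.
    """
    if n_candidates < 2:
        raise ValueError("need at least two candidates to define a scale")
    return 100.0 * (n_candidates - position) / (n_candidates - 1)


# Hypothetical target with three true regulators, ranked 1st, 3rd and 10th
# among 99 candidate regulators.
positions = [1, 3, 10]
n_candidates = 99
percentages = [percent_rank(r, n_candidates) for r in positions]
median_percentage = median(percentages)
print(percentages, median_percentage)
```

Pooling such percentages over all targets with the same in-degree and taking the median gives one dot per in-degree, which is how the caption describes the plot.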
Figure 4. Precision-Recall curves for the E. coli network.
Only known transcription factors were used as input genes. A. Comparison between the four different settings of the tree procedure. B. Comparison to other approaches.
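As a rough illustration of how a precision-recall curve is obtained from a ranking of links, the sketch below walks down a ranked list and records recall and precision at each cutoff. The gene names, the gold-standard set, the transcription factor set, and the function name `precision_recall_curve` are hypothetical; restricting candidate links to those whose regulator is a known transcription factor mirrors the setting stated in the caption.

```python
def precision_recall_curve(ranked_links, gold):
    """Return (recall, precision) after each cutoff in a ranked link list.

    ranked_links: (regulator, target) pairs, most confident first.
    gold: set of true (regulator, target) pairs.
    """
    if not gold:
        raise ValueError("gold standard must contain at least one link")
    points = []
    true_positives = 0
    for k, link in enumerate(ranked_links, start=1):
        if link in gold:
            true_positives += 1
        points.append((true_positives / len(gold), true_positives / k))
    return points


# Hypothetical ranking, gold standard and transcription factor list.
ranked = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
gold = {("A", "B"), ("B", "C")}
known_tfs = {"A", "B"}

# Keep only links whose regulator is a known transcription factor.
candidates = [link for link in ranked if link[0] in known_tfs]
curve = precision_recall_curve(candidates, gold)
print(curve)
```

Each point is a (recall, precision) pair, so plotting them in order traces the curve from the most confident prediction to the least confident one.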
