BMC Syst Biol. 2012 Nov 22;6:145.
doi: 10.1186/1752-0509-6-145.

TIGRESS: Trustful Inference of Gene REgulation using Stability Selection


Anne-Claire Haury et al. BMC Syst Biol. 2012 Nov 22;6:145.

Abstract

Background: Inferring the structure of gene regulatory networks (GRN) from a collection of gene expression data has many potential applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is, however, a notoriously difficult problem, and the many existing methods achieve only limited accuracy.

Results: In this paper, we formulate GRN inference as a sparse regression problem and investigate the performance of a popular feature selection method, least angle regression (LARS) combined with stability selection, for that purpose. We introduce a novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (for Trustful Inference of Gene REgulation with Stability Selection), was ranked among the top GRN inference methods in the DREAM5 gene network inference challenge. In particular, TIGRESS was evaluated to be the best linear regression-based method in the challenge. We investigate in depth the influence of the method's parameters, and show that careful parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference, in both directed and undirected settings.

Conclusions: TIGRESS reaches state-of-the-art performance on benchmark data, including both in silico and in vivo (E. coli and S. cerevisiae) networks. This study confirms the potential of feature selection techniques for GRN inference. Code and data are available at http://cbio.ensmp.fr/tigress. Moreover, TIGRESS can be run online through the GenePattern platform (GP-DREAM, http://dream.broadinstitute.org).
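The Results paragraph casts GRN inference as sparse regression: each target gene is treated as a separate feature-selection problem over candidate transcription factors, and the per-target scores are pooled into one global ranking of TF-target edges. The Python sketch below shows only that decomposition. The names `rank_edges` and `abs_correlation_scores` are hypothetical, and the scorer is a deliberately simple stand-in (absolute Pearson correlation), not the TIGRESS score; in TIGRESS the per-target scores come from LARS combined with stability selection.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# expr maps gene name -> expression values over the same list of experiments.
Expr = Dict[str, Sequence[float]]
# A scorer takes the target profile and candidate TF profiles, returns TF -> score.
ScoreFn = Callable[[Sequence[float], Dict[str, Sequence[float]]], Dict[str, float]]


def abs_correlation_scores(target: Sequence[float],
                           candidates: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Stand-in scorer (NOT the TIGRESS score): |Pearson correlation| with the target."""
    def pearson(x: Sequence[float], y: Sequence[float]) -> float:
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0
    return {t: abs(pearson(x, target)) for t, x in candidates.items()}


def rank_edges(expr: Expr, tfs: List[str], score_fn: ScoreFn) -> List[Tuple[str, str, float]]:
    """Solve one selection problem per target gene, then pool all
    (TF, target, score) triples into a single ranking, best first."""
    edges = []
    for g in expr:
        # Candidate regulators of g: every TF except g itself (no self-loops).
        candidates = {t: expr[t] for t in tfs if t != g}
        for t, s in score_fn(expr[g], candidates).items():
            edges.append((t, g, s))
    edges.sort(key=lambda e: e[2], reverse=True)
    return edges


# Tiny hypothetical dataset: g1 closely tracks tf1.
expr = {
    "tf1": [0.0, 1.0, 2.0, 3.0],
    "tf2": [1.0, 0.0, 1.0, 0.0],
    "g1": [0.1, 1.1, 1.9, 3.2],
}
edges = rank_edges(expr, ["tf1", "tf2"], abs_correlation_scores)
print(edges[0][:2])  # ('tf1', 'g1')
```

Swapping `abs_correlation_scores` for a stability-selection scorer leaves `rank_edges` unchanged, which is the point of the per-target decomposition.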


Figures

Figure 1
Stability selection. Illustration of the stability selection frequency F(g,t,L) for a fixed target gene g. Each curve represents a TF t ∈ T_g, and the horizontal axis represents the number L of LARS steps. F(g,t,L) is the frequency with which t is selected within the first L LARS steps to predict g when the expression matrix is randomly perturbed by keeping only a limited number of experiments and randomly weighting each expression array. For example, the TF with the highest curve was selected 57% of the time at the first LARS step, and 81% of the time within the first two LARS steps.
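The frequency F(g,t,L) in Figure 1 can be estimated by repeating LARS on randomly perturbed data and counting how often each TF enters the path within the first l steps. The NumPy sketch below is an illustration under stated assumptions, not the TIGRESS code. It uses plain least angle regression (no lasso drop step), so "selected in the first l steps" is simply "among the first l variables to enter". Each run keeps a random half of the experiments, and, as an assumption borrowed from the randomized lasso of Meinshausen and Bühlmann, rescales each standardized TF column by a weight drawn uniformly from [α, 1]; the caption above describes the weighting in terms of expression arrays, so that detail may differ from the paper. The function names `lars_entry_order` and `stability_frequencies` are hypothetical.

```python
import numpy as np


def lars_entry_order(X, y, n_steps):
    """Indices of the first n_steps variables to enter the least angle
    regression path (Efron et al. 2004, no lasso modification).
    Assumes X and y are already centered."""
    n, p = X.shape
    mu = np.zeros(n)                        # current fitted values
    c = X.T @ y                             # correlations with the residual
    active = [int(np.argmax(np.abs(c)))]
    while len(active) < min(n_steps, p):
        c = X.T @ (y - mu)
        C = np.max(np.abs(c[active]))
        s = np.sign(c[active])
        XA = X[:, active] * s               # sign-adjusted active columns
        g1 = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        A = 1.0 / np.sqrt(g1.sum())
        u = XA @ (A * g1)                   # unit equiangular direction
        a = X.T @ u
        best_j, best_gamma = -1, np.inf
        with np.errstate(divide="ignore", invalid="ignore"):
            for j in range(p):
                if j in active:
                    continue
                # Smallest positive step at which variable j ties the active set.
                for gamma in ((C - c[j]) / (A - a[j]), (C + c[j]) / (A + a[j])):
                    if 1e-12 < gamma < best_gamma:
                        best_j, best_gamma = j, gamma
        if best_j < 0:
            break
        mu = mu + best_gamma * u
        active.append(best_j)
    return active


def stability_frequencies(X, y, L, R, alpha, seed=0):
    """F[l-1, t] = fraction of R perturbed runs in which TF t enters within
    the first l LARS steps (l = 1..L). Rows of X are experiments, columns TFs."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros((L, p))
    for _ in range(R):
        rows = rng.choice(n, size=n // 2, replace=False)   # random half of the experiments
        Xs, ys = X[rows], y[rows]
        Xs = Xs - Xs.mean(axis=0)
        Xs = Xs / np.linalg.norm(Xs, axis=0)                # unit-norm columns
        Xs = Xs * rng.uniform(alpha, 1.0, size=p)           # random per-TF weights (assumption)
        order = lars_entry_order(Xs, ys - ys.mean(), L)
        for step, t in enumerate(order):
            counts[step:, t] += 1                           # counted for every l > step
    return counts / R
```

Weighting must happen after standardization: re-standardizing the weighted columns would cancel the weights. By construction each row of `F` is nondecreasing in l, which is why the curves in Figure 1 only rise.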
Figure 2
Score for network 1. Top plots show the score for R=4,000 and bottom plots depict the case R=10,000 for both the area (left) and the original (right) scoring settings, as a function of α and L.
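Figure 2 compares two ways of turning the stability curves into an edge score. As we read the paper, the "original" setting uses the maximum of F(g,t,l) over l ≤ L (which, since F is nondecreasing in l, equals F(g,t,L)), while the "area" setting averages F(g,t,l) over l = 1..L, the area under the curve divided by L; treat both definitions as assumptions of this sketch. The helper names `original_score` and `area_score` are hypothetical.

```python
from typing import Sequence


def original_score(freqs: Sequence[float], L: int) -> float:
    """Original stability-selection score: max over l = 1..L of F(g, t, l).
    freqs[l - 1] holds F(g, t, l)."""
    return max(freqs[:L])


def area_score(freqs: Sequence[float], L: int) -> float:
    """Area score: mean of F(g, t, l) over l = 1..L (area under the curve / L)."""
    return sum(freqs[:L]) / L


# Figure 1's top TF: F = 0.57 after one LARS step, 0.81 within two steps.
top = [0.57, 0.81]
# Hypothetical TF that also reaches 0.81 within two steps but rarely enters first.
late = [0.20, 0.81]
```

With L = 2 both TFs get an original score of 0.81, while the area score separates them (about 0.69 versus about 0.505), rewarding the TF that tends to enter earlier.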
Figure 3
Optimal values of the parameters. Optimal values of parameters L, α and N with respect to the number of resampling runs.
Figure 4
Impact of the number of resampling runs. Score as a function of R. In both scoring settings, α and L were set to 0.4 and 2, respectively.
Figure 5
Distribution of the number of TFs selected per gene for L=2. Histograms of the number of TFs selected per gene with respect to the total number of predictions when L=2.
Figure 6
Distribution of the number of TFs selected per gene for L=20. Histograms of the number of TFs selected per gene with respect to the total number of predictions when L=20.
Figure 7
Performance on network 1. ROC (left) and Precision/Recall (right) curves for several methods on Network 1.
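ROC and precision/recall curves like those in Figure 7 are obtained by walking down the ranked edge list and comparing each prediction with the gold-standard network. The sketch below is a generic version of that computation, not the official DREAM5 evaluation script; the names `pr_and_roc` and `auroc` are hypothetical, and closing the ROC curve with a straight line to (1, 1) assumes the unranked pairs come in random order.

```python
from typing import List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def pr_and_roc(ranked: Sequence[Edge], gold: Set[Edge], n_candidates: int):
    """After each prediction, record (recall, precision) and (FPR, TPR).
    n_candidates is the number of possible TF-target pairs, so the number
    of negatives is n_candidates - len(gold)."""
    P = len(gold)
    N = n_candidates - P
    tp = fp = 0
    pr: List[Tuple[float, float]] = []
    roc: List[Tuple[float, float]] = [(0.0, 0.0)]
    for e in ranked:
        if e in gold:
            tp += 1
        else:
            fp += 1
        pr.append((tp / P, tp / (tp + fp)))
        roc.append((fp / N, tp / P))
    return pr, roc


def auroc(roc: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC points, closed at (1, 1)."""
    pts = roc + [(1.0, 1.0)]
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical toy network: 2 true edges among 6 candidate pairs.
gold = {("a", "x"), ("b", "x")}
ranked = [("a", "x"), ("c", "x"), ("b", "x")]
pr, roc = pr_and_roc(ranked, gold, n_candidates=6)
print(auroc(roc))  # 0.875
```

The 0.875 matches the pairwise definition of AUROC: of the 2 × 4 positive/negative pairs, only ("b", "x") ranked below ("c", "x") is misordered.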
Figure 8
Performance on DREAM5 network 3. ROC (Left) and Precision/Recall (Right) curves for several methods on DREAM5 network 3.
Figure 9
Performance on DREAM5 network 4. ROC (Left) and Precision/Recall (Right) curves for several methods on DREAM5 network 4.
Figure 10
In vivo networks results. Score with respect to L for DREAM5 networks 3 and 4 and E. coli network (α=0.4, R=10,000).
Figure 11
Performance on the E. coli network. ROC (Left) and Precision/Recall (Right) curves for several methods on the E. coli dataset.
Figure 12
Shortest-path distribution for spurious edges. Exact distribution of the shortest-path length between the TF and the target gene of each spuriously predicted TF-TG pair.
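The distances summarized in Figure 12 can be computed by breadth-first search in the gold-standard network for every predicted pair that is not a true edge. The sketch below assumes the network is treated as undirected for this purpose (so that sibling and co-regulator relations count as distance 2) and returns `None` when no path exists; the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def undirected_adjacency(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Neighbour sets, ignoring edge direction (assumption)."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path_length(adj: Dict[str, Set[str]], src: str, dst: str) -> Optional[int]:
    """Breadth-first search; None if dst cannot be reached from src."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        for nb in adj.get(node, ()):
            if nb == dst:
                return d + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, d + 1))
    return None


def spurious_distances(predicted: List[Edge], gold: Set[Edge]) -> List[Optional[int]]:
    """Gold-network distance between the endpoints of each false-positive edge."""
    adj = undirected_adjacency(gold)
    return [shortest_path_length(adj, t, g) for t, g in predicted if (t, g) not in gold]


# Hypothetical gold network and predictions.
gold = {("A", "B"), ("B", "C"), ("A", "D")}
predicted = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E")]
print(spurious_distances(predicted, gold))  # [2, 2, None]
```

Feeding the predictions in ranked order and tallying the distances for the top k predictions gives a curve of the kind shown in Figure 13.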
Figure 13
Distribution of the shortest path with respect to the number of predictions. Distribution of the shortest-path length between the endpoints of spuriously detected edges, with a 95% confidence interval for the null distribution. Edges are ranked in order of discovery.
Figure 14
Distance-2 patterns. The three possible distance-2 patterns: siblings, couple and grandparent/grandchild relationships.
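The three distance-2 patterns of Figure 14 can be told apart by looking at parents and children in the directed gold-standard network. In the sketch below, "siblings" means the two genes share a regulator, "couple" is read as the two genes co-regulating a common target, and "grandparent" means one regulates an intermediate gene that regulates the other; the reading of "couple" and the function name `distance2_patterns` are assumptions, and self-loops are assumed absent.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def distance2_patterns(t: str, g: str, gold: Set[Edge]) -> List[str]:
    """Which distance-2 patterns link t and g in the directed gold network:
    'siblings'    - some p with p -> t and p -> g
    'couple'      - some c with t -> c and g -> c
    'grandparent' - some m with t -> m -> g or g -> m -> t
    """
    children: Dict[str, Set[str]] = {}
    parents: Dict[str, Set[str]] = {}
    for u, v in gold:
        children.setdefault(u, set()).add(v)
        parents.setdefault(v, set()).add(u)
    none: Set[str] = set()
    found = []
    if parents.get(t, none) & parents.get(g, none):
        found.append("siblings")
    if children.get(t, none) & children.get(g, none):
        found.append("couple")
    if (children.get(t, none) & parents.get(g, none)) or \
            (children.get(g, none) & parents.get(t, none)):
        found.append("grandparent")
    return found


# Hypothetical examples.
print(distance2_patterns("A", "B", {("P", "A"), ("P", "B")}))  # ['siblings']
print(distance2_patterns("A", "B", {("A", "M"), ("M", "B")}))  # ['grandparent']
```

A single spurious pair can match more than one pattern, so counts per pattern need not add up to the number of distance-2 errors.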
Figure 15
Distribution of distance-2 errors. Distribution of distance-2 errors as a function of the number of predictions. The 95% error bars were computed from the quantiles of a hypergeometric distribution.
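Hypergeometric error bars like those in Figure 15 answer the question: if n spurious predictions were drawn at random, without replacement, from M candidate non-edges of which K are at distance 2, how many distance-2 errors would we expect? The sketch below computes the 2.5% and 97.5% quantiles with exact integer arithmetic from the standard library. That null model, the meaning of M, K and n, and the function names are assumptions for illustration, not details taken from the paper.

```python
from math import comb
from typing import Optional, Tuple


def hypergeom_pmf(k: int, M: int, K: int, n: int) -> float:
    """P(X = k) when drawing n items without replacement from M, K of them 'successes'."""
    return comb(K, k) * comb(M - K, n - k) / comb(M, n)


def hypergeom_interval(M: int, K: int, n: int,
                       level: float = 0.95) -> Tuple[Optional[int], Optional[int]]:
    """Lower and upper quantiles: the smallest k whose CDF reaches (1 - level) / 2,
    and the smallest k whose CDF reaches 1 - (1 - level) / 2."""
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    cdf = 0.0
    k_lo: Optional[int] = None
    k_hi: Optional[int] = None
    # Support of the distribution: max(0, n - (M - K)) .. min(n, K).
    for k in range(max(0, n - (M - K)), min(n, K) + 1):
        cdf += hypergeom_pmf(k, M, K, n)
        if k_lo is None and cdf >= lo_q:
            k_lo = k
        if k_hi is None and cdf >= hi_q:
            k_hi = k
            break
    return k_lo, k_hi


# Toy case: 10 candidate pairs, 5 at distance 2, 5 spurious predictions.
print(hypergeom_interval(10, 5, 5))  # (1, 4)
```

For network-sized values of M the exact binomial coefficients become very large and slow; `scipy.stats.hypergeom.ppf` computes the same quantiles more efficiently.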
Figure 16
Results on DREAM4 networks. Overall score on the five multifactorial size-100 DREAM4 networks, as a function of α and L.
