Comparative Study

BMC Bioinformatics. 2011 Jul 19;12:292.

Unraveling gene regulatory networks from time-resolved gene expression data - a measures comparison study

Sabrina Hempel¹, Aneta Koseska, Zoran Nikoloski, Jürgen Kurths

Affiliation

¹ Interdisciplinary Center for Dynamics of Complex Systems, University of Potsdam, Campus Golm, Karl-Liebknecht-Str. 24, D-14476 Potsdam, Germany. sabrina.donner@pik-potsdam.de

PMID: 21771321
PMCID: PMC3161045
DOI: 10.1186/1471-2105-12-292

Sabrina Hempel et al. BMC Bioinformatics. 2011.

Abstract

Background: Inferring regulatory interactions between genes from time-resolved transcriptomics data, yielding reverse-engineered gene regulatory networks, is of paramount importance to systems biology and bioinformatics studies. Accurate methods to address this problem can ultimately provide a deeper insight into the complexity, behavior, and functions of the underlying biological systems. However, the large number of interacting genes, coupled with short and often noisy time-resolved read-outs of the system, makes reverse engineering a challenging task. Therefore, the development and assessment of methods which are computationally efficient, robust against noise, applicable to short time series data, and preferably capable of reconstructing the directionality of the regulatory interactions remains a pressing research problem with valuable applications.

Results: Here we perform the largest systematic analysis of a set of similarity measures and scoring schemes within the scope of the relevance network approach which are commonly used for gene regulatory network reconstruction from time series data. In addition, we define and analyze several novel measures and schemes which are particularly suitable for short transcriptomics time series. We also compare the considered 21 measures and 6 scoring schemes according to their ability to correctly reconstruct such networks from short time series data by calculating summary statistics based on the corresponding specificity and sensitivity. Our results demonstrate that rank and symbol based measures have the highest performance in inferring regulatory interactions. In addition, the proposed scoring scheme by asymmetric weighting has proven valuable in reducing the number of false positive interactions. On the other hand, Granger causality as well as information-theoretic measures, frequently used in inference of regulatory networks, show low performance on the short time series analyzed in this study.

Conclusions: Our study is intended to serve as a guide for choosing a particular combination of similarity measures and scoring schemes suitable for reconstruction of gene regulatory networks from short time series data. We show that further improvement of algorithms for reverse engineering can be obtained if one considers measures that are rooted in the study of symbolic dynamics or ranks, in contrast to the application of common similarity measures which do not consider the temporal character of the employed data. Moreover, we establish that the asymmetric weighting scoring scheme together with symbol based measures (for low noise level) and rank based measures (for high noise level) are the most suitable choices.
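The relevance network approach and the sensitivity/specificity evaluation described in the abstract can be illustrated with a minimal sketch. It assumes a Spearman rank correlation as the similarity measure and a plain threshold on its absolute value as the link criterion; the function names, the toy time series, and the threshold value are illustrative assumptions and are not taken from the study.

```python
from itertools import combinations

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def relevance_network(series, threshold):
    """Predict an undirected link wherever |Spearman| >= threshold (assumed rule)."""
    return {(i, j) for i, j in combinations(range(len(series)), 2)
            if abs(spearman(series[i], series[j])) >= threshold}

def sensitivity_specificity(predicted, true_links, n_genes):
    """Compare predicted links with a known set of true links."""
    n_pairs = n_genes * (n_genes - 1) // 2
    tp = len(predicted & true_links)
    fn = len(true_links - predicted)
    fp = len(predicted - true_links)
    tn = n_pairs - tp - fn - fp
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

# Toy data (hypothetical): three genes, five time points each.
series = [
    [0.1, 0.4, 0.5, 0.9, 1.0],  # gene 0
    [0.2, 0.3, 0.6, 0.8, 1.1],  # gene 1, same rank ordering as gene 0
    [0.5, 0.1, 0.9, 0.2, 0.6],  # gene 2
]
true_links = {(0, 1)}
predicted = relevance_network(series, threshold=0.9)
sens, spec = sensitivity_specificity(predicted, true_links, n_genes=3)
```

Sweeping the threshold from 0 to 1 and recording (1 - specificity, sensitivity) at each value would trace out a ROC curve of the kind shown in the figures.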

Figures

**Figure 1**
**Components of the relevance network algorithm for reverse engineering gene regulatory networks (*GRN*)**. The measures are grouped based on the representation on which they operate. Here, the different background colors indicate which combinations of scoring schemes and measures are studied. Altogether, there are 50 combinations included, because some measures can be further sub-divided.

**Figure 2**
**Performance of various similarity measures (noise-free case)**. (a) *ROC* curves obtained for the ID scoring scheme using the simple (*μ_P*), conditional and partial Pearson correlation, where the diagonal of the cross-correlation matrix is set to 0. (b) *ROC* curves using the ID scoring scheme and different correlation coefficients: the simple Pearson correlation coefficient with the diagonal of the cross-correlation matrix set once to 0 (*μ_P*(*diag*0)) and once to 1 (*μ_P*(*diag*1)), and the Spearman (*μ_S*(*diag*1)) and Kendall (*μ_K*(*diag*1)) correlation coefficients, with the diagonal set to 1 in both cases. (c) Evaluation of the ID scoring scheme using information-theoretic measures: simple (*μ_I*), conditional and residual mutual information. (d) Evaluation of the ID scoring scheme using measures based on symbolic dynamics: symbol sequence similarity, the mutual information of the symbol sequences and the mean of the two, as well as the symbol sequence similarity of pairs of time points and the conditional entropy of the symbols obtained from the pairs of time points. (e) *ROC* curves illustrating the performance of the Time Shift scoring scheme using the Pearson correlation *μ_P*, applied in addition to the *CLR* (measure: *μ_S*) and the *AWE* scoring scheme. (f) Performance of the *AWE* algorithm using selected symbol based measures included in this study: *ROC* curves for the symbol sequence similarity, the mutual information of the symbol sequences, and the mean of the two.

**Figure 3**
**Performance of various similarity measures for noisy data (noise level 0.3)**. The plot shows *ROC* curves of (a) mutual information (*μ_I*), residual mutual information, symbol sequence similarity, mutual information of the symbol sequences and the mean of the latter two, and (b) Pearson correlation (*μ_P*), partial Pearson correlation, conditional Pearson correlation, Spearman correlation (*μ_S*) and Kendall correlation (*μ_K*).

**Figure 4**
*ROC* curves obtained from the reconstruction of different networks. The results are shown for an *E. coli* network of 100 genes, an *S. cerevisiae* network of 100 genes and an *E. coli* network of 200 genes using various similarity measures: (a) partial Pearson correlation, (b) conditional Granger causality, (c) Spearman correlation *μ_S*, (d) simple mutual information *μ_I*, (e) symbol sequence similarity, and (f) residual mutual information.

**Figure 5**
**Evaluation of the investigated scoring schemes/measures using the three different summary statistics (noise-free case)**. Similar approaches are grouped together. The first group in cyan refers to the different measures applied together with the ID scoring scheme. The green stands for the *CLR* scoring scheme, the orange for the *MRNET*, yellow refers to the *ARACNE*, magenta to the *AWE* and violet stands for the TS. These colors are related to those in Fig. 1. Furthermore, blue groups together all measures applied with a combination of scoring schemes.

**Figure 6**
**Summary statistics considering moderate noise (noise level 0.3)**. The results for selected measures using different scoring schemes are shown. Similar approaches are grouped together here in the same way as in Fig. 5.

**Figure 7**
**Test data set for the comparison study**. (a) The *GRN* of m = 100 genes in *E. coli* is illustrated in the lower right panel as an adjacency matrix. Each entry marks a regulatory link between two associated genes. The upper panel shows the corresponding expression time series (simulated in the noise-free case and normalized to values between 0 (coded in black) and 1 (coded in white)). An example of the time series of the *lon* gene (gene number 2), including a spline interpolation is shown in the lower left panel. (b) The graphical representation of the network is shown in addition.

**Figure 8**
**Illustration of the concept of dynamic time warping (*DTW*)**. The upper panel shows two time series x (black) and y (gray), as well as a mapping (red lines) of the time points in x into those in y. This mapping is optimal with respect to the step pattern "symmetric2", meaning the sum of all incorporated local distances (represented by lengths of the red lines) is minimal, given the constraints from the step pattern. The lower panel shows all local distances between time points in x and y in a contour plot, where the red path is associated with the lowest value of the cumulative distance (optimal alignment path).
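The cumulative-distance idea in the Figure 8 caption can be sketched in a few lines. The sketch assumes the common definition of the "symmetric2" step pattern (a diagonal step costs twice the local distance, horizontal and vertical steps cost it once), uses the absolute difference as local distance, and initializes the first cell with the plain local distance; these choices and the function name are assumptions made for illustration, not details confirmed by the source.

```python
def dtw_symmetric2(x, y):
    """Cumulative DTW distance between sequences x and y.

    Recurrence (assumed 'symmetric2' step pattern):
        g[i][j] = min(g[i-1][j-1] + 2*d, g[i-1][j] + d, g[i][j-1] + d)
    with local distance d = |x[i] - y[j]|.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    g = [[inf] * m for _ in range(n)]
    g[0][0] = abs(x[0] - y[0])  # assumption: first cell is not doubled
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            d = abs(x[i] - y[j])
            best = inf
            if i > 0 and j > 0:
                best = min(best, g[i - 1][j - 1] + 2 * d)  # diagonal step
            if i > 0:
                best = min(best, g[i - 1][j] + d)          # step in x only
            if j > 0:
                best = min(best, g[i][j - 1] + d)          # step in y only
            g[i][j] = best
    return g[n - 1][m - 1]

# A series and a copy with one repeated value align perfectly.
same_shape = dtw_symmetric2([0.0, 1.0, 2.0], [0.0, 0.0, 1.0, 2.0])
# Two constant series offset by 1 accumulate the offset along the path.
offset = dtw_symmetric2([0.0, 0.0], [1.0, 1.0])
```

The optimal alignment path drawn in red in the figure would be recovered by backtracking through `g` from the last cell to the first, which this sketch omits for brevity.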

**Figure 9**
**Illustration of the concept of order patterns**. The left panels show a time series (black) composed of n = 4 time points, and particular groups of 3 time points each which form order patterns of dimension δ = 3 (red). The possible order patterns of that dimension are shown in the right panel, together with the resulting symbol sequence S for this time series.
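The mapping from a time series to a symbol sequence described in the Figure 9 caption can be sketched as follows. The sketch assumes that each group consists of δ consecutive time points and encodes every group by the permutation of indices that sorts it in ascending order; the function name, the toy values, and the tie-breaking rule are illustrative assumptions.

```python
def order_patterns(series, dim=3):
    """Map a time series to its sequence of order patterns.

    Each window of `dim` consecutive points (assumption) is replaced by the
    tuple of within-window indices sorted by ascending value. Ties are broken
    by position, because Python's sort is stable.
    """
    symbols = []
    for start in range(len(series) - dim + 1):
        window = series[start:start + dim]
        pattern = tuple(sorted(range(dim), key=lambda k: window[k]))
        symbols.append(pattern)
    return symbols

# Toy series (hypothetical) with n = 4 points and dimension 3,
# which yields n - dim + 1 = 2 symbols.
symbol_sequence = order_patterns([0.2, 0.5, 0.3, 0.9], dim=3)
```

For dimension 3 there are 3! = 6 possible patterns, so each symbol could equally be stored as an integer label from 0 to 5 instead of a tuple.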
