Comparative Study
BMC Bioinformatics. 2011 Jul 19;12:292.
doi: 10.1186/1471-2105-12-292.

Unraveling gene regulatory networks from time-resolved gene expression data - a measures comparison study

Sabrina Hempel et al. BMC Bioinformatics.

Abstract

Background: Inferring regulatory interactions between genes from time-resolved transcriptomics data, which yields reverse-engineered gene regulatory networks, is of paramount importance to systems biology and bioinformatics studies. Accurate methods to address this problem can ultimately provide deeper insight into the complexity, behavior, and functions of the underlying biological systems. However, the large number of interacting genes, coupled with short and often noisy time-resolved read-outs of the system, makes reverse engineering a challenging task. Therefore, the development and assessment of methods that are computationally efficient, robust against noise, applicable to short time series data, and preferably capable of reconstructing the directionality of the regulatory interactions remains a pressing research problem with valuable applications.

Results: Here we perform the largest systematic analysis of similarity measures and scoring schemes that are commonly used, within the relevance network approach, for gene regulatory network reconstruction from time series data. In addition, we define and analyze several novel measures and schemes which are particularly suitable for short transcriptomics time series. We also compare the 21 measures and 6 scoring schemes considered according to their ability to correctly reconstruct such networks from short time series data, calculating summary statistics based on the corresponding specificity and sensitivity. Our results demonstrate that rank-based and symbol-based measures have the highest performance in inferring regulatory interactions. In addition, the proposed asymmetric weighting scoring scheme has proven valuable in reducing the number of false positive interactions. On the other hand, Granger causality and information-theoretic measures, which are frequently used in the inference of regulatory networks, show low performance on the short time series analyzed in this study.

Conclusions: Our study is intended to serve as a guide for choosing a combination of similarity measure and scoring scheme suitable for reconstructing gene regulatory networks from short time series data. We show that algorithms for reverse engineering can be further improved by using measures rooted in the study of symbolic dynamics or ranks, in contrast to common similarity measures that do not consider the temporal character of the data. Moreover, we establish that the asymmetric weighting scoring scheme, combined with symbol-based measures (for low noise levels) or rank-based measures (for high noise levels), is the most suitable choice.
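
The relevance network idea summarized in the abstract (score every gene pair with a similarity measure, keep the pairs above a threshold, then compare against a known network via sensitivity and specificity) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses the Spearman rank correlation as the measure, applies no additional scoring scheme, and the toy time series, the threshold of 0.8, the "true" edge set, and all function names are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values receive their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def relevance_network(series, threshold):
    """Undirected edges (i, j), i < j, whose |Spearman| reaches the threshold."""
    edges = set()
    for i, j in combinations(range(len(series)), 2):
        if abs(spearman(series[i], series[j])) >= threshold:
            edges.add((i, j))
    return edges


def sensitivity_specificity(predicted, true_edges, n_genes):
    """Sensitivity and specificity of predicted edges against a reference set."""
    n_pairs = n_genes * (n_genes - 1) // 2
    tp = len(predicted & true_edges)
    fn = len(true_edges - predicted)
    fp = len(predicted - true_edges)
    tn = n_pairs - tp - fn - fp
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


# Hypothetical toy data: three genes, five time points each.
series = [
    [0.1, 0.4, 0.5, 0.9, 1.0],
    [0.2, 0.3, 0.6, 0.7, 0.8],
    [0.5, 0.1, 0.9, 0.2, 0.6],
]
true_edges = {(0, 1), (1, 2)}  # assumed reference network

predicted = relevance_network(series, threshold=0.8)
print(predicted, sensitivity_specificity(predicted, true_edges, n_genes=3))
```

Sweeping the threshold and recording one (sensitivity, specificity) pair per value would trace out a ROC curve of the kind shown in the figures.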

Figures

Figure 1
Components of the relevance network algorithm for reverse engineering gene regulatory networks (GRN). The measures are grouped based on the representation on which they operate. Here, the different background colors indicate which combinations of scoring schemes and measures are studied. Altogether, there are 50 combinations included, because some measures can be further sub-divided.
Figure 2
Performance of various similarity measures (noise-free case). (a) ROC curves obtained for the ID scoring scheme using the simple, conditional and partial Pearson correlation (μP, formula image, formula image), where the diagonal of the cross-correlation matrix is set to 0. (b) ROC curves using the ID scoring scheme and different correlation coefficients: the simple Pearson correlation coefficient with the diagonal of the cross-correlation matrix set once to 0 (μP (diag0)) and once to 1 (μP (diag1)), and the Spearman (μS (diag1)) and Kendall (μK (diag1)) correlation coefficients, with the diagonal set to 1 in both cases. (c) Evaluation of the ID scoring scheme using information-theoretic measures: simple, conditional and residual mutual information (μI, formula image and formula image). (d) Evaluation of the ID scoring scheme using measures based on symbolic dynamics: symbol sequence similarity (formula image), the mutual information of the symbol sequences (formula image) and the mean of the two (formula image), as well as the symbol sequence similarity of pairs of time points (formula image (pairs)) and the conditional entropy of the symbols obtained from the pairs of time points (formula image (pairs)). (e) The corresponding ROC curves illustrating the performance of the Time Shift scoring scheme using the Pearson correlation μP, applied in addition to the CLR (measure: μS) and the AWE (measure: formula image) scoring schemes. (f) Performance of the AWE algorithm using selected symbol-based measures included in this study, for example ROC curves for the symbol sequence similarity (formula image), the mutual information of the symbol sequences (formula image), and the mean of the two (formula image).
Figure 3
Performance of various similarity measures for noisy data (noise level 0.3). The plot shows ROC curves of (a) mutual information (μI), residual mutual information (formula image), symbol sequence similarity (formula image), mutual information of the symbol sequences (formula image) and the mean of these two (formula image), and (b) Pearson correlation (μP), partial Pearson correlation (formula image), conditional Pearson correlation (formula image), Spearman correlation (μS) and Kendall correlation (μK).
Figure 4
ROC curves obtained from the reconstruction of different networks. The results are shown for an E. coli network of 100 genes, an S. cerevisiae network of 100 genes and an E. coli network of 200 genes using various similarity measures: (a) partial Pearson correlation formula image, (b) conditional Granger causality formula image, (c) Spearman correlation μS, (d) simple mutual information μI, (e) symbol sequence similarity formula image, and (f) residual mutual information formula image.
Figure 5
Evaluation of the investigated scoring schemes/measures using the three different summary statistics (noise-free case). Similar approaches are grouped together. The first group, in cyan, comprises the different measures applied together with the ID scoring scheme. Green stands for the CLR scoring scheme, orange for MRNET, yellow for ARACNE, magenta for AWE, and violet for TS. These colors correspond to those in Fig. 1. Furthermore, blue groups together all measures applied with a combination of scoring schemes.
Figure 6
Summary statistics considering moderate noise (noise level 0.3). The results for selected measures using different scoring schemes are shown. Similar approaches are grouped together here in the same way as in Fig. 5.
Figure 7
Test data set for the comparison study. (a) The GRN of m = 100 genes in E. coli is illustrated in the lower right panel as an adjacency matrix. Each entry marks a regulatory link between two associated genes. The upper panel shows the corresponding expression time series (simulated in the noise-free case and normalized to values between 0 (coded in black) and 1 (coded in white)). An example of the time series of the lon gene (gene number 2), including a spline interpolation, is shown in the lower left panel. (b) Graphical representation of the same network.
Figure 8
Illustration of the concept of dynamic time warping (DTW). The upper panel shows two time series x (black) and y (gray), as well as a mapping (red lines) of the time points in x into those in y. This mapping is optimal with respect to the step pattern "symmetric2", meaning the sum of all incorporated local distances (represented by lengths of the red lines) is minimal, given the constraints from the step pattern. The lower panel shows all local distances between time points in x and y in a contour plot, where the red path is associated with the lowest value of the cumulative distance (optimal alignment path).
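
The cumulative distance and optimal alignment path described in this caption can be sketched with a small dynamic program. The sketch below is an illustration under assumptions rather than the code used in the study: it takes the absolute difference as the local distance, uses the commonly cited symmetric2 recursion (a diagonal step adds twice the local distance, a horizontal or vertical step adds it once), counts the starting cell once, and expects non-empty inputs. The function name and example series are hypothetical.

```python
def dtw_symmetric2(x, y):
    """Return (cumulative distance, alignment path) for two non-empty series."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]   # cumulative distances
    prev = [[None] * m for _ in range(n)]  # predecessor of each cell

    for i in range(n):
        for j in range(m):
            d = abs(x[i] - y[j])  # local distance (assumed: absolute difference)
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            candidates = []
            if i > 0 and j > 0:
                # Diagonal step: local distance weighted by 2.
                candidates.append((cost[i - 1][j - 1] + 2 * d, (i - 1, j - 1)))
            if i > 0:
                # Vertical step: local distance weighted by 1.
                candidates.append((cost[i - 1][j] + d, (i - 1, j)))
            if j > 0:
                # Horizontal step: local distance weighted by 1.
                candidates.append((cost[i][j - 1] + d, (i, j - 1)))
            cost[i][j], prev[i][j] = min(candidates)

    # Backtrack from the last cell to recover the alignment path.
    path = []
    cell = (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = prev[cell[0]][cell[1]]
    path.reverse()
    return cost[n - 1][m - 1], path


# Hypothetical example: y is a shorter version of x.
distance, path = dtw_symmetric2([0, 1, 2], [0, 2])
print(distance, path)
```

Each pair `(i, j)` in the returned path corresponds to one of the red mapping lines between a time point in x and a time point in y.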
Figure 9
Illustration of the concept of order patterns. The left panels show a time series (black) composed of n = 4 time points and particular groups of 3 time points each, which form order patterns of dimension δ = 3 (red). The possible order patterns of that dimension are summarized in the right panel, together with the resulting symbol sequence S for this time series.
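
A minimal sketch of how such a symbol sequence might be computed is given below. It assumes that the groups are windows of consecutive time points, encodes each order pattern as the permutation of positions that sorts the window in ascending order, and breaks ties by time order; the function name and the example values are hypothetical and not taken from the study.

```python
from itertools import permutations


def order_pattern_sequence(series, dimension=3):
    """Map each window of `dimension` consecutive points to its order pattern.

    A pattern is the tuple of window positions sorted by ascending value,
    so (0, 2, 1) means: first point lowest, third point next, second highest.
    """
    symbols = []
    for start in range(len(series) - dimension + 1):
        window = series[start:start + dimension]
        # sorted() is stable, so equal values keep their time order.
        pattern = tuple(sorted(range(dimension), key=lambda k: window[k]))
        symbols.append(pattern)
    return symbols


# All possible order patterns of dimension 3 (3! = 6 of them).
all_patterns = list(permutations(range(3)))

# Hypothetical time series with n = 4 points gives 4 - 3 + 1 = 2 symbols.
sequence = order_pattern_sequence([0.2, 0.9, 0.5, 0.1], dimension=3)
print(len(all_patterns), sequence)
```

Because only the relative order of values within each window matters, the resulting symbols are unchanged by any strictly increasing rescaling of the expression values.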
