Bioinformatics. 2020 Apr 15;36(8):2522-2529.
doi: 10.1093/bioinformatics/btz950.

LiPLike: towards gene regulatory network predictions of high certainty

Rasmus Magnusson et al. Bioinformatics.

Abstract

Motivation: High correlation in expression between regulatory elements is a persistent obstacle to the reverse-engineering of gene regulatory networks. If two potential regulators have matching expression patterns, it becomes difficult to distinguish between them, which increases the risk of false-positive identifications.

Results: To allow for gene regulation predictions of high confidence, we propose a novel method, the Linear Profile Likelihood (LiPLike), that assumes a regression model and iteratively searches for interactions that cannot be replaced by a linear combination of other predictors. To compare the performance of LiPLike with other available inference methods, we benchmarked it on three independent datasets from the Dialogue on Reverse Engineering Assessment and Methods 5 (DREAM5) network inference challenge. We found that LiPLike could be used to stratify the predictions of other inference tools: when applied to the predictions of DREAM5 participants, it improved accuracy by >140% on average compared to the individual methods. Furthermore, LiPLike on its own predicted the biological networks more accurately than all DREAM5 participants. When predicting the Escherichia coli network, LiPLike had an accuracy of 0.38 for the top-ranked 100 interactions, whereas the corresponding DREAM5 consensus model yielded an accuracy of 0.11.
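One way to write down the criterion "cannot be replaced by a linear combination of other predictors" in the least-squares setting of Fig. 1 is given below. The notation is introduced here for illustration only: y is the expression vector of a target gene, X the matrix of candidate regulators, x_j the column of regulator j and X_{-j} the remaining columns. This is a sketch; the confidence q that LiPLike reports may be a different (for example normalized) function of these quantities.

```latex
\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2,
\qquad
\mathrm{RSS}^{*} = \lVert y - X\hat{\beta} \rVert_2^2

\mathrm{RSS}_j(\zeta) = \min_{\beta_{-j}} \lVert y - \zeta x_j - X_{-j}\beta_{-j} \rVert_2^2,
\qquad
\min_{\zeta} \mathrm{RSS}_j(\zeta) = \mathrm{RSS}^{*}

\Delta_j = \mathrm{RSS}_j(0) - \mathrm{RSS}^{*} \ge 0
```

If x_j is (close to) a linear combination of the columns of X_{-j}, the remaining regulators can absorb its contribution, so \Delta_j \approx 0 and regulator j is not uniquely identifiable; a large \Delta_j means no such substitute exists.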

Availability and implementation: LiPLike is available to the community as a Python toolbox at https://gitlab.com/Gustafsson-lab/liplike. We believe that LiPLike will be used for high-confidence predictions in studies where individual model interactions are of high importance, and to remove false-positive predictions made by other state-of-the-art gene-gene regulation prediction tools.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Illustration of the LiPLike rationale. (A) In a toy system where three regulators (X1 and the correlated variables X2 and X3) regulate a target gene (Y), the optimal parameters of the corresponding linear model are readily found by least squares. Constraining the parameter of one regulator (here X1) to a value ζ and re-estimating the parameters of the remaining two regulators for each ζ yields a profile of the residual sum of squares as a function of ζ. In other words, as ζ changes, so does the ability of the regulators to explain Y. Of special interest is ζ = 0, where the regulator is removed from the system. If X1 is uniquely needed to model Y, the residual sum of squares increases rapidly as a function of ζ, as in the top case, and there is a large increase in the residual sum of squares between the best fit and ζ = 0. In the bottom case, X2 and X3 are correlated, so a linear combination of the remaining explanatory variables can still fit Y adequately and the residual sum of squares depends less on ζ: when the parameter between X2 and Y is changed, X3 can take the place of X2. (B) Two examples of LiPLike applied to data with three independent variables, of which two (X2 and X3) are highly correlated. Explaining the dependent variable Y requires either (X1, X2) or (X1, X3), and there is no way to infer whether X2 or X3 is the correct regulator. If X2 or X3 is left out of the set, LiPLike infers both remaining inputs to be important, as illustrated by the magnitudes of q shown to the right. When all three independent variables are included, LiPLike refrains from selecting variables that cannot be inferred uniquely, because there is no way to determine whether X2 or X3 is the correct regulator.
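The residual-sum-of-squares profile in panel (A) can be reproduced numerically with ordinary least squares. The sketch below is not the LiPLike toolbox API: the helper names profile_rss and rss_increase_at_zero and the simulated data are hypothetical, and it reports the raw increase in residual sum of squares at ζ = 0 rather than LiPLike's q.

```python
import numpy as np


def profile_rss(X, y, j, zetas):
    """RSS when the coefficient of column j is fixed to each zeta and the
    remaining coefficients are re-fitted by ordinary least squares."""
    others = np.delete(X, j, axis=1)
    rss = []
    for zeta in zetas:
        target = y - zeta * X[:, j]  # subtract the fixed contribution of regulator j
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        rss.append(float(resid @ resid))
    return np.array(rss)


def rss_increase_at_zero(X, y, j):
    """RSS at zeta = 0 (regulator j removed) minus the RSS of the full fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return profile_rss(X, y, j, [0.0])[0] - float(resid @ resid)


# Toy system mirroring panel (A), with simulated data: X3 is almost a copy of X2.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x2 + 0.01 * rng.normal(size=n)
y = 2.0 * x1 + 1.5 * x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

zetas = np.linspace(0.0, 4.0, 41)
profile_x1 = profile_rss(X, y, 0, zetas)  # sharp parabola with its minimum near zeta = 2
increases = [rss_increase_at_zero(X, y, j) for j in range(3)]
```

With this simulated data, removing X1 raises the residual sum of squares by orders of magnitude more than removing X2 or X3, because each of the two correlated regulators can stand in for the other.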
Fig. 2.
LiPLike properties and performance on in silico generated networks. The confidence of inferred edges, q, was calculated for two datasets generated from the same network that differed in signal-to-noise ratio. The magnitudes of q depended on the noise level, differing by a factor of 10e7 between the datasets, as seen on the x-axes of the histograms to the left. Moreover, both histograms display a property that LiPLike empirically shows for networks with strong signals: a separation of confidence into two distinct groups, high confidence or none. To the right are the corresponding receiver operating characteristic (ROC) curves, showing that LiPLike infers edges well in the high-confidence part of the ranking and then has a near-random chance of identifying an edge. The networks were retrieved from https://bitbucket.org/sonnhammergrni/genespider
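The ROC curves in this figure rank candidate edges by q. A generic way to compute such a curve is sketched below, assuming that edges with equal q (for example the many edges with no confidence) are added as one tied block, which appears as a straight, random-chance segment. The helpers roc_points and auc are hypothetical and not part of LiPLike, and the scores and labels are made up.

```python
from itertools import groupby


def roc_points(scores, labels):
    """(FPR, TPR) points from sweeping a threshold down the scores.
    Tied scores are added in one step, so a tied block becomes a straight segment."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, is_edge in pairs if is_edge)
    n_neg = len(pairs) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, group in groupby(pairs, key=lambda p: p[0]):
        for _, is_edge in group:
            if is_edge:
                tp += 1
            else:
                fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical q values: two confident edges, the rest with no confidence (q = 0).
q = [5.0, 3.2, 0.0, 0.0, 0.0, 0.0]
true_edge = [True, True, True, False, False, False]
curve = roc_points(q, true_edge)
```

Here the two confident edges are both true, so the curve climbs vertically first; the tied q = 0 block then contributes a single diagonal segment, and the area under the curve is 5/6.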
Fig. 3.
LiPLike performance on DREAM5 challenge data. (A) Accuracy of algorithms predicting edges of the E. coli network as a function of the number of edges considered. LiPLike performance is plotted in red and shows a higher accuracy than all DREAM5 participants. (B) The accuracy of LiPLike across top-ranked edges for all networks. (C) The accuracy of all methods, the crowd estimate and LiPLike for the top-ranked edges. LiPLike gave the highest accuracy of all methods in both biologically derived networks and ranked 20th of 36 in the in silico network. (Color version of this figure is available at Bioinformatics online.)
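Assuming that accuracy here means the fraction of the top k ranked edges that appear in the gold standard (so an accuracy of 0.38 for the top 100 E. coli interactions corresponds to 38 correct edges), a curve like the one in panel (A) can be computed as below. Edges are represented as hypothetical (regulator, target) tuples, and the function name is illustrative.

```python
def top_k_accuracy(ranked_edges, gold_edges, k):
    """Fraction of the k highest-ranked predicted edges found in the gold standard."""
    top = ranked_edges[:k]
    if not top:
        return float("nan")
    return sum(edge in gold_edges for edge in top) / len(top)


# Hypothetical ranking (best first) and gold standard.
ranked = [("tf1", "g1"), ("tf2", "g2"), ("tf1", "g3"), ("tf3", "g4")]
gold = {("tf1", "g1"), ("tf1", "g3")}
accuracy_curve = [top_k_accuracy(ranked, gold, k) for k in range(1, len(ranked) + 1)]
```

For this toy ranking, the accuracy is 1.0, 0.5, 2/3 and 0.5 for the top 1 to 4 edges.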
Fig. 4.
Accuracy of the edge predictions of the DREAM5 community prediction and LiPLike, split between top edges found exclusively by the community, exclusively by LiPLike, and by both. In all cases, edges found in both predictions have a considerably higher accuracy than the DREAM5 community prediction. Moreover, for the biological networks, S. aureus and E. coli, LiPLike performs better than the community on the non-overlapping predictions, indicating that LiPLike identifies edges that the community failed to include.
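The split in this figure can be computed from two top-ranked edge lists and a gold standard with set operations. This sketch assumes edges are (regulator, target) tuples and that accuracy is the fraction of edges in each group found in the gold standard; split_accuracy and the example edges are hypothetical.

```python
def split_accuracy(community_top, liplike_top, gold_edges):
    """Accuracy of edges found only by the community, only by LiPLike, or by both."""
    community, liplike = set(community_top), set(liplike_top)
    groups = {
        "community only": community - liplike,
        "LiPLike only": liplike - community,
        "both": community & liplike,
    }
    return {
        name: sum(edge in gold_edges for edge in edges) / len(edges) if edges else float("nan")
        for name, edges in groups.items()
    }


# Hypothetical top-ranked edges and gold standard.
community_top = [("tf1", "g1"), ("tf2", "g2"), ("tf3", "g3"), ("tf1", "g4")]
liplike_top = [("tf1", "g1"), ("tf2", "g2"), ("tf4", "g5"), ("tf2", "g6")]
gold = {("tf1", "g1"), ("tf2", "g2"), ("tf4", "g5")}
result = split_accuracy(community_top, liplike_top, gold)
```

In this toy case the shared edges are all correct, the LiPLike-only edges are half correct, and the community-only edges are all wrong.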
Fig. 5.
Network properties. (A) Cumulative distribution of the highest correlation with other regulators for the regulators of putative interactions, shown for the top-ranked interactions of LiPLike (red) and the crowd (grey) in the respective DREAM5 networks. The regulators LiPLike identifies tend, on average, to have fewer correlating regulators. For example, in the E. coli network we observed a median Pearson correlation of ρ = 0.57 for the regulators selected by LiPLike, whereas 85.3% of all regulators inferred in the corresponding community prediction had a correlation higher than 0.57 with another regulator. This indicates that LiPLike, to a lesser degree, predicts edges where there are several potential regulators to choose from. (B) Distribution of inferred edges per transcription factor for LiPLike (red) and the community (grey). While the putative outdegrees of transcription factors in the community estimate appear to follow a power law (as indicated by the straight line on the log scale), LiPLike appears to select edges with a broader distribution. (C) Accuracies for the top inferred regulators in the community prediction were low, whereas the top regulators in the LiPLike network had accuracies similar to the overall LiPLike accuracy. (Color version of this figure is available at Bioinformatics online.)
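The statistics behind panels (A) and (B) can be sketched from an expression matrix and a list of predicted edges. This assumes regulators are rows of a NumPy array, that "highest correlation" means the largest signed Pearson correlation with any other regulator (using absolute values would be a reasonable alternative), and that outdegree counts predicted edges per regulator. The function names, threshold usage and data are hypothetical.

```python
from collections import Counter

import numpy as np


def max_correlation_with_other_regulators(expr, names):
    """Highest Pearson correlation of each regulator with any other regulator.
    expr has shape (n_regulators, n_samples); rows follow the order of names."""
    corr = np.corrcoef(expr)
    np.fill_diagonal(corr, -np.inf)  # ignore each regulator's correlation with itself
    return dict(zip(names, corr.max(axis=1)))


def summarize_prediction(edges, max_corr, threshold):
    """Median max-correlation of predicted regulators (one value per edge),
    share of edges whose regulator exceeds threshold, and outdegree per regulator."""
    values = [max_corr[tf] for tf, _ in edges]
    median_corr = float(np.median(values))
    share_above = float(np.mean([v > threshold for v in values]))
    outdegree = Counter(tf for tf, _ in edges)
    return median_corr, share_above, outdegree


# Hypothetical expression data for three regulators over four samples.
names = ["r1", "r2", "r3"]
expr = np.array([[1.0, 2.0, 3.0, 4.0],
                 [1.0, 2.0, 3.0, 5.0],   # strongly correlated with r1
                 [4.0, 1.0, 3.0, 2.0]])
max_corr = max_correlation_with_other_regulators(expr, names)
edges = [("r1", "g1"), ("r1", "g2"), ("r3", "g3")]
median_corr, share_above, outdegree = summarize_prediction(edges, max_corr, 0.57)
```

Plotting the sorted per-edge values as a cumulative distribution gives a curve like panel (A), and a histogram of the outdegree counts on a log scale corresponds to panel (B).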
