Bioinformatics. 2022 Nov 15;38(22):5049-5054.
doi: 10.1093/bioinformatics/btac657.

GeneNetTools: tests for Gaussian graphical models with shrinkage

Victor Bernal et al. Bioinformatics. 2022.

Abstract

Motivation: Gaussian graphical models (GGMs) are network representations of random variables (as nodes) and their partial correlations (as edges). GGMs overcome the challenges of high-dimensional data analysis by using shrinkage methodologies, and have therefore become useful for reconstructing gene regulatory networks from gene-expression profiles. However, it is often ignored that the partial correlations are 'shrunk' and therefore cannot be compared or assessed directly. Accurate (differential) network analyses need to account for the number of variables, the sample size and the shrinkage value; otherwise, the analysis and its biological interpretation become biased. To date, there are no appropriate methods that account for these factors.
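
As an illustration of what a 'shrunk' partial correlation is, the following minimal Python sketch shrinks the sample correlation matrix towards the identity, inverts it and standardizes the result. It rests on assumptions that are not taken from the paper: the function name `shrunk_partial_correlations` is hypothetical, the shrinkage value `lam` is fixed by hand rather than estimated, and the identity matrix is used as the shrinkage target. It is not the GeneNetTools implementation.

```python
import numpy as np


def shrunk_partial_correlations(data, lam):
    """Partial correlations from a correlation matrix shrunk towards the identity.

    data : array of shape (n_samples, p_variables)
    lam  : shrinkage value in (0, 1]; assumed given here, not estimated
    """
    corr = np.corrcoef(data, rowvar=False)          # p x p sample correlations
    p = corr.shape[0]
    shrunk = (1.0 - lam) * corr + lam * np.eye(p)   # invertible even when n < p
    precision = np.linalg.inv(shrunk)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)      # standardize the precision matrix
    np.fill_diagonal(pcor, 1.0)
    return pcor


# High-dimensional toy data: fewer samples (20) than variables (50).
rng = np.random.default_rng(0)
data = rng.standard_normal((20, 50))
pcor = shrunk_partial_correlations(data, lam=0.3)
```

Because the off-diagonal values depend on `lam`, partial correlations computed with different shrinkage values are not on the same scale, which is the comparability issue the abstract raises.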

Results: We derive the statistical properties of the partial correlation obtained with the Ledoit-Wolf shrinkage. Our result provides a toolbox for (differential) network analyses in the form of (i) confidence intervals, (ii) a test for zero partial correlation (null-effects) and (iii) a test to compare partial correlations. Our novel (parametric) methods account for the number of variables, the sample size and the shrinkage values. Additionally, they are computationally fast, simple to implement and require only basic statistical knowledge. Our simulations show that the novel tests perform better than DiffNetFDR, a recently published alternative, in terms of the trade-off between true and false positives. The methods are demonstrated on synthetic data and two gene-expression datasets from Escherichia coli and Mus musculus.

Availability and implementation: The R package with the methods and the R script with the analysis are available at https://github.com/V-Bernal/GeneNetTools.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
AUROC and AUPRC. (a and b) AUROCs and AUPRCs of the z-score [Equation (10)]. The performance increases with the sample size. (c) AUROCs of the z-score minus the AUROCs of DiffNetFDR. (d) AUPRCs of the z-score minus the AUPRCs of DiffNetFDR. Positive values show a better trade-off of true and false positives for the proposed z-statistics. Data were simulated from networks with p = 100 nodes and sample sizes n1 and n2 between 30 and 100.
Fig. 2.
Escherichia coli microarray network analysis. (a) Bland–Altman plot between the P-values obtained from the t-test [Equation (6)] and the ‘shrunk’ probability density [Equation (4.2)]. The methods are equivalent, as the differences are of the order of 10⁻⁷. (b) Forest plot of partial correlations. The 15 strongest edges are displayed with their 95% confidence intervals. The vertical lines show the 0.1 and 0.3 thresholds for weak and mild correlations (Cohen, 1988).
Fig. 3.
Mus musculus RNA-seq differential network analysis. Bland–Altman plot between the partial correlations for strains B6 and D2. The figure shows only the significantly different partial correlations at the 0.05 level (i.e. |z-score| > 1.96), and the names of the nine strongest gene pairs.
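
The correspondence between the 0.05 level and the |z-score| > 1.96 cut-off in Fig. 3 can be checked with a short Python sketch. It assumes the z-score is approximately standard normal under the null hypothesis, as the threshold implies; the function names are illustrative and are not part of GeneNetTools.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def critical_value(alpha=0.05):
    """Two-sided critical value: |z| above this is significant at level alpha."""
    return STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)


def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as z."""
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))


def is_significant(z, alpha=0.05):
    """True when the z-score exceeds the two-sided critical value."""
    return abs(z) > critical_value(alpha)
```

For example, `critical_value(0.05)` is approximately 1.96, so a z-score of 2.5 would be flagged as a significant difference and a z-score of 1.0 would not.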

References

    Barabási A.-L. et al. (2011) An integrative systems medicine approach to mapping human metabolic diseases. Nat. Rev. Genet., 12, 56–68.
    Beerenwinkel N. et al. (2007) Genetic progression and the waiting time to cancer. PLoS Comput. Biol., 3, e225.
    Benedetti E. et al. (2017) Network inference from glycoproteomics data reveals new reactions in the IgG glycosylation pathway. Nat. Commun., 8, 1–15.
    Benjamini Y., Hochberg Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc., 57, 289–300.
    Bernal V. et al. (2019) Exact hypothesis testing for shrinkage-based Gaussian graphical models. Bioinformatics, 35, 5011–5017.
