PLoS One. 2015 Jun 1;10(6):e0128115.
doi: 10.1371/journal.pone.0128115. eCollection 2015.

A null model for Pearson coexpression networks

Andrea Gobbi et al. PLoS One.

Abstract

Gene coexpression networks inferred by correlation from high-throughput profiling, such as microarray data, are simple but effective structures for discovering and interpreting linear gene relationships. In recent years, several approaches have been proposed for deciding when the resulting correlation values are statistically significant. This is most crucial when the number of samples is small, which yields a non-negligible chance that even high correlation values are due to random effects. Here we introduce a novel hard thresholding solution based on the assumption that a coexpression network inferred from randomly generated data is expected to be empty. The threshold is derived analytically and, as a deterministic independent null model, it depends only on the dimensions of the starting data matrix, with assumptions on the skewness of the data distribution that are compatible with the structure of gene expression data. We show, on synthetic and array datasets, that the proposed threshold is effective in eliminating all false positive links, at the cost of an increase in false negatives (true edges that go undetected).
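
The small-sample effect described in the abstract can be illustrated with a short simulation. The sketch below is not the analytic threshold p(n,m) of the paper. It is an empirical stand-in that assumes Gaussian random data and simply records the largest absolute Pearson correlation observed between unrelated columns. The function names `max_abs_offdiag_pcc` and `empirical_null_threshold` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def max_abs_offdiag_pcc(n_samples, n_genes, rng):
    """Largest |Pearson correlation| between distinct columns of pure noise."""
    # Assumption: i.i.d. standard normal data stands in for "random" expression.
    data = rng.standard_normal((n_samples, n_genes))
    corr = np.corrcoef(data, rowvar=False)  # genes are columns
    upper = np.triu_indices(n_genes, k=1)   # each gene pair once, no diagonal
    return float(np.max(np.abs(corr[upper])))


def empirical_null_threshold(n_samples, n_genes, n_trials=20, seed=0):
    """Monte Carlo estimate of a threshold above which a network inferred
    from random data is empty in every simulated trial."""
    rng = np.random.default_rng(seed)
    return max(
        max_abs_offdiag_pcc(n_samples, n_genes, rng) for _ in range(n_trials)
    )


if __name__ == "__main__":
    for n in (5, 10, 20, 50):
        t = empirical_null_threshold(n_samples=n, n_genes=100)
        print(f"n={n:3d} samples, 100 genes: empirical threshold = {t:.3f}")
```

With few samples the estimated threshold sits close to 1, which is the behaviour motivating a hard threshold that depends only on the dimensions of the data matrix. A simulated value like this varies with the seed and the number of trials, whereas the threshold described in the abstract is deterministic.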

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. HCC dataset: PCC versus sample size for the coexpression of hsa.mir.016a.chr13 with hsa.mir.016b.chr3 (blue line) and with hsa.mir.010b.precNo1 (red line).
When the correlation is computed on the 10-sample set S, the two uncoexpressed probes correlate even more strongly than the two probes sharing an almost identical alignment. When more samples are randomly added, the blue line increases slightly, while the red line quickly drops to its final value of 0.536 on the whole dataset H. Both curves are averaged over 1000 randomisations of the added samples, keeping the sub-cohort S constant.
Fig 2. Contour plot of the function p(n,m) in the Samples × Genes space on (a) a wide (n, m) range and (b) zoomed on the small sample size area.
Fig 3. Comparison of the threshold functions p(n,m) (yellow-red gradient) and B5,t (blue gradient) in the Samples × Genes space; darker colors correspond to larger threshold values.
The relation p < B5,t consistently holds.
Fig 4. Synthetic dataset: level plot of the structure of the correlation matrix (a) and heatmap of the dataset (b).
The generating gene expression vectors G11000 and G111000 are marked with *.
Fig 5. Synthetic dataset.
Inference of the coexpression network from random subsampling of the synthetic dataset, without noise (a,b,c), with 20% Gaussian noise (d,e,f) and with 40% Gaussian noise (g,h,i), on 10 (a,d,g), 20 (b,e,h) and 50 (c,f,i) samples. Solid lines indicate the mean over 500 replicated instances of the HIM distance (black), the ratio of false positives (blue) and the ratio of false negatives (red); dotted lines of the same color indicate confidence bars (±σ), while grey vertical dashed lines correspond to the secure threshold p.
Fig 6. Ovarian cancer dataset.
Level plot of the structure of the correlation matrix O_T (a) and heatmap of the Ovarian dataset restricted to the set T of 20 selected genes (b). Solid lines separate the groups of good and poor PFS/OS top genes.
Fig 7. Ovarian cancer dataset.
Inference of the coexpression network from subsampling of the Ovarian dataset, on 5 (a), 10 (b), 20 (c) and 50 (d) samples. Solid lines indicate the mean over 500 replicated instances of the HIM distance (black), the ratio of false positives (blue) and the ratio of false negatives (red); dotted lines of the same color indicate confidence bars (±σ), while grey vertical dashed lines correspond to the secure threshold p.