Estimating Sample-Specific Regulatory Networks

Marieke Lydia Kuijjer¹, Matthew George Tung², GuoCheng Yuan³, John Quackenbush⁴, Kimberly Glass⁵

Affiliations

¹ Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway.
² Department of Anesthesiology, Critical Care, and Pain Medicine, Massachusetts General Hospital, Boston, MA 02114, USA.
³ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
⁴ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA.
⁵ Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA. Electronic address: kimberly.glass@channing.harvard.edu.

PMID: 30981959
PMCID: PMC6463816
DOI: 10.1016/j.isci.2019.03.021

Marieke Lydia Kuijjer et al. iScience. 2019 Apr 26;14:226-240. doi: 10.1016/j.isci.2019.03.021. Epub 2019 Mar 28.


Abstract

Biological systems are driven by intricate interactions among molecules. Many methods have been developed that draw on large numbers of expression samples to infer connections between genes (or their products). The result is an aggregate network representing a single estimate for the likelihood of each interaction, or "edge," in the network. Although informative, aggregate models fail to capture population heterogeneity. Here we propose a method to reverse engineer sample-specific networks from aggregate networks. We demonstrate our approach in several contexts, including simulated, yeast microarray, and human lymphoblastoid cell line RNA sequencing data. We use these sample-specific networks to study changes in network topology across time and to characterize shifts in gene regulation that were not apparent in the expression data. We believe that generating sample-specific networks will greatly facilitate the application of network methods to large, complex, and heterogeneous multi-omic datasets, supporting the emerging field of precision network medicine.

Keywords: Bioinformatics; Biological Sciences; Complex Systems.


Figures

**Figure 1**
Overview of LIONESS Approach and Evaluation (A) Flow diagram summarizing the analyses performed in this article to evaluate the LIONESS approach. LIONESS was applied to multiple aggregate network reconstruction approaches including Pearson correlation coefficient, PANDA (Passing Attributes between Networks for Data Assimilation), MI (mutual information), and CLR (Context Likelihood of Relatedness). (B) Visual illustration of how LIONESS estimates the network for a single sample based on two aggregate network models, one reconstructed using all biological samples in a given dataset and the other using all except the sample of interest (q). See also Table S2 and Figure S1.
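The estimate illustrated in panel B can be made concrete with a small sketch. It assumes Pearson correlation as the aggregate method (one of the four approaches listed above) and uses the linear form reported for LIONESS, e(q) = N * (e(all) - e(all-q)) + e(all-q), where N is the number of samples, e(all) is an edge weight in the network built from all samples, and e(all-q) is the same edge in the network built without sample q. The function names are hypothetical and the code is plain Python for self-containment; it is not the authors' implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # A gene with constant expression makes this divide by zero.
    return sxy / math.sqrt(sxx * syy)


def aggregate_network(expr, samples):
    """Co-expression network restricted to the given sample indices.

    expr: one list of expression values per gene (one value per sample).
    Returns a dict mapping gene pairs (i, j), i < j, to edge weights.
    """
    net = {}
    for i in range(len(expr)):
        for j in range(i + 1, len(expr)):
            xi = [expr[i][s] for s in samples]
            xj = [expr[j][s] for s in samples]
            net[(i, j)] = pearson(xi, xj)
    return net


def lioness(expr, q):
    """Single-sample network for sample q by linear interpolation."""
    n = len(expr[0])
    all_samples = list(range(n))
    without_q = [s for s in all_samples if s != q]
    e_all = aggregate_network(expr, all_samples)
    e_minus = aggregate_network(expr, without_q)
    return {edge: n * (e_all[edge] - e_minus[edge]) + e_minus[edge]
            for edge in e_all}
```

Calling `lioness(expr, q)` once per sample yields one network per sample; the two aggregate networks could be swapped for PANDA, MI, or CLR outputs without changing the interpolation step.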

**Figure 2**
Evaluation of LIONESS's Ability to Recover Known Single-Sample Networks in *In Silico* Data (A) Toy example of how we create a single-sample network from an underlying baseline network. (B) Illustration of the gene expression samples used to build a single-sample network. We evaluated the accuracy of both the aggregate network derived using all samples (red) and the LIONESS-estimated single-sample network (black) by benchmarking against the corresponding “gold-standard” single-sample network. (C) The mean and standard deviation of the AUC values of the aggregate (red) and LIONESS-predicted single-sample networks (black) estimated from *in silico* datasets representing varying levels of heterogeneity. (D) The mean and standard deviation of the AUC values of the aggregate (red) and LIONESS-predicted single-sample networks (black) estimated using increasing numbers of input expression samples. For each sample size, 10,000 random subsets of samples were used. (E) Violin plots showing the distribution of AUC values for aggregate and LIONESS-predicted single-sample networks estimated using four different aggregate network reconstruction approaches. For (C–E), AUCs were calculated both over all possible edges and over only the edges that differ from the baseline model (permuted edges; see A). See also Figures S2–S4.
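The AUC benchmark in panels C–E scores predicted edge weights against a binary gold standard. The sketch below computes AUC as the probability that a randomly chosen gold-standard edge outscores a randomly chosen non-edge (the Mann-Whitney formulation, with ties counted as one half). The function name and inputs are hypothetical; the article's exact AUC routine is not specified here.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    scores: predicted edge weights; labels: 1 for gold-standard edges, 0 otherwise.
    A tie between a positive and a negative edge counts as half a correct ranking.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative edge")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

To mirror the "permuted edges" variant, one would filter `scores` and `labels` down to the edges that differ from the baseline network before calling `auc`. The pairwise loop is quadratic; a rank-based version scales better for genome-wide edge sets.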

**Figure 3**
Analysis of LIONESS Networks Predicted for 48 Expression Samples Collected across a Yeast Cell Cycle Time Course Experiment. LIONESS was used to predict networks for each sample in the expression data by applying four different aggregate network reconstruction approaches. For each approach, we built the aggregate models either using all samples (R1&R2-from-R1&R2) or using only the samples from the same technical replicate (R1-from-R1 and R2-from-R2). Spearman correlation was used to evaluate how similar these networks are to each other. See also Figure S5.
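Comparing two networks by Spearman correlation amounts to ranking each network's edge weights and computing the Pearson correlation of the ranks. A minimal sketch, assuming each network is a dict keyed by edge and that both networks share the same edge set; the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def network_similarity(net_a, net_b):
    """Spearman correlation of edge weights over the edges of net_a."""
    edges = sorted(net_a)
    return spearman([net_a[e] for e in edges], [net_b[e] for e in edges])
```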

**Figure 4**
Characterizing Networks across the Yeast Cell Cycle (A) A heatmap of the edge weights for the 1,000 most variable edges across the sample-specific network models. The left panel shows the weights of these edges in the aggregate network, and the right panel shows the edge weights across the single-sample networks. In the right panel, rows are Z-score normalized for visualization purposes. (B) The average expression of genes targeted by the four transcription factors that were identified as regulatory nodes of the 1,000 most variable edges, as well as the average weight of high-confidence edges that extend between those transcription factors and their target genes. The average weight of these edges in the aggregate network is shown as a dashed line. See also Figure S5.
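The selection and display steps in panel A can be sketched as: rank edges by the variance of their weights across the single-sample networks, keep the top k, then Z-score each kept row. This assumes population variance and population standard deviation (the article does not state which was used), and the function names are hypothetical.

```python
import statistics


def most_variable_edges(weights, k=1000):
    """Return the k edges whose weights vary most across samples.

    weights: dict mapping an edge to its list of per-sample weights.
    """
    ranked = sorted(weights, key=lambda e: statistics.pvariance(weights[e]),
                    reverse=True)
    return ranked[:k]


def zscore_rows(weights, edges):
    """Z-score each selected edge's weights across samples (display only)."""
    out = {}
    for e in edges:
        row = weights[e]
        mu = statistics.mean(row)
        sd = statistics.pstdev(row)
        # A constant row has no spread; map it to zeros instead of dividing by 0.
        out[e] = [(w - mu) / sd if sd > 0 else 0.0 for w in row]
    return out
```

Because the Z-scoring is per row, it hides differences in absolute edge weight between edges; that is why the aggregate weights are shown separately in the left panel.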

**Figure 5**
Comparison of Gene Regulation, Gene Expression, and DNase Hypersensitivity Data (A) For six representative transcription factors, the mean expression of target genes and the mean weight of the edges targeting those genes across the 65 samples are plotted. For each sample, the expression of the TF is shown as a color, scaled to the normal distribution for visualization purposes. (B) A cartoon illustrating how high edge weights, and thus regulatory activity, are not necessarily equivalent to the presence of a physical interaction. (C) The distribution of the Spearman correlation values when comparing gene targeting (calculated by combining LIONESS predictions with TF expression; k^(L+e)) and the significance level of DNase hypersensitivity in a gene's promoter across all the samples. We also show the percentage of genes whose targeting positively correlates with DNase hypersensitivity when targeting is calculated using only the LIONESS-predicted edge weight (k^(L): no expression considered) or a combination of expression and motif information (k^(m+e)). We performed these analyses either using all 12,424 genes included in our network model or for the set of 3,488 genes with a DNase peak called in all 65 samples (open genes). See also Figure S6.
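The "percentage of genes whose targeting positively correlates with DNase hypersensitivity" in panel C can be sketched for the edge-weight-only case, k^(L). The sketch assumes that a gene's targeting is its weighted in-degree (the sum of incoming TF edge weights in one sample's network); the exact k^(L) and k^(L+e) definitions are not given in this record, so treat that as an assumption. Names and data layout are hypothetical.

```python
import math


def weighted_in_degree(network):
    """Sum of incoming edge weights per target gene for one sample.

    network: dict mapping (tf, gene) to an edge weight.
    """
    targeting = {}
    for (tf, gene), w in network.items():
        targeting[gene] = targeting.get(gene, 0.0) + w
    return targeting


def _ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation (undefined if either input is constant)."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def percent_positive(networks, dnase):
    """Percentage of genes whose targeting correlates positively with DNase.

    networks: per-sample networks; dnase: gene -> per-sample DNase values,
    listed in the same sample order as networks.
    """
    per_sample = [weighted_in_degree(net) for net in networks]
    genes = sorted(dnase)
    positive = sum(1 for g in genes
                   if spearman([t[g] for t in per_sample], dnase[g]) > 0)
    return 100.0 * positive / len(genes)
```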

**Figure 6**
Analysis of Human Lymphoblastoid Cell Line Networks (A) A hierarchical clustering on the edge weights for 65 regulatory networks, one for each distinct subject-derived lymphoblastoid cell line included in the RNA-seq dataset. (B) An equivalent hierarchical clustering on gene expression values for these 65 individuals. This clustering is distinct from the one based on the network edge weights. Subject labels are colored based on the network clustering. (C) Reactome pathways enriched based on GSEA using gene targeting instead of gene expression and comparing samples from the right and left groups of networks presented in (A). No Reactome pathways were identified when comparing the expression values of genes in the different groups defined by the hierarchical clustering presented in (B). See also Figure S6.
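Panel A clusters samples by their network edge weights rather than by expression. The sketch below shows agglomerative hierarchical clustering with average linkage on Euclidean distances between per-sample edge-weight vectors, stopping at a chosen number of clusters. The distance metric and linkage are assumptions (the record does not state which were used), and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two edge-weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    vectors: one edge-weight vector per sample (same edge order in each).
    Returns a list of clusters, each a list of sample indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def dist(c1, c2):
        # Average linkage: mean pairwise distance between cluster members.
        total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

Running the same routine on per-sample expression vectors gives the comparison in panel B; the two groupings need not agree.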


