BMC Bioinformatics. 2022 Jun 15;23(1):231.
doi: 10.1186/s12859-022-04765-0.

Elucidating gene expression patterns across multiple biological contexts through a large-scale investigation of transcriptomic datasets

Rebeca Queiroz Figueiredo et al. BMC Bioinformatics.

Abstract

Distinct gene expression patterns within cells are foundational for the diversity of functions and unique characteristics observed in specific contexts, such as human tissues and cell types. Though some biological processes commonly occur across contexts, by harnessing the vast amounts of available gene expression data, we can decipher the processes that are unique to a specific context. Therefore, with the goal of developing a portrait of context-specific patterns to better elucidate how they govern distinct biological processes, this work presents a large-scale exploration of transcriptomic signatures across three different contexts (i.e., tissues, cell types, and cell lines) by leveraging over 600 gene expression datasets categorized into 98 subcontexts. The strongest pairwise correlations between genes from these subcontexts are used to construct co-expression networks. Using a network-based approach, we then pinpoint patterns that are unique to, or common across, these subcontexts. First, we focus on patterns at the level of individual nodes and evaluate their functional roles using a human protein-protein interactome as a reference network. Next, within each context, we systematically overlay the co-expression networks to identify specific and shared correlations as well as relations already described in the scientific literature. Additionally, in a pathway-level analysis, we overlay node and edge sets from co-expression networks against pathway knowledge to identify biological processes that are related to specific subcontexts or groups of them. Finally, we have released our data and scripts at https://zenodo.org/record/5831786 and https://github.com/ContNeXt/, respectively, and developed ContNeXt (https://contnext.scai.fraunhofer.de/), a web application to explore the networks generated in this work.

Keywords: Biological context; Co-expression networks; Gene expression; Network biology; Transcriptomic.
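
The abstract states that the strongest pairwise correlations between genes are used to build each co-expression network. The sketch below illustrates that general idea on toy data only. The plain Pearson correlation, the `top_k` cutoff, the gene names, and the function names are all assumptions made for illustration; the abstract does not specify the correlation measure or the threshold the authors used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, top_k):
    """Keep the top_k gene pairs ranked by absolute correlation (hypothetical cutoff)."""
    scored = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        scored.append((abs(r), g1, g2))
    scored.sort(key=lambda item: item[0], reverse=True)
    # Edges are undirected, so each one is stored as a frozenset of two genes.
    return {frozenset((g1, g2)) for _, g1, g2 in scored[:top_k]}


# Toy expression profiles (made-up values, one list of samples per gene).
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.1],
    "GENE_C": [4.0, 1.0, 3.0, 2.0],
}
toy_edges = coexpression_edges(toy_expression, top_k=1)
print(toy_edges)
```

Storing edges as unordered pairs makes later set operations (overlap between networks, lookup of pathway protein pairs) straightforward.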

Conflict of interest statement

DDF received a salary from Enveda Biosciences. The other authors declare no competing interests.

Figures

Fig. 1
Conceptualization of the presented study. A Over 600 context-specific transcriptomic datasets are collected and classified into 98 subcontexts (e.g., heart, astrocyte, and HeLa cell) under 3 major contexts (i.e., tissues, cell types, and cell lines), leveraging the Gemma database [21, 48]. B Co-expression networks comprising the most strongly correlated edges observed in each subcontext are generated. C Network analyses provide insights on both common and unique patterns across the multiple contexts studied
Fig. 2
Overview of analyses conducted across all subcontexts in three different contexts (i.e., tissues, cell lines, and cell types). At the protein level, patterns surrounding each individual node are investigated ("Analyses at the protein-level" section). The network-level analysis focuses on the relations between nodes (or node pairs) ("Analyses at the network-level" section), and the pathway-level analysis leverages defined node and edge sets to gain insights on context-specific co-expression networks ("Mapping co-expression networks to pathway knowledge" section)
Fig. 3
Distribution of network size for each of the three contexts. Network size is given as the number of nodes in each subcontext. In the tissue context, the cortex of cerebral lobe network had the fewest nodes (i.e., 6514), while the placenta network had the most nodes (i.e., 20,171), not only among the networks of the tissue context but also across all other contexts. In the cell type context, the fibroblast network had the fewest nodes (i.e., 7767), while the stem cell network had the most nodes (i.e., 20,158). In the cell line context, the HepG2 cell line network had the fewest nodes (i.e., 6460), while the Huv-ec-c cell line network had the most nodes (i.e., 18,758). Generally, the networks within each context tended to vary greatly in size. For example, the tissue context includes networks ranging in size from 6514 to 20,171 nodes
Fig. 4
Frequency of edge occurrence across networks within a context. Proportions of edges are given as those that are unique, or common to varying degrees, in networks within the A tissue, B cell type, and C cell line context. From the total set of edges that occur across all networks within each context, the fraction of edges that are unique (i.e., appear in only one network within a given context) is shown in green. From this total set of edges, the fraction of those that appear in at least 25% of networks within a given context is magnified in a consecutively smaller pie chart (i.e., predominantly in red). Similarly, those that appear in at least 50% of networks within a given context are magnified and illustrated in a pie chart predominantly in blue. Finally, of this latter group of edges, the fraction of edges that are most common (i.e., appear in at least 75% of all networks within a given context) is highlighted in purple
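
The edge categories described in the Fig. 4 caption (unique edges, and edges present in at least 25%, 50%, or 75% of the networks in a context) can be sketched as a simple counting exercise. This is a minimal illustration on toy data, not the authors' script; the function name, the dictionary keys, and the toy networks are hypothetical.

```python
from collections import Counter


def edge_frequency_summary(networks):
    """Fraction of all distinct edges that are unique or shared at given levels.

    `networks` maps a subcontext name to a set of edges, where each edge is a
    frozenset of two node names.
    """
    counts = Counter(edge for edges in networks.values() for edge in edges)
    n_networks = len(networks)
    total = len(counts)
    if total == 0:
        return {}

    def share(predicate):
        return sum(1 for c in counts.values() if predicate(c)) / total

    return {
        "unique": share(lambda c: c == 1),
        "at_least_25_percent": share(lambda c: c / n_networks >= 0.25),
        "at_least_50_percent": share(lambda c: c / n_networks >= 0.50),
        "at_least_75_percent": share(lambda c: c / n_networks >= 0.75),
    }


def edge(a, b):
    """Undirected edge between two nodes."""
    return frozenset((a, b))


# Four toy networks (made-up edges) standing in for subcontexts of one context.
toy_context = {
    "net1": {edge("A", "B"), edge("B", "C")},
    "net2": {edge("A", "B"), edge("C", "D")},
    "net3": {edge("A", "B"), edge("B", "C")},
    "net4": {edge("A", "B")},
}
toy_summary = edge_frequency_summary(toy_context)
print(toy_summary)
```

Because the thresholds are nested, every edge counted at the 75% level is also counted at the 50% and 25% levels, which matches the consecutively magnified pie charts described in the caption.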
Fig. 5
Pairwise co-expression network similarity across contexts. For each pair of co-expression networks within a given context, edge overlap was calculated as a measure of similarity between networks for the A tissue, B cell line, and C cell type contexts. A high-quality version of the figure is available at https://github.com/ContNeXt/scripts/blob/main/figures/figure5.pdf
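
The Fig. 5 caption describes edge overlap between every pair of networks in a context but does not state the exact formula. The sketch below assumes a Jaccard index (shared edges divided by the union of edges) as one plausible overlap measure; that choice, the function names, and the toy networks are assumptions for illustration only.

```python
from itertools import combinations


def jaccard(edges_a, edges_b):
    """Jaccard index of two edge sets (an assumed overlap measure)."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0


def pairwise_edge_overlap(networks):
    """Overlap score for every unordered pair of networks, keyed by sorted names."""
    return {
        (name_1, name_2): jaccard(networks[name_1], networks[name_2])
        for name_1, name_2 in combinations(sorted(networks), 2)
    }


def edge(a, b):
    """Undirected edge between two nodes."""
    return frozenset((a, b))


# Toy networks with made-up edges; the subcontext names are only labels.
toy_networks = {
    "heart": {edge("A", "B"), edge("B", "C"), edge("C", "D")},
    "liver": {edge("A", "B"), edge("B", "C")},
    "brain": {edge("D", "E")},
}
toy_similarity = pairwise_edge_overlap(toy_networks)
print(toy_similarity)
```

The resulting dictionary holds one score per network pair, which is the information a symmetric similarity heatmap would display.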
Fig. 6
Similarity between tissue-specific co-expression networks and KEGG pathways. The similarity between a particular pathway and a co-expression network is defined as the percentage of pairwise combinations of proteins of a given KEGG pathway that can be found in a co-expression network as edges. Light blue corresponds to a lower similarity, while dark blue corresponds to a higher similarity. A high-quality version of this figure is available at https://github.com/ContNeXt/scripts/blob/main/figures/figure6_highquality.pdf and can also be visualized in the web application
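
The pathway similarity defined in the Fig. 6 caption (the percentage of all protein pairs in a pathway that appear as edges in a co-expression network) can be written directly from that definition. The following is a minimal sketch on toy data; the function name, the protein identifiers, and the toy network are hypothetical and are not taken from KEGG or from the authors' code.

```python
from itertools import combinations


def pathway_similarity(pathway_proteins, network_edges):
    """Percentage of pathway protein pairs that occur as edges in the network."""
    pairs = [frozenset(p) for p in combinations(sorted(set(pathway_proteins)), 2)]
    if not pairs:
        return 0.0
    found = sum(1 for pair in pairs if pair in network_edges)
    return 100.0 * found / len(pairs)


# Toy pathway with four proteins, giving six possible protein pairs.
toy_pathway = {"P1", "P2", "P3", "P4"}

# Toy co-expression network; only two of its edges connect pathway proteins.
toy_network_edges = {
    frozenset(("P1", "P2")),
    frozenset(("P2", "P3")),
    frozenset(("P3", "P9")),
}

toy_score = pathway_similarity(toy_pathway, toy_network_edges)
print(round(toy_score, 2))
```

Computing this score for every pathway and every network yields a pathway-by-network matrix of the kind shown as a heatmap.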
Fig. 7
ContNeXt web application. A Main page. Users can query for specific genes or directly explore the networks of a given context. B Network page. Users can explore and navigate through the neighbors of a specific gene for each network. C Heatmap visualization. Heatmaps presented in this work can be interactively viewed to investigate the similarity between pairs of co-expression networks as well as between pathways and co-expression networks

References

    1. Azevedo T, Dimitri GM, Lió P, Gamazon ER. Multilayer modelling of the human transcriptome and biological mechanisms of complex diseases and traits. NPJ Sys Biol Appl. 2021;7(1):1–13. doi: 10.1038/s41540-021-00186-6.
    2. Cassandri M, Smirnov A, Novelli F, Pitolli C, Agostini M, Malewicz M, et al. Zinc-finger proteins in health and disease. Cell Death Discov. 2017;3(1):1–12. doi: 10.1038/cddiscovery.2017.71.
    3. Crow M, Lim N, Ballouz S, Pavlidis P, Gillis J. Predictability of human differential gene expression. Proc Natl Acad Sci. 2019;116(13):6491–6500. doi: 10.1073/pnas.1802973116.
    4. Diehl AD, Meehan TF, Bradford YM, Brush MH, Dahdul WM, Dougall DS, et al. The cell ontology 2016: enhanced content, modularization, and ontology interoperability. J Biomed Semant. 2016;7(1):1–10. doi: 10.1186/s13326-016-0088-7.
    5. Dobrin R, Zhu J, Molony C, Argman C, Parrish ML, Carlson S, Allan MF, Pomp D, Schadt EE. Multi-tissue coexpression networks reveal unexpected subnetworks associated with disease. Genome Biol. 2009;10(5):1–3. doi: 10.1186/gb-2009-10-5-r55.
