Review
J Adv Res. 2025 Oct:76:135-157.
doi: 10.1016/j.jare.2024.12.004. Epub 2024 Dec 6.

Unlocking biological insights from differentially expressed genes: Concepts, methods, and future perspectives

Huachun Yin et al. J Adv Res. 2025 Oct.

Abstract

Background: Identifying differentially expressed genes (DEGs) is a core task of transcriptome analysis, as DEGs can reveal the molecular mechanisms underlying biological processes. However, interpreting the biological significance of large DEG lists is challenging. Currently, Gene Ontology enrichment, pathway enrichment, and protein-protein interaction analysis are common strategies employed by biologists. In addition, emerging analytical strategies (such as network module analysis, knowledge graphs, drug repurposing, cell marker discovery, trajectory analysis, and cell communication analysis) have been proposed. Despite these advances, comprehensive guidelines for systematically and thoroughly mining the biological information within DEGs are still lacking.

Aim of review: This review aims to provide an overview of essential concepts and methodologies for the biological interpretation of DEGs, enhancing contextual understanding of DEG lists. It also addresses the current limitations and future perspectives of these approaches and highlights their broad applications in deciphering the molecular mechanisms of complex diseases and phenotypes. To help users extract insights from extensive datasets, especially DEG lists of various kinds, we developed DEGMiner (https://www.ciblab.net/DEGMiner/), which integrates over 300 easily accessible databases and tools.

Key scientific concepts of review: This review offers strong support and guidance for exploring DEGs and will also accelerate the discovery of hidden biological insights within genomes.

Keywords: Analytical strategies; Biological interpretation; Biological process; Differentially expressed genes; Molecular mechanisms.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1
Overview of gene functional annotation databases and enrichment analysis methodologies. A) Representative pathway databases and gene functional annotation databases. Notably, 13 of these databases are members of the InterPro Consortium: CATH-Gene3D, CDD, HAMAP, MobiDB, PANTHER, Pfam, PIRSF, PRINTS, PROSITE, SFLD, SMART, SUPERFAMILY, and NCBIfam. B) Three types of enrichment analysis methods. Over-representation analysis (ORA) uses a hypergeometric test to assess whether the number of query genes annotated to a gene set is significantly higher than expected by chance. Functional class scoring (FCS) methods rank genes by their expression levels and test whether the genes of an annotated gene set (hit genes) are concentrated toward the extremes of the ranked list. Topology-based (TB) methods integrate gene expression levels with scores that measure each gene's connectivity and position within a network.
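A minimal sketch of the ORA test in panel B, assuming the DEG list, the annotated gene set, and the background gene universe are available as plain Python sets of gene identifiers. The function names `hypergeom_sf` and `ora_pvalue` are hypothetical; for production work, SciPy's `scipy.stats.hypergeom` provides the same distribution.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes, K: background genes in the annotated set,
    n: query genes (DEGs), k: observed overlap between DEGs and the set.
    """
    upper = min(K, n)
    if k > upper:
        return 0.0
    k = max(k, 0)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def ora_pvalue(degs: set, gene_set: set, background: set) -> float:
    """ORA P value for one gene set; all inputs are restricted to the background."""
    degs = degs & background
    gene_set = gene_set & background
    overlap = len(degs & gene_set)
    return hypergeom_sf(overlap, len(background), len(gene_set), len(degs))


if __name__ == "__main__":
    background = {f"g{i}" for i in range(10)}
    # Both DEGs fall in a 2-gene set drawn from 10 genes: P = 1 / C(10, 2) = 1/45.
    print(ora_pvalue({"g0", "g1"}, {"g0", "g1"}, background))
```

In a real analysis this test is repeated for every GO term or pathway, so the resulting P values are normally adjusted for multiple testing (for example, with the Benjamini-Hochberg procedure).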
Fig. 2
Schematic of gene-associated networks and modules. A) Gene regulatory network. B-C) TF-gene and miRNA-gene interaction networks. The green triangles, two-colored circles, and blue cylinders represent miRNAs, TFs, and mRNAs, respectively. D) The competing endogenous RNA (ceRNA) network. Triangles, circles, and rhombi represent miRNAs, mRNAs, and lncRNAs, respectively. Gene B interacts with the miRNA through miRNA response element (MRE) binding sites, and miRNA binding inhibits its expression. The miRNA can in turn be regulated by Gene C (an lncRNA), pseudogenes, or Gene A, whose MRE sites interact with the miRNA seed sequence. ceRNA crosstalk is influenced by the expression levels of all RNA molecules in the network, which behave as “competitors” for the same miRNA cluster. E) Protein-protein interaction (PPI) network and related functional modules. In the PPI network, nodes represent proteins, and node size reflects degree, i.e., the number of edges connected to a given node. Different colors represent different protein functions. F) Co-expression modules. G) Co-regulation modules.
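Panels E and F can be illustrated with a small sketch that builds an unweighted co-expression network from pairwise Pearson correlations, reports each gene's degree (the quantity encoded by node size in the PPI panel), and takes connected components as modules. The absolute-correlation threshold of 0.8, the treatment of negative correlations as edges, and the connected-component definition of a module are illustrative assumptions; methods such as WGCNA instead use soft thresholding and hierarchical clustering.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def coexpression_network(expr, threshold=0.8):
    """Unweighted network: connect genes whose |r| across samples >= threshold."""
    adj = {g: set() for g in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


def degrees(adj):
    """Degree = number of edges attached to each node."""
    return {g: len(nbrs) for g, nbrs in adj.items()}


def modules(adj, min_size=2):
    """Connected components with at least min_size genes, found by depth-first search."""
    seen, found = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            g = stack.pop()
            comp.add(g)
            for h in adj[g]:
                if h not in seen:
                    seen.add(h)
                    stack.append(h)
        if len(comp) >= min_size:
            found.append(comp)
    return found


if __name__ == "__main__":
    # Hypothetical expression of five genes across four samples.
    expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1],
            "D": [1, 3, 1, 3], "E": [2, 6, 2, 6]}
    adj = coexpression_network(expr)
    print(degrees(adj))
    print(modules(adj))
```

Here A, B, and C form one module (C is perfectly anti-correlated with A), while D and E form a second; cross-module correlations are about 0.45 and fall below the threshold.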
Fig. 3
Analysis and visualization of knowledge graphs (KGs) and potential drug discovery. A) Gene networks extracted from the literature. B) Nodes in the graph represent data entities, and edges represent the relationships between them. In the network view, the original nodes are enriched terms, and node size reflects the weight of each term. Red nodes indicate up-regulation, green nodes indicate down-regulation, and numbers give the fold changes of DEGs. Label size reflects statistical significance (P value). C) Gene signature-driven discovery of potential drugs. First, a query signature is prepared by compiling the upregulated and downregulated genes associated with a disease state. This signature is then compared with a database of gene expression signatures from known perturbations or disease phenotypes. Compounds whose expression pattern resembles the disease state (inducing red and suppressing blue genes) are considered potential side-effect compounds. Conversely, compounds that reverse the disease expression pattern (suppressing red and inducing blue genes) are identified as candidate drugs.
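The signature comparison in panel C can be sketched with a deliberately simplified score: for each compound, the mean log fold change of the disease-upregulated genes minus that of the disease-downregulated genes. Negative scores indicate reversal (candidate drugs) and positive scores indicate mimicry (potential side-effect compounds). This score, the compound names, and the function names are assumptions for illustration; resources such as the Connectivity Map use rank-based, Kolmogorov-Smirnov-style statistics instead.

```python
from statistics import mean


def reversal_score(up_genes, down_genes, profile):
    """Simplified signature-matching score (hypothetical; not the CMap statistic).

    profile maps gene -> log fold change induced by one compound. The score is
    mean logFC over disease-up genes minus mean logFC over disease-down genes;
    genes missing from the profile are skipped.
    """
    up = [profile[g] for g in up_genes if g in profile]
    down = [profile[g] for g in down_genes if g in profile]
    if not up or not down:
        raise ValueError("profile must cover at least one up and one down gene")
    return mean(up) - mean(down)


def label(score):
    """Negative = reverses the disease signature; positive = mimics it."""
    if score < 0:
        return "candidate drug"
    if score > 0:
        return "potential side effect"
    return "no clear effect"


def rank_compounds(up_genes, down_genes, profiles):
    """Sort compounds from strongest reversal (most negative score) to strongest mimicry."""
    scored = [(name, reversal_score(up_genes, down_genes, prof))
              for name, prof in profiles.items()]
    scored.sort(key=lambda item: item[1])
    return [(name, score, label(score)) for name, score in scored]


if __name__ == "__main__":
    up, down = {"G1", "G2"}, {"G3"}  # hypothetical disease signature
    profiles = {
        "drugX": {"G1": -2.0, "G2": -1.0, "G3": 1.5},  # pushes the signature back
        "drugY": {"G1": 1.0, "G2": 2.0, "G3": -1.0},   # exaggerates it
    }
    for row in rank_compounds(up, down, profiles):
        print(row)
```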
Fig. 4
Overview of condition-specific analysis based on gene signatures. Gene signatures can serve as markers with A) spatial or temporal specificity and can be used for B) cell-type deconvolution of spatial and bulk data, C) trajectory inference, and D) cell–cell communication analysis.
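Deconvolution in panel B is commonly framed as a linear mixture: a bulk (or spot-level) expression vector b is approximated by S x, where each column of the signature matrix S holds the marker-gene profile of one cell type and x >= 0 holds cell-type abundances. The sketch below assumes S is already known, solves the resulting non-negative least-squares (NNLS) problem by cyclic coordinate descent, and rescales x to proportions. Published deconvolution tools use a range of other estimators, and `scipy.optimize.nnls` is a standard NNLS solver; the function names here are hypothetical.

```python
def nnls_coordinate_descent(S, b, n_iter=500):
    """Minimise ||S x - b||^2 subject to x >= 0 by cyclic coordinate descent.

    S: list of rows, one per marker gene, one column per cell type.
    b: observed bulk expression, one value per marker gene.
    """
    n_genes, n_types = len(S), len(S[0])
    x = [0.0] * n_types
    r = list(b)  # residual b - S x; x starts at zero
    col_sq = [sum(S[i][j] ** 2 for i in range(n_genes)) for j in range(n_types)]
    for _ in range(n_iter):
        for j in range(n_types):
            if col_sq[j] == 0:
                continue
            # Exact minimiser along coordinate j, projected onto x_j >= 0.
            step = sum(S[i][j] * r[i] for i in range(n_genes)) / col_sq[j]
            new_xj = max(0.0, x[j] + step)
            delta = new_xj - x[j]
            if delta:
                for i in range(n_genes):
                    r[i] -= S[i][j] * delta
                x[j] = new_xj
    return x


def deconvolve(S, b):
    """Return estimated cell-type proportions (non-negative, summing to 1)."""
    x = nnls_coordinate_descent(S, b)
    total = sum(x)
    return [v / total for v in x] if total > 0 else x


if __name__ == "__main__":
    # Hypothetical 3-gene x 2-cell-type signature; bulk mixes 70% type A, 30% type B.
    S = [[10.0, 1.0], [1.0, 10.0], [5.0, 5.0]]
    b = [7.3, 3.7, 5.0]
    print(deconvolve(S, b))
```

The non-negativity constraint is what distinguishes this from ordinary least squares: a cell type whose unconstrained coefficient would be negative is simply assigned zero abundance.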
Fig. 5
Illustration of sample label prediction and gene-phenotype association analysis. A) Sample label prediction relies mainly on two strategies: classification and clustering. B) Gene-phenotype association analysis. Associations between genes and phenotypes (e.g., disease, tissue, cell state, or cell or organism morphology) are inferred from gene expression levels using machine learning algorithms. Single-cell genomics makes it possible to annotate cell states quantitatively from high-content, high-throughput gene expression measurements. SNP-gene-phenotype association strategies come in two kinds: (1) Direct model development. This model links SNPs, genes, and phenotypes by applying data integration algorithms to gene clusters (sets) and their expression levels; the SNP clusters (sets) corresponding to the selected gene clusters can then be identified from eQTL data. (2) Modeling with a reference panel: the transcriptome-wide association study (TWAS) strategy. TWAS consists of three steps: (i) A model is built on a reference panel to relate SNPs to gene expression levels. Samples in the reference panel have both genotype and expression data, which are used to fit the relationship between SNP loci and the corresponding gene expression levels (SNP loci within 500 kb or 1 Mb upstream and downstream of the gene are selected). (ii) The model from step (i) is used to predict gene expression levels for another set of individuals with genotype data. (iii) The association between genes and the phenotype is analyzed using the predicted gene expression levels. C) The principle of time-to-event (survival) analysis.
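The three TWAS steps in panel B can be sketched for a single gene as follows. This is a toy illustration under stated assumptions: genotypes are coded as allele dosages (0, 1, 2) for SNPs already selected from the cis window, the expression model in step (i) is ridge regression (TWAS tools use various penalised or Bayesian models), and step (iii) is reduced to a Pearson correlation rather than a formal association test with P values. All function names are hypothetical.

```python
from math import sqrt


def ridge_fit(X, y, lam=1.0):
    """Solve (X^T X + lam*I) w = X^T y by Gaussian elimination (X: samples x SNPs)."""
    p = len(X[0])
    # Augmented matrix [X^T X + lam*I | X^T y].
    M = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for k in range(col, p + 1):
                M[r][k] -= f * M[col][k]
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (M[i][p] - sum(M[i][k] * w[k] for k in range(i + 1, p))) / M[i][i]
    return w


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb) if saa and sbb else 0.0


def twas_sketch(ref_geno, ref_expr, target_geno, target_pheno, lam=1.0):
    """Toy single-gene TWAS; one genotype column per cis-SNP."""
    # Step (i): fit SNP -> expression weights on the centred reference panel.
    x_means = [sum(col) / len(col) for col in zip(*ref_geno)]
    y_mean = sum(ref_expr) / len(ref_expr)
    Xc = [[v - m for v, m in zip(row, x_means)] for row in ref_geno]
    yc = [v - y_mean for v in ref_expr]
    w = ridge_fit(Xc, yc, lam)
    # Step (ii): impute expression for individuals that only have genotypes.
    pred = [y_mean + sum((v - m) * wi for v, m, wi in zip(row, x_means, w))
            for row in target_geno]
    # Step (iii): associate imputed expression with the phenotype.
    return w, pred, pearson(pred, target_pheno)


if __name__ == "__main__":
    # Hypothetical data: expression in the reference panel is driven by SNP 1 only.
    ref_geno = [[0, 1], [1, 0], [2, 1], [0, 0], [1, 2], [2, 2]]
    ref_expr = [0, 2, 4, 0, 2, 4]
    target_geno = [[0, 0], [1, 1], [2, 0], [0, 2], [2, 2]]
    target_pheno = [0, 2, 4, 0, 4]
    w, pred, r = twas_sketch(ref_geno, ref_expr, target_geno, target_pheno)
    print("weights:", w, "r:", round(r, 3))
```

Ridge is used here only because its closed-form solution keeps the example short; when there are more SNPs than reference samples, the penalty term is what keeps the linear system solvable.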
Fig. 6
DEGMiner website and practical guidelines for users. A) The homepage of the DEGMiner website (https://www.ciblab.net/DEGminer/). B) Practical guidelines for users.
