BMC Med Genomics. 2025 Jul 1;18(1):108.
doi: 10.1186/s12920-025-02174-9.

Identification of potential biomarkers and mechanisms for keloid disorder based on comprehensive bioinformatics analysis and machine learning algorithms

Bowen Zheng et al. BMC Med Genomics. 2025.

Abstract

Background: Keloid disorder (KD) encompasses a spectrum of fibroproliferative dermal conditions whose pathogenesis remains complex and incompletely understood. This study sought to identify biomarkers and potential therapeutic targets for KD through an integrative bioinformatics approach and machine learning analysis of RNA sequencing data.

Methods: RNA sequencing was performed on skin tissue samples from 13 patients with KD and 14 healthy controls. Weighted gene co-expression network analysis (WGCNA) and differential expression analysis were combined to identify differentially expressed key module genes, and the CytoHubba plugin was used to identify candidate genes. The candidate genes were then analyzed using the least absolute shrinkage and selection operator (LASSO) and support vector machine recursive feature elimination (SVM-RFE) methods to pinpoint feature genes associated with KD. Finally, biomarkers were determined through expression level validation, enrichment analysis, and immune infiltration analysis.
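As a rough illustration of the LASSO step named in the Methods, the following minimal Python sketch fits an L1-penalized least-squares model by coordinate descent and keeps the features whose coefficients remain nonzero. The abstract does not state the software, loss function, or penalty value used, so the squared-error loss, the `penalty` value, the function names, and the toy data are all assumptions made for illustration and should not be read as the authors' pipeline.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xb||^2 + penalty * ||b||_1 without an intercept.

    X is a list of rows (samples) and y a list of responses; both are
    assumed to be centered beforehand.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # mean product of feature j with the partial residual
            z = 0.0    # mean square of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * coef[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            coef[j] = soft_threshold(rho, penalty) / z if z > 0 else 0.0
    return coef


def selected_features(coef, tol=1e-12):
    """Indices of features with a nonzero coefficient."""
    return [j for j, b in enumerate(coef) if abs(b) > tol]


# Hypothetical centered toy data: column 0 tracks the outcome, column 1 is noise.
X = [[-1, 1], [-1, -1], [-1, 1], [1, -1], [1, 1], [1, -1]]
y = [-1, -1, -1, 1, 1, 1]

coef = lasso_coordinate_descent(X, y, penalty=0.1)
print(selected_features(coef))
```

In a workflow like the one described, the indices returned here would be mapped back to gene names and intersected with the genes retained by a second selector such as SVM-RFE.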

Results: A total of 420 differentially expressed key module genes were identified, and the top 10 genes by DMNC value were selected as candidate genes. Five feature genes were selected through LASSO and SVM-RFE. Of these, NID2, MFAP2, COL8A1, and P4HA3 showed significant expression differences between KD and control samples, along with consistent expression patterns across datasets, and were identified as potential biomarkers. These four biomarkers showed high diagnostic potential and significant positive correlations with one another. Functional enrichment analysis indicated that the primary KEGG pathways associated with these biomarkers included "steroid hormone biosynthesis" and "cytokine-cytokine receptor interaction." Moreover, immune infiltration analysis revealed that the four biomarkers were negatively correlated with type 17 T helper cells and positively correlated with 15 immune cell types, including activated B cells and central memory CD4 T cells.
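The diagnostic potential mentioned in the Results is assessed by ROC analysis (Fig. 4d-g). As a minimal sketch of what an area under the ROC curve measures, the Python snippet below computes AUC as the probability that a randomly chosen KD sample has a higher expression value than a randomly chosen control, counting ties as one half. The function name and the expression values are hypothetical; the abstract does not report the underlying data or the software used.

```python
def roc_auc(case_values, control_values):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    pairs; tied pairs contribute 0.5.
    """
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical expression levels for one gene (not taken from the study).
kd_expression = [3.0, 4.0, 5.0]
control_expression = [1.0, 2.0, 3.0]

auc = roc_auc(kd_expression, control_expression)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to no discrimination between groups, and 1.0 to perfect separation.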

Conclusion: NID2, MFAP2, COL8A1, and P4HA3 were identified as key biomarkers for KD, offering new avenues for more targeted and effective diagnostic and therapeutic strategies for managing this condition.

Keywords: Bioinformatics; Biomarker; Keloid disorder; Molecular mechanisms.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: The studies involving human participants were reviewed and approved by the Ethics Committee of Lanzhou University Second Hospital. Informed consent to participate was obtained from all of the participants or from the parents or legal guardians of the participants in the study. The patients/participants provided their written informed consent to participate in this study. All experiments in this study were conducted in accordance with the relevant guidelines and regulations or in accordance with the Declaration of Helsinki, and informed consent was obtained from all participants. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Identification of key module genes strongly associated with KD via WGCNA. (a) Sample clustering of 27 total samples to detect outliers. The vertical axis represents the distance between samples. (b) Graphs of scale independence, mean connectivity, and scale-free topology; the optimal soft power was 24. (c) Cluster dendrogram of the coexpression network modules (1-TOM); branches are distinguished by color. (d) Module-trait heatmap of correlations. Red indicates a positive correlation, while blue indicates a negative correlation, with rows representing modules and columns representing trait attributes
Fig. 2
Identification of differentially expressed key module genes and their enrichment analyses. (a) Heatmap depicting DEGs identified through color gradients, with red indicating high expression and green indicating low expression. Darker colors represent larger differences. (b) Volcano plot showing the distribution of DEGs, with green and red dots representing low and high expression, respectively. (c) Venn diagram illustrating the intersecting genes between the DEGs and key module genes. (d) Circos plots of GO enrichment. The inner circle displays a bar chart where the height of each bar represents the significance of the pathway, with taller bars indicating greater significance. The color of the bars corresponds to the z-score, with darker colors indicating higher z-scores. The outer circle shows a scatter plot representing the expression levels of genes in each pathway, where red and blue dots indicate upregulated and downregulated genes, respectively. The right half of the plot provides descriptions of the enriched GO pathways. (e) KEGG pathway analysis of the key differentially expressed module genes. The circle plot can be divided into two parts. The left half displays the enriched gene names, with darker colors indicating a larger logFC, where red represents upregulated genes and green represents downregulated genes. The right half shows the enriched functional pathways, with different colors representing different pathways, and the larger the color block, the more enriched genes are found in that pathway. Lines in the center connect the genes enriched in different pathways
Fig. 3
Screening feature genes. (a) PPI network of differentially expressed key module genes. Each circle represents a protein, and the thickness of the lines connecting the proteins indicates the strength of their interaction, with thicker lines representing stronger interactions. (b) Identification of 10 candidate genes based on DMNC values in the CytoHubba plugin. The redder the color, the higher the DMNC value of the gene node. (c) Cross-validation for selecting tuning parameters in the LASSO regression model, with the optimal levels indicated by vertical dotted lines, identifying five key genes. (d-e) SVM-RFE analysis: gene importance (d) and predicted true value change curves (e). The horizontal axis represents the number of genes in the model, and the vertical axis represents the accuracy of the model’s predictions. (f) Venn diagram illustrating the overlap of feature genes identified by LASSO and SVM-RFE analyses
Fig. 4
Screening and verification of biomarkers. (a-b) Transcriptional validation in two datasets (a, external verification set; b, RNA sequencing data). The horizontal axis represents the control and KD samples, while the vertical axis represents gene expression levels; ns represents no significance, ** represents P < 0.01, *** represents P < 0.001. (c) Western blotting to verify the protein level differences for the four biomarkers. (d-g) ROC analysis of the biomarkers. (h) Correlation analysis showing significant positive correlations (cor > 0.75, P < 0.05) among the four biomarkers
Fig. 5
GSEA enrichment analysis. (a-d) GSEA-KEGG analysis of the four biomarkers (a, NID2; b, MFAP2; c, COL8A1; d, P4HA3). The top part shows the enrichment score line chart, where each line represents a pathway. The peak of each line corresponds to the enrichment score of that pathway, and the genes before the peak are considered the core genes of the pathway. If the peak is located in the top-left corner, it indicates that the core genes are primarily upregulated genes based on the expression level differences of key genes. The second part uses lines to mark the genes located within the gene set
Fig. 6
Immune infiltration analysis. (a) Heatmap of immune cell infiltration. The first row at the top shows red for the KD group samples and blue for the control samples. Each square in the rows below represents a cell type, with red indicating upregulated expression and blue indicating downregulated expression. The deeper the color, the higher the infiltration abundance. (b) Correlation heatmap illustrating relationships among immune cell types: red represents positive correlations, blue represents negative correlations, and darker colors indicate stronger correlations. * represents P < 0.05, ** represents P < 0.01, *** represents P < 0.001. (c) Differences in immune cell distributions between KD and control samples. * represents P < 0.05, ** represents P < 0.01, *** represents P < 0.001, **** represents P < 0.0001. (d) Correlations between the four biomarkers and various differential immune cell types. Lollipops on the left side represent negative correlation, while lollipops on the right side represent positive correlation. The color of the lollipops indicates the significance of the correlation, with stronger significance represented by greener colors. The size of the lollipops represents the absolute value of the correlation
Fig. 7
Regulatory network analysis of biomarkers. (a) TF‒mRNA network (red indicates mRNAs, and cyan blue diamonds represent TFs). (b) LncRNA‒miRNA‒mRNA network. The blocks from left to right represent mRNA, miRNA, and lncRNA, respectively. The lines in the middle indicate regulatory relationships between them
Fig. 8
Subcellular localization of biomarkers and drug prediction. (a) Subcellular localization networks for biomarkers (pink represents biomarkers, yellow diamonds represent subcellular biomarkers). (b) Drug modulation network for biomarkers (red indicates biomarkers, purple represents drugs interacting with all four biomarkers, blue indicates drugs interacting with three biomarkers, yellow indicates drugs interacting with two biomarkers, and greenish blue indicates drugs interacting with only one biomarker)
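
The caption of Fig. 1b reports an optimal soft power of 24 for the WGCNA network. As a minimal sketch of what that parameter does, the Python snippet below raises the absolute Pearson correlation between each pair of genes to the chosen power, which pushes weak correlations toward zero while leaving strong ones near one. It assumes an unsigned network; the abstract does not say whether a signed or unsigned network was used, and the gene names, expression values, and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def soft_threshold_adjacency(expression, power):
    """Unsigned adjacency: |cor(gene_i, gene_j)| ** power for every gene pair."""
    genes = list(expression)
    adjacency = {}
    for g in genes:
        for h in genes:
            r = pearson(expression[g], expression[h])
            adjacency[(g, h)] = abs(r) ** power
    return adjacency


# Hypothetical expression profiles across four samples (not from the study).
expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with gene_a
    "gene_c": [4.0, 1.0, 3.0, 2.0],  # weakly anti-correlated with gene_a
}

adjacency = soft_threshold_adjacency(expression, power=24)
print(adjacency[("gene_a", "gene_b")], adjacency[("gene_a", "gene_c")])
```

With a power this high, only very strongly correlated gene pairs keep an appreciable connection weight.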
