J Orthop Surg Res. 2024 Dec 18;19(1):832.
doi: 10.1186/s13018-024-05340-4.

Developing the new diagnostic model by integrating bioinformatics and machine learning for osteoarthritis

Jian Du et al. J Orthop Surg Res. 2024.

Abstract

Background: Osteoarthritis (OA) is a common cause of disability among the elderly, profoundly affecting quality of life. This study aims to leverage bioinformatics and machine learning to develop an artificial neural network (ANN) model for diagnosing OA, providing new avenues for early diagnosis and treatment.

Methods: We first obtained OA synovial tissue microarray datasets from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) associated with OA were identified using the Limma package and weighted gene co-expression network analysis (WGCNA). Protein-protein interaction (PPI) network analysis and machine learning were then used to identify the most relevant potential feature genes of OA. An ANN diagnostic model was constructed from these genes, and its diagnostic performance was evaluated with receiver operating characteristic (ROC) curves. In addition, the expression levels of the feature genes were verified using real-time quantitative polymerase chain reaction (qRT-PCR). Finally, immune cell infiltration analysis was performed using the CIBERSORT algorithm to explore the correlation between feature genes and immune cells.
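The selection steps above narrow a gene list through successive filters. The following minimal Python sketch shows one way such a consensus could be formed, assuming each method yields a set of gene symbols and the final candidates are the genes common to all sets. The Venn diagrams in Figs. 3, 5 and 6 suggest an overlap rule of this kind, but the abstract does not state it explicitly. The `consensus` helper and all gene symbols are hypothetical placeholders, not data from the study.

```python
from functools import reduce


def consensus(*gene_sets):
    """Return the genes present in every input set, sorted by name."""
    if not gene_sets:
        return []
    common = reduce(lambda a, b: a & b, (set(s) for s in gene_sets))
    return sorted(common)


# Hypothetical outputs of three feature-selection methods (placeholder symbols,
# standing in for the LASSO, SVM-RFE and random forest results).
lasso_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
svm_rfe_genes = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}
random_forest_genes = {"GENE_A", "GENE_B", "GENE_D", "GENE_F"}

# Keep only the genes that all three methods agree on.
feature_genes = consensus(lasso_genes, svm_rfe_genes, random_forest_genes)
print(feature_genes)
```

Requiring agreement across methods trades sensitivity for robustness: a gene favored by a single algorithm is dropped, which may be why the candidate list shrinks at each stage.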

Results: The Limma package and WGCNA identified a total of 72 DEGs related to OA, of which 12 were up-regulated and 60 were down-regulated. PPI network analysis then identified 21 hub genes, and three machine learning algorithms screened four feature genes (BTG2, CALML4, DUSP5, and GADD45B). The ANN diagnostic model was constructed based on these four feature genes. The AUC was 0.942 in the training set and 0.850 in the validation set. In addition, qRT-PCR validation demonstrated significant down-regulation of BTG2, DUSP5, and GADD45B mRNA expression in OA samples compared with normal samples, whereas CALML4 mRNA expression was up-regulated. Immune cell infiltration analysis suggested that abnormal infiltration of memory B cells, gamma delta T cells, naive B cells, plasma cells, resting memory CD4 T cells, and activated NK cells may be related to the progression of OA.
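To make the reported AUC values easier to interpret, the sketch below computes AUC as the fraction of OA/normal sample pairs in which the OA sample receives the higher model score. This is a generic illustration in Python, not the authors' code. The `roc_auc` helper, the labels, and the scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 for OA, 0 for normal. scores: model outputs, higher = more OA-like.
    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for six samples (not data from the study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 8 of the 9 positive/negative pairs are ordered correctly
```

Read this way, an AUC of 0.850 in a validation set would mean that roughly 85% of OA/normal pairs are ranked in the right order by the model score.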

Conclusions: BTG2, CALML4, DUSP5, and GADD45B were identified as potential feature genes for OA, and an ANN diagnostic model with good diagnostic performance was developed, providing a new perspective for the early diagnosis and personalized treatment of OA.

Keywords: Artificial neural networks; Feature genes; Immune cell infiltration; Machine learning; Osteoarthritis.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
The overall flowchart of this study
Fig. 2
Volcano plot and heatmap of DEGs. (A) Volcano plot of DEGs. (B) Heatmap of DEGs
Fig. 3
WGCNA. (A) Analysis of mean connectivity and scale-free fit index across different soft threshold powers (β). (B) Cluster dendrogram of genes. (C) Heatmap of the correlation between gene modules and phenotypes. (D) Venn diagram depicting the overlap between DEGs and WGCNA
Fig. 4
Functional enrichment analysis. (A) GO enrichment analysis bubble plot. (B) KEGG enrichment analysis bubble plot
Fig. 5
Construction of the PPI network. (A) PPI network constructed from the 72 DEGs. (B) The top 30 DEGs screened by the Degree algorithm in Cytoscape software (darker colors indicate greater significance). (C) The top 30 DEGs screened by the Closeness algorithm in Cytoscape software (darker colors indicate greater significance). (D) The top 30 DEGs screened by the Betweenness algorithm in Cytoscape software (darker colors indicate greater significance). (E) Venn diagram of the top 30 genes ranked by the Degree, Closeness and Betweenness algorithms
Fig. 6
Machine learning to screen feature genes. (A) Distribution of LASSO regression coefficients. (B) Confidence interval of the LASSO regression algorithm log (λ). (C) Accuracy graph of SVM-RFE algorithm. (D) Error plot of SVM-RFE algorithm. (E) Correlation between random forest trees and model errors. (F) RF importance score results. (G) Venn diagram of three machine learning algorithms
Fig. 7
Construction and validation of the ANN diagnostic model. (A) Expression levels of feature genes in the training set. (B) Visualization of the ANN diagnostic model. (C) AUC value of the ANN diagnostic model in the training set. (D) AUC value of the ANN diagnostic model in the validation set. (E) AUC values of feature genes in the training set. (F) AUC values of feature genes in the validation set
Fig. 8
External validation of the expression level of feature genes. (A) Expression levels of BTG2. (B) Expression levels of CALML4. (C) Expression levels of DUSP5. (D) Expression levels of GADD45B
Fig. 9
The mRNA expression levels of the four feature genes were quantified using qRT-PCR. (A) BTG2 mRNA expression levels. (B) CALML4 mRNA expression levels. (C) DUSP5 mRNA expression levels. (D) GADD45B mRNA expression levels. Statistical significance is indicated by * for P < 0.05 and ** for P < 0.01; “ns” denotes no significant difference
Fig. 10
Analysis of immune cell infiltration. (A) Correlation analysis of immune cells. Red indicates positive correlation and blue indicates negative correlation. (B) Boxplots of immune cell expression between OA and normal groups, with statistical significance expressed as *p < 0.05, **p < 0.01, and ***p < 0.001
Fig. 11
Correlation analysis between feature genes and immune cells. (A) The correlation between BTG2 and immune infiltrating cells. (B) The correlation between CALML4 and immune infiltrating cells. (C) The correlation between DUSP5 and immune infiltrating cells. (D) The correlation between GADD45B and immune infiltrating cells. The size of each dot represents the strength of the correlation between the gene and the immune cell: larger dots indicate stronger correlations, and smaller dots indicate weaker correlations. The color of each dot indicates the P value: greener dots indicate lower P values, and more yellowish dots indicate higher P values. A P value < 0.05 was considered statistically significant
