Pediatr Rheumatol Online J. 2024 Jan 19;22(1):18.
doi: 10.1186/s12969-023-00949-x.

Establishment and analysis of a novel diagnostic model for systemic juvenile idiopathic arthritis based on machine learning

Pan Ding et al. Pediatr Rheumatol Online J.

Abstract

Background: Systemic juvenile idiopathic arthritis (SJIA) is a form of childhood arthritis with clinical features such as fever, lymphadenopathy, arthritis, rash, and serositis. It seriously affects the growth and development of children and carries a high rate of disability and mortality. The precise cause of the disease is unknown, but genetic, infectious, or autoimmune factors may contribute. Our study aims to develop a gene-based diagnostic model to explore the identification of SJIA at the genetic level.

Methods: Gene expression datasets of peripheral blood mononuclear cell (PBMC) samples from SJIA patients were collected from the Gene Expression Omnibus (GEO) database. Three GEO datasets (GSE11907-GPL96, GSE8650-GPL96 and GSE13501) were merged and used as the training dataset, which included 125 SJIA samples and 92 healthy samples. GSE7753 was used as the validation dataset. The limma method was used to screen differentially expressed genes (DEGs). Feature selection was performed using Lasso, random forest (RF)-recursive feature elimination (RFE) and an RF classifier.
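The DEG screening step can be illustrated with a minimal sketch. It is not limma, which fits linear models with empirical Bayes moderation; it swaps in a plain log2 fold change plus Welch's t statistic to show the general idea of comparing SJIA and healthy groups gene by gene. The function names, the cutoffs (`lfc_cutoff`, `t_cutoff`), the gene names and the expression values are all hypothetical toy assumptions, not values from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def screen_degs(expr, labels, case="SJIA", lfc_cutoff=1.0, t_cutoff=2.0):
    """Return {gene: (log2FC, t)} for genes passing both (hypothetical) cutoffs.

    expr maps gene -> list of log2 expression values, one per sample.
    labels gives the group of each sample, in the same order.
    """
    hits = {}
    for gene, values in expr.items():
        cases = [v for v, lab in zip(values, labels) if lab == case]
        controls = [v for v, lab in zip(values, labels) if lab != case]
        # Difference of mean log2 expression, i.e. the log2 fold change.
        lfc = mean(cases) - mean(controls)
        t = welch_t(cases, controls)
        if abs(lfc) >= lfc_cutoff and abs(t) >= t_cutoff:
            hits[gene] = (lfc, t)
    return hits


# Toy data only: three made-up genes measured in three samples per group.
labels = ["SJIA", "SJIA", "SJIA", "healthy", "healthy", "healthy"]
toy_expr = {
    "GENE_UP": [9.1, 9.4, 9.0, 7.0, 7.2, 6.9],
    "GENE_DOWN": [5.0, 5.2, 4.9, 6.8, 7.1, 7.0],
    "GENE_FLAT": [8.0, 8.1, 7.9, 8.0, 8.2, 7.9],
}
degs = screen_degs(toy_expr, labels)
```

In this toy input, `GENE_UP` and `GENE_DOWN` pass both cutoffs and `GENE_FLAT` does not. A real analysis would also need batch correction across the merged datasets and multiple-testing adjustment, which the sketch omits.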

Results: We identified 4 key genes (ALDH1A1, CEACAM1, YBX3 and SLC6A8) that were essential to distinguish SJIA from healthy samples. We then combined the 4 key genes and performed a grid search with 10-fold cross-validation repeated 5 times to identify the RF model with the optimal mtry. The mean area under the curve (AUC) value for 5-fold cross-validation was greater than 0.95. The model's performance was then assessed once more using the validation dataset, and an AUC value of 0.990 was obtained. These AUC values indicate that the SJIA diagnostic model is robust.
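The evaluation described above (AUC averaged over cross-validation folds) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: a nearest-centroid scorer stands in for the random forest to keep the example dependency-free, there is no mtry grid search, and the helper names (`auc`, `stratified_folds`, `fit_predict`, `repeated_cv_auc`) and the toy data are hypothetical.

```python
import math
import random


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed):
    """Split sample indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_predict(train_x, train_y, test_x):
    """Nearest-centroid stand-in for the random forest (an assumption of this sketch)."""
    pos_c = centroid([x for x, y in zip(train_x, train_y) if y == 1])
    neg_c = centroid([x for x, y in zip(train_x, train_y) if y == 0])
    # Higher score means closer to the positive centroid than to the negative one.
    return [math.dist(x, neg_c) - math.dist(x, pos_c) for x in test_x]


def repeated_cv_auc(X, y, k=5, repeats=5, seed=0):
    """Mean held-out AUC over `repeats` rounds of stratified k-fold cross-validation."""
    aucs = []
    for r in range(repeats):
        for fold in stratified_folds(y, k, seed + r):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            scores = fit_predict(
                [X[i] for i in train], [y[i] for i in train], [X[i] for i in fold]
            )
            aucs.append(auc([y[i] for i in fold], scores))
    return sum(aucs) / len(aucs)


# Toy, perfectly separable data: 10 "SJIA-like" (1) and 10 "healthy-like" (0) samples.
X = [[3.0 + 0.1 * i, 3.0 - 0.1 * i] for i in range(10)]
X += [[0.1 * i, -0.1 * i] for i in range(10)]
y = [1] * 10 + [0] * 10
mean_auc = repeated_cv_auc(X, y, k=5, repeats=5)
```

The model is refit on the training folds in every iteration, so each AUC is computed only on samples the scorer has not seen. That is what makes the averaged value a cross-validated estimate rather than a training-set score.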

Conclusions: We successfully developed a new SJIA diagnostic model that can serve as a novel aid in the identification of SJIA. In addition, the 4 key genes identified may serve as potential biomarkers for SJIA and provide new insights into its mechanisms.

Keywords: Diagnostic model; GEO; Machine learning; Random forest; Systemic juvenile idiopathic arthritis.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The flow chart of this study
Fig. 2
Differential genes. (A) Volcano map with 22 DEGs, orange dots indicate up-regulated genes, black dots indicate non-differentiated genes, and blue dots indicate down-regulated genes. (B) Heat map of the expression of 150 DEGs.
Fig. 3
Enrichment analysis and immune cell infiltration analysis. (A) Histogram of GO enrichment analysis. (B) Histogram of KEGG enrichment analysis. (C-D) Pathway-related GSEA. (G) Violin plots of the differences in abundance of 22 immune cell types between the SJIA and healthy groups. (*p < 0.05; **p < 0.01; ***p < 0.001)
Fig. 4
Feature selection. (A) The lasso regression curve of 28 DEGs. (B) The 10-fold cross-validation parameter (λ) options. (C) The 10-fold cross-validation of accuracy of signature gene combination of RF-RFE. (D) Gene importance scores for RF classifier
Fig. 5
The ROC curve results for 5-fold cross-validation (A-E) and validation (F) dataset
Fig. 6
Differential expression of 4 key genes in SJIA, enthesitis-related arthritis, persistent oligoarthritis and rheumatoid factor negative polyarthritis