Development and validation of machine learning-based diagnostic models using blood transcriptomics for early childhood diabetes prediction

Xin Huang^#¹,², Di Ouyang^#³, Weiming Xie⁴, Huawei Zhuang¹, Siyu Gao², Pan Liu⁵, Lizhong Guo¹

Affiliations

¹ The First Clinical Medical College, Nanjing University of Chinese Medicine, Nanjing, China.
² Yulin Hospital of Traditional Chinese Medicine, Yulin, China.
³ Traditional Chinese Medicine Hospital of Yulin, Yulin, China.
⁴ Academic Affairs and Research Management Office, Yulin Campus of Guangxi Medical University, Yulin, Guangxi, China.
⁵ Huai'an No.3 People's Hospital, Huai'an Second Clinical College of Xuzhou Medical University, Huai'an, China.

^# Contributed equally.

PMID: 40740938
PMCID: PMC12308849
DOI: 10.3389/fmed.2025.1636214

Xin Huang et al. Front Med (Lausanne). 2025 Jul 16:12:1636214. doi: 10.3389/fmed.2025.1636214. eCollection 2025.

Abstract

Background: Early identification of Type 1 Diabetes Mellitus (T1DM) in pediatric populations is crucial for implementing timely interventions and improving long-term outcomes. Peripheral blood transcriptomic analysis provides a minimally invasive approach for identifying predictive biomarkers prior to clinical manifestation. This study aimed to develop and validate machine learning algorithms utilizing transcriptomic signatures to predict T1DM onset in children up to 46 months before clinical diagnosis.

Methods: We analyzed 247 peripheral blood RNA-sequencing samples from pre-diabetic children and age-matched healthy controls. Differential gene expression analysis was performed using established bioinformatics pipelines to identify significantly dysregulated transcripts. Five feature selection methods (Lasso, Elastic Net, Random Forest, Support Vector Machine, and Gradient Boosting Machine) were employed to optimize gene sets. Nine machine learning algorithms (Decision Tree, Gradient Boosting Machine, K-Nearest Neighbors, Linear Discriminant Analysis, Logistic Regression, Multilayer Perceptron, Naive Bayes, Random Forest, and Support Vector Machine) were combined with selected features, generating 45 unique model combinations. Performance was evaluated using accuracy, precision, recall, and F1-score metrics. Model validation was conducted using quantitative polymerase chain reaction (qPCR) in an independent cohort of six children (three healthy, three diabetic).
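The model grid and the four evaluation metrics named in the Methods can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names `model_grid` and `binary_metrics` are hypothetical, the labels in the usage note are made up, and the sketch assumes a binary encoding in which 1 marks the pre-diabetic class.

```python
from itertools import product

# Method names as listed in the Methods section of the abstract.
FEATURE_SELECTORS = [
    "Lasso", "Elastic Net", "Random Forest",
    "Support Vector Machine", "Gradient Boosting Machine",
]
CLASSIFIERS = [
    "Decision Tree", "Gradient Boosting Machine", "K-Nearest Neighbors",
    "Linear Discriminant Analysis", "Logistic Regression",
    "Multilayer Perceptron", "Naive Bayes", "Random Forest",
    "Support Vector Machine",
]


def model_grid():
    """Pair every feature selector with every classifier (5 x 9 = 45)."""
    return list(product(FEATURE_SELECTORS, CLASSIFIERS))


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification task.

    Assumption: `positive` marks the pre-diabetic class. Ratios with a
    zero denominator are reported as 0.0 rather than raising an error.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

As a usage example, `len(model_grid())` gives 45, matching the number of combinations reported. Calling `binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0])` on six illustrative labels returns 1.0 for all four metrics, which is what perfect classification means for a cohort of that size.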

Results: Transcriptomic analysis revealed significant differential expression patterns between pre-diabetic and control groups. Four model combinations demonstrated superior predictive performance: Lasso + K-Nearest Neighbors, Elastic Net + K-Nearest Neighbors, Elastic Net + Random Forest, and Support Vector Machine + K-Nearest Neighbors. These models achieved high accuracy in predicting diabetes onset up to 46 months before clinical diagnosis. Both Elastic Net-based models achieved perfect classification performance in the validation cohort, demonstrating their potential as clinically viable diagnostic tools.

Conclusion: This study establishes the feasibility of integrating peripheral blood transcriptomic profiling with machine learning for early pediatric T1DM prediction. The identified transcriptomic signatures and validated predictive models provide a foundation for developing clinically translatable, non-invasive diagnostic tools. These findings support the implementation of precision medicine approaches for childhood diabetes prevention and warrant validation in larger, multi-center cohorts to assess generalizability and clinical utility.

Keywords: childhood diabetes; machine learning; pediatric biomarkers; peripheral blood; transcriptomic analysis.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 2**
Differential gene expression between prediabetic children and healthy controls when intra-individual correlation was not accounted for. **(A)** Differentially expressed genes between the two groups. **(B)** Heatmap of the different groups.

**Figure 3**
Different model combinations evaluated.

**Figure 4**
Selected feature set derived from machine learning-based feature selection. **(A)** Lasso with multiple models. **(B)** Elastic Net with multiple models. **(C)** Random Forest with multiple models. **(D)** SVM with multiple models.

**Figure 5**
Expression profiles of 24 key genes and classification performance validation using qPCR-based independent dataset.
