Sci Rep. 2025 Oct 3;15(1):34572.
doi: 10.1038/s41598-025-18017-7.

Discovering periodontitis biomarkers and therapeutic targets through bioinformatics and ensemble learning analysis

Md Tanvir Hasan et al. Sci Rep. 2025.

Abstract

Periodontitis, a prevalent inflammatory disease, leads to the progressive destruction of periodontal tissues and poses significant systemic health risks. Despite its widespread impact, the molecular mechanisms driving periodontitis remain poorly understood. This study integrates advanced ensemble machine learning models and bioinformatics approaches to elucidate the genetic basis of periodontitis. Using transcriptomic data from the Gene Expression Omnibus (GEO) repository (GSE10334), we identified 21 genes common to the bagging and boosting models, underscoring their critical role in disease pathophysiology. Protein-protein interaction (PPI) network analysis revealed hub genes (HNRNPC, TSR1, PLRG1, GOPC) with central roles in key biological pathways. Functional enrichment highlighted their involvement in actin filament regulation, immune response modulation, and RNA processing. Furthermore, mutation and copy number alteration (CNA) analyses revealed significant genetic diversity in these hub genes, particularly in diploid samples, with a high prevalence of missense and splice variants. Together, these findings advance our understanding of the molecular landscape of periodontitis, paving the way for novel biomarker discovery and targeted therapeutic strategies, potentially leading to improved diagnostic and treatment approaches in periodontal care.

Keywords: Biomarker; Ensemble learning model; Machine learning; Mutation and copy number alteration; Periodontitis; Protein-protein interaction.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Workflow of the study methodology. Gene expression data from the GSE10334 dataset were obtained from the GEO repository and preprocessed using various machine learning techniques. Ensemble machine learning models, including boosting and bagging approaches, were employed to classify the data and extract feature importance through cross-validation. Identified features were subjected to bioinformatics analyses such as PPI network construction, GO and pathway analysis, hub gene identification, and CNA analysis. This comprehensive pipeline offers insights into gene expression, biological pathways, and functional mechanisms.
Fig. 2
Visualization of feature distribution and differential expression analysis: (A) scatter plot of data features highlighting class distribution; (B) volcano plot depicting differential expression analysis.
Fig. 3
Performance evaluation of bagging model: confusion matrix and ROC curve with AUC = 0.98: (A) heatmap of the confusion matrix. The confusion matrices were computed using the confusion_matrix function from the sklearn.metrics module, based on the encoded test labels (y_test_encoded) and the model predictions (y_pred). The heatmaps were then plotted using the seaborn library’s heatmap function with annotation and a blue color scheme for clarity. Python (Version 3.10.12), Seaborn (Version 0.11.2), Matplotlib (Version 3.5.1), scikit-learn (Version 1.0.2) were used to generate the Heatmap. (B) ROC curve of the Bagging model.
Fig. 4
Performance evaluation of boosting model: confusion matrix and ROC curve with AUC = 0.98: (A) Heatmap of the confusion matrix. The confusion matrices were computed using the confusion_matrix function from the sklearn.metrics module, based on the encoded test labels (y_test_encoded) and the model predictions (y_pred). The heatmaps were then plotted using the seaborn library’s heatmap function with annotation and a blue color scheme for clarity. Python (Version 3.10.12), Seaborn (Version 0.11.2), Matplotlib (Version 3.5.1), scikit-learn (Version 1.0.2) were used to generate the Heatmap. (B) ROC curve of the Boosting model.
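The confusion-matrix and ROC steps described in the captions of Figs. 3 and 4 can be sketched as follows. This is a minimal illustration, not the authors' code: the labels and scores are hypothetical toy values (not taken from GSE10334), `y_score` and `plot_heatmap` are names introduced here, and only `y_test_encoded`, `y_pred`, `confusion_matrix`, and the seaborn heatmap with annotation and a blue color scheme come from the captions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical toy data: 0 = healthy, 1 = periodontitis (not from GSE10334).
y_test_encoded = np.array([0, 0, 0, 1, 1, 1, 1, 0])
# Assumed predicted probabilities for class 1 from some fitted classifier.
y_score = np.array([0.1, 0.4, 0.2, 0.9, 0.8, 0.35, 0.7, 0.6])
# Hard predictions at an assumed 0.5 threshold.
y_pred = (y_score >= 0.5).astype(int)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test_encoded, y_pred)
# Area under the ROC curve, computed from scores rather than hard labels.
auc = roc_auc_score(y_test_encoded, y_score)


def plot_heatmap(matrix):
    """Draw an annotated blue heatmap; plotting libraries are imported lazily."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(matrix, annot=True, fmt="d", cmap="Blues")
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    plt.show()


print(cm)
print(f"AUC = {auc:.3f}")
```

Calling `plot_heatmap(cm)` would render the heatmap; it is left uncalled here so the sketch runs without a display.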
Fig. 5
Venn diagram of the 21 common genes identified by both the bagging and boosting models through the feature importance method.
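One plausible way to obtain such an overlap is to rank genes by each model's feature importance, keep the top k from each, and intersect the two sets. The sketch below assumes that procedure; the gene names, importance values, cutoff `k`, and the helper `top_k_genes` are all hypothetical placeholders, since the page does not state how the feature lists were thresholded.

```python
def top_k_genes(importances, k):
    """Return the k genes with the highest importance scores as a set."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return {gene for gene, _ in ranked[:k]}


# Hypothetical importance scores (placeholder gene names, invented values).
bagging_importance = {
    "GENE_A": 0.30, "GENE_B": 0.25, "GENE_C": 0.20, "GENE_D": 0.15, "GENE_E": 0.10,
}
boosting_importance = {
    "GENE_B": 0.35, "GENE_A": 0.25, "GENE_F": 0.20, "GENE_D": 0.12, "GENE_E": 0.08,
}

k = 3  # assumed cutoff for illustration only
common_genes = top_k_genes(bagging_importance, k) & top_k_genes(boosting_importance, k)

print(sorted(common_genes))
```

The set intersection corresponds to the overlapping region of the Venn diagram; genes left in only one set would fall in the non-overlapping regions.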
Fig. 6
The figure illustrates two aspects of a protein-protein interaction (PPI) network: (A) PPI Network Visualization: A broad representation of the interactions between multiple proteins, highlighting key hub proteins such as HNRNPC and TSR1, which display dense connectivity within their respective clusters. (B) Subnetwork of hub proteins: Focuses on a smaller set of highly interconnected hub proteins, such as HNRNPC, TSR1, PLRG1 and GOPC, emphasizing their central roles. The color gradient represents varying levels of significance or connectivity, with red indicating the most influential hubs.
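Hub proteins in a PPI network are often ranked by connectivity. The sketch below uses plain node degree as the ranking criterion, which is an assumption: the page does not say which centrality measure or tool the authors used. The edge list, protein labels, and the helper `degree_hubs` are hypothetical.

```python
from collections import Counter


def degree_hubs(edges, top_n):
    """Rank nodes of an undirected network by degree and return the top_n."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, breaking ties alphabetically for determinism.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical undirected interactions between placeholder proteins.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]

hubs = degree_hubs(edges, top_n=2)
print(hubs)
```

Other measures such as betweenness or maximal clique centrality can rank nodes differently, so the choice of measure affects which proteins are reported as hubs.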
Fig. 7
Group-based gene ontology analysis reveals key functional enrichments across biological processes, cellular components, and molecular functions.
Fig. 8
Group-based pathway enrichment analysis reveals key functional pathway enrichments across the KEGG (2022), Reactome (2022), and WikiPathways (2022) databases.
Fig. 9
Comprehensive analysis of mutation and copy number alterations in key genes (GOPC, TSR1, HNRNPC, and PLRG1) highlights diverse genetic variations.