Front Mol Biosci. 2024 Jan 4;10:1298457.
doi: 10.3389/fmolb.2023.1298457. eCollection 2023.

Construction and evaluation of endometriosis diagnostic prediction model and immune infiltration based on efferocytosis-related genes

Fang-Li Pei et al. Front Mol Biosci. 2024.

Abstract

Background: Endometriosis (EM) is a long-lasting inflammatory disease that is difficult to treat and prevent. Existing research indicates the significance of immune infiltration in the progression of EM. Efferocytosis has an important immunomodulatory function. However, research on the identification and clinical significance of efferocytosis-related genes (EFRGs) in EM is sparse.

Methods: Differentially expressed efferocytosis-related genes (EFRDEGs) in endometriosis datasets were identified using the Gene Expression Omnibus (GEO) and GeneCards databases. Protein-protein interaction (PPI) and transcription factor (TF) regulatory networks of the EFRDEGs were then constructed. Subsequently, machine learning techniques including univariate logistic regression, LASSO, and SVM classification were applied to filter and pinpoint diagnostic biomarkers. To establish and assess the diagnostic model, ROC analysis, multivariate regression analysis, a nomogram, and a calibration curve were employed. The CIBERSORT algorithm and single-cell RNA sequencing (scRNA-seq) were used to explore immune cell infiltration, while the Comparative Toxicogenomics Database (CTD) was used to identify potential therapeutic drugs for endometriosis. Finally, immunohistochemistry (IHC) and reverse transcription quantitative polymerase chain reaction (RT-qPCR) were used to quantify the expression levels of the biomarkers in clinical samples of endometriosis.

Results: Our findings revealed 13 EFRDEGs associated with EM, and the LASSO and SVM models identified six hub genes (ARG2, GAS6, C3, PROS1, CLU, and FGL2). Among these, ARG2, GAS6, and C3 were confirmed as diagnostic biomarkers through multivariate logistic regression analysis. ROC curve analysis of GSE37837 (AUC = 0.627) and GSE6364 (AUC = 0.635), along with calibration and decision curve analysis (DCA) assessments, demonstrated that the nomogram built on these three biomarkers exhibited a commendable predictive capacity for the disease. Notably, the proportions of nine immune cell types differed significantly between eutopic and ectopic endometrial samples, with scRNA-seq highlighting M0 macrophages, fibroblasts, and CD8 Tex cells as the cell populations showing the most substantial changes in the three biomarkers. Additionally, our study predicted seven potential medications for EM. Finally, the expression levels of the three biomarkers in clinical samples were validated through RT-qPCR and IHC and were consistent with the results obtained from the public databases.

Conclusion: We identified three biomarkers and constructed a diagnostic model for EM in this study. These findings provide valuable insights for subsequent mechanistic research and clinical applications in the field of endometriosis.

Keywords: bioinformatics; efferocytosis; endometriosis; immune infiltration; machine learning.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Flowchart depicting the stepwise screening strategy applied to bioinformatics data.
FIGURE 2
Principal component analysis (PCA) illustrates gene expression patterns across datasets. In the scatter plots, each point represents a sample plotted on the top two principal components (PC1 and PC2) of its gene expression profile. (A) Before correction, with an obvious batch effect. (B) After removal of the batch effect. (C) Ectopic and eutopic samples after batch-effect correction. Colors represent corresponding samples across three distinct datasets.
FIGURE 3
EFRDEGs in endometriosis: (A) Venn diagram illustrating the overlapping genes among DEGs, efferocytosis-related genes, and EFRDEGs in EM. Red indicates differentially upregulated genes, blue indicates differentially downregulated genes, and yellow indicates efferocytosis-related genes. (B) Volcano plot displaying the 13 identified EFRDEGs in EM. Red indicates upregulated genes and blue indicates downregulated genes. (C) Heatmap visualizing the expression levels of the 13 EFRDEGs. Red shows high expression, and blue shows low expression. (D) Chromosomal locations of the 13 EFRDEGs. Gene names shown in red are upregulated in the disease group, and gene names shown in blue are downregulated.
FIGURE 4
Construction of protein–protein interaction (PPI) and transcription factor (TF) regulatory networks for EFRDEGs: (A) Different nodes represent distinct proteins, and the color of the node represents the enriched pathway. Red nodes represent proteins enriched in the complement and coagulation cascade pathways. The color intensity around the node reflects the log2FC magnitude, with red representing downregulated EFRDEGs and blue representing upregulated EFRDEGs. (B) Green circles represent TF candidates predicted from the database. Only results with p < 0.05 were retained. Blue circles represent downregulated EFRDEGs, while orange circles represent upregulated EFRDEGs.
FIGURE 5
Diagnostic marker selection: (A) Receiver operating characteristic (ROC) curves for the eight EFRDEGs. The x-axis denotes the false-positive rate, while the y-axis represents the true-positive rate (sensitivity). The area under the ROC curve (AUC) measures the strength of the association between the gene and the disease, with a higher AUC indicating a stronger association. (B) Box plots depict the expression levels of the eight chosen genes (CLU, C3, FGL2, PROS1, GAS6, C1QA, ARG2, and PECAM1) in both eutopic and ectopic endometrial tissues. Green represents eutopic endometria, while red represents ectopic endometria. ***p < 0.001 signifies a statistically significant difference in gene expression between the two types of endometria.
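As a minimal sketch of how the AUC in panel (A) can be computed, the Python snippet below uses the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen ectopic sample scores higher than a randomly chosen eutopic sample, with ties counted as one half. The function name `auc_from_scores` and the expression values are hypothetical illustrations, not code or data from the study.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Return the ROC AUC as the fraction of (positive, negative) pairs
    in which the positive sample has the higher score (ties count 0.5)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical expression values for one gene (not taken from the paper).
ectopic = [7.9, 8.4, 6.8, 9.1]  # disease group, treated as "positive"
eutopic = [6.2, 7.0, 6.8, 5.9]  # control group, treated as "negative"

auc = auc_from_scores(ectopic, eutopic)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination. For a gene that is lower in ectopic tissue, the raw value falls below 0.5, so the direction of the score is usually flipped before reporting.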
FIGURE 6
Diagnostic biomarker selection using two machine learning methods. (A) The least absolute shrinkage and selection operator (LASSO) algorithm results are presented in two plots. In the left plot, the horizontal axis shows log(λ) values and the vertical axis shows the cross-validation error. The right plot displays log(λ) values along the horizontal axis and the corresponding coefficients on the vertical axis. Six genes with non-zero coefficients at λ = 0.037 were selected. (B) The support vector machine recursive feature elimination (SVM-RFE) algorithm identified seven diagnostic biomarkers. The right plot ranks these seven feature genes by importance, from highest to lowest: PECAM1, CLU, GAS6, ARG2, PROS1, FGL2, and C3.
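The way an L1 penalty sets some coefficients exactly to zero, as in panel (A), can be illustrated with the soft-thresholding operator. This is a simplified sketch, not the authors' pipeline: it assumes the special case of orthonormal predictors and a squared-error objective of the form (1/2)·||y − Xβ||² + λ·||β||₁, where the LASSO solution is the unpenalized coefficient shrunk toward zero by λ. A real LASSO fit is computed iteratively by software. The gene labels and unpenalized coefficients below are hypothetical; only λ = 0.037 comes from the caption.

```python
def soft_threshold(coefficient, lam):
    """Shrink a coefficient toward zero by lam; return 0.0 if |coefficient| <= lam."""
    if coefficient > lam:
        return coefficient - lam
    if coefficient < -lam:
        return coefficient + lam
    return 0.0


LAMBDA = 0.037  # penalty value quoted in the Figure 6 caption

# Hypothetical unpenalized coefficients, invented for illustration only.
unpenalized = {"GENE_A": 0.410, "GENE_B": -0.250, "GENE_C": 0.020, "GENE_D": -0.037}

# Apply the L1 shrinkage to every coefficient.
penalized = {gene: soft_threshold(beta, LAMBDA) for gene, beta in unpenalized.items()}

# Genes whose coefficients survive the penalty are the "selected" features.
selected = [gene for gene, beta in penalized.items() if beta != 0.0]
print(selected)
```

Coefficients whose magnitude does not exceed λ are dropped entirely, which is why the number of retained genes shrinks as λ grows.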
FIGURE 7
Establishment and assessment of the diagnostic prediction model: (A) A nomogram of diagnostic biomarkers, where “Point” represents individual scores on the scale; ARG2, GAS6, and C3 correspond to the scores of each gene; “Total Point” represents the combined score of the three hub genes. (B) Decision curve analyses (DCAs) for the nomogram show that the model curves are above the high-risk threshold curve. (C) Calibration curves of the hub genes, demonstrating good calibration of the combined model after bias correction. (D) ROC curves of the nomogram model, with an AUC of 0.978, and of the test sets GSE37837 and GSE6364, with AUCs of 0.627 and 0.635, respectively.
FIGURE 8
(Continued).
FIGURE 9
Single-cell RNA sequencing analysis (scRNA-seq) of immune infiltration: (A) Uniform manifold approximation and projection (UMAP) clustering plot showing a total of 15 distinct cell clusters. (B) Annotation of the main 8 immune infiltrating cell subtypes obtained from clustering. (C) Circular chart representing different cell clusters, with the values indicating the relative immune infiltration abundance. (D–F) Violin plots illustrate the immune infiltration abundance of the three diagnostic biomarkers. Each dot represents a single cell, with the x-axis indicating different cell clusters and the y-axis representing the expression levels.
FIGURE 10
Functional enrichment analysis of C3 and potential drugs targeting diagnostic biomarkers. (A) Dot plot depicting the 20 most relevant Hallmark terms with a p-value less than 0.05, ranked by gene ratio. Dot size is proportional to the number of overlapping genes. P-values are colour-coded according to the colour scale. (B) Heatmap of gene set variation analysis (GSVA) scores of the MSigDB Hallmark gene sets for the training set, showing the top 20 sets with the highest significance in the comparison of high-risk vs. low-risk score levels of C3. (C) Clustering network of significantly enriched KEGG pathways in the GSEA analysis, with pathways related to disease types removed. The nodes represent the significant KEGG pathways and are coloured by normalised enrichment score (NES); the edges represent the similarity between pathways. The lines connecting similar pathways are coloured by similarity. (D) Protein-drug interaction network. Circles represent the dysregulated hub genes, while squares indicate the interacting drug molecules. Node size is proportional to the degree (number of incident edges).
FIGURE 11
RT-qPCR and immunohistochemical analysis of the diagnostic biomarkers. (A) Relative protein expression of the three diagnostic biomarkers in the ectopic and eutopic endometria, as determined by IHC. (B) Relative expression of the three diagnostic biomarkers in the ectopic and eutopic endometria, as determined by RT-qPCR. (*p < 0.05, **p < 0.01, ***p < 0.001).
