Front Genet. 2023 Jan 6;13:1109903.
doi: 10.3389/fgene.2022.1109903. eCollection 2022.

A prognostic model based on clusters of molecules related to epithelial-mesenchymal transition for idiopathic pulmonary fibrosis

Jiarui Zhao et al. Front Genet. 2023.

Abstract

Background: Most patients with idiopathic pulmonary fibrosis (IPF) have a poor prognosis, and effective predictive models for these patients are currently lacking. Epithelial-mesenchymal transition (EMT) often occurs during IPF development and is closely related to multiple pathways and biological processes. Clinicians therefore need prognostic biomarkers with high accuracy and specificity, identified from the perspective of EMT.

Methods: Data were obtained from the Gene Expression Omnibus database. Patients were grouped by consensus clustering based on EMT-related genes. Functional enrichment analysis was then performed on the consensus clustering results using gene set variation analysis. Gene modules associated with EMT were obtained through weighted gene co-expression network analysis. Prognosis-related genes were screened via least absolute shrinkage and selection operator (LASSO) regression analysis. The model was then evaluated and validated using survival analysis and time-dependent receiver operating characteristic (ROC) analysis.

Results: A total of 239 EMT-related genes were obtained from patients with IPF. Six genes with strong prognostic associations (C-X-C chemokine receptor type 7 [CXCR7], heparan sulfate-glucosamine 3-sulfotransferase 1 [HS3ST1], matrix metallopeptidase 25 [MMP25], murine retrovirus integration site 1 [MRVI1], transmembrane four L6 family member 1 [TM4SF1], and tyrosylprotein sulfotransferase 1 [TPST1]) were identified via LASSO and Cox regression analyses. A prognostic model was then constructed based on the selected genes. Survival analysis showed that patients with high risk scores had a worse prognosis in both the training set [hazard ratio (HR) = 7.31, p < .001] and the validation set (HR = 2.85, p = .017). Time-dependent ROC analysis showed that the area under the curve (AUC) values in the training set were .872, .905, and .868 for 1-, 2-, and 3-year overall survival, respectively. In the validation set, the AUC values were .814, .814, and .808 for 1-, 2-, and 3-year overall survival, respectively.

Conclusion: The independent prognostic model constructed from six EMT-related genes provides bioinformatics guidance for identifying additional prognostic markers for IPF in the future.
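
As a rough illustration of how a six-gene prognostic model of this kind is typically applied, the sketch below computes a risk score as a weighted sum of gene expression values and splits patients at the cohort median, as the figure legends describe for the training set. The coefficients and expression values are hypothetical placeholders, since the abstract does not report the fitted values, and the linear form of the score is an assumption rather than the authors' stated formula.

```python
from statistics import median

# Hypothetical coefficients: the abstract does not report the fitted values.
COEFFICIENTS = {
    "CXCR7": 0.30, "HS3ST1": 0.25, "MMP25": 0.20,
    "MRVI1": -0.15, "TM4SF1": 0.10, "TPST1": 0.35,
}

def risk_score(expression):
    """Assumed linear score: sum of coefficient * expression over the six genes."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)

def split_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Made-up expression values for three illustrative patients.
patients = {
    "P1": {gene: 1.0 for gene in COEFFICIENTS},
    "P2": {gene: 2.0 for gene in COEFFICIENTS},
    "P3": {gene: 3.0 for gene in COEFFICIENTS},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

Here a patient whose score equals the median is placed in the low-risk group. That tie-breaking rule is a choice made for the sketch, not something the source specifies.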

Keywords: bioinformatics; bronchoalveolar lavage cells; epithelial-mesenchymal transition; idiopathic pulmonary fibrosis; prognostic model.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Study outline.
FIGURE 2
Acquisition and analysis of four EMT-related DEGs in IPF. (A) The 110 DEGs identified are displayed in the volcano plot based on the criteria of p < .05 and log2FC > 1.5. (B) The EMT-related genes are presented in a Venn diagram. IPF, idiopathic pulmonary fibrosis; EMT, epithelial–mesenchymal transition; DEGs, differentially expressed genes.
FIGURE 3
Analysis of immune cell type infiltration. (A) Relative abundance of immune cell types in the IPF and control samples. (B) Differences in immune cell infiltration between IPF and control samples. (C) EMT-related DEGs displayed based on an immune correlation analysis. IPF, idiopathic pulmonary fibrosis; EMT, epithelial–mesenchymal transition; DEGs, differentially expressed genes.
FIGURE 4
Consensus clustering of IPF samples. (A) Consensus clustering matrix for the final K = 2. (B) Consensus CDF. The differently colored numbers in the figure denote the values of K from two to nine. The horizontal axis represents the consensus index and the vertical axis represents the CDF value. (C) Area under the CDF. The horizontal axis represents the values of K from two to nine and the vertical axis represents the change in area under the CDF curve. (D) PCA of the two clusters. (E) The cluster-consensus plot demonstrates the consensus clustering results. The horizontal axis represents the values of K from two to nine and the vertical axis represents the consistency score. (F) The box plot shows the significant differences in the four EMT-related genes between the two clusters. (G) The heatmap shows the specific differences in the four genes between the two clusters. C1, cluster 1; C2, cluster 2; IPF, idiopathic pulmonary fibrosis; CDF, cumulative distribution function; PCA, principal component analysis; EMT, epithelial–mesenchymal transition.
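
The consensus CDF and the area under it (panels B and C of Figure 4) can be illustrated with a minimal sketch. It assumes a consensus matrix whose entry (i, j) is the fraction of resampling runs in which samples i and j clustered together. The toy matrix, the evaluation grid, and the trapezoidal area rule are illustrative assumptions, not the authors' settings.

```python
def consensus_cdf(matrix, grid):
    """Empirical CDF of the off-diagonal (i < j) consensus values on a grid."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    return [sum(v <= c for v in values) / len(values) for c in grid]

def area_under_cdf(grid, cdf):
    """Trapezoidal approximation of the area under the CDF curve."""
    return sum(
        (grid[k + 1] - grid[k]) * (cdf[k] + cdf[k + 1]) / 2
        for k in range(len(grid) - 1)
    )

# Toy consensus matrix: four samples that always split into {0, 1} and {2, 3}.
toy = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
grid = [0.0, 0.5, 1.0]
cdf = consensus_cdf(toy, grid)
area = area_under_cdf(grid, cdf)
```

In a stable clustering most consensus values sit near 0 or 1, so the CDF is flat in between. Comparing the area across successive K values is what an area-change plot like panel C summarizes.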
FIGURE 5
Gene module selection using WGCNA. (A) Selection of the soft threshold power in clusters 1 and 2. When the scale-free fit index is .9, the minimum soft threshold is 4. (B) Gene clustering tree in clusters 1 and 2. (C) Correlation heatmap between the co-expression modules in clusters 1 and 2; the brown module has the highest correlation and the lowest p-value (P = 2e-16) in cluster 1 (correlation = −.68) and cluster 2 (correlation = .68). (D) Selection of the soft threshold power for the 112 IPF and 20 control samples. When the scale-free fit index is .9, the minimum soft threshold is 4. (E) Gene clustering tree of the 112 IPF and 20 control samples. (F) Correlation heatmap between the co-expression modules in the 112 IPF and 20 control samples; the brown module has the highest correlation and the lowest p-value (P = 3e-05) in cluster 1 (correlation = −.35) and cluster 2 (correlation = .35). C1, cluster 1; C2, cluster 2; WGCNA, weighted gene co-expression network analysis; IPF, idiopathic pulmonary fibrosis.
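
To make the role of the soft threshold power concrete, the sketch below raises absolute Pearson correlations between gene expression profiles to the power 4 reported in the Figure 5 legend. It assumes an unsigned network, since the legend does not say whether a signed or unsigned adjacency was used. The gene labels and expression profiles are made up.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def soft_adjacency(profiles, power=4):
    """Unsigned soft-threshold adjacency: |correlation| ** power per gene pair."""
    genes = list(profiles)
    return {
        (g, h): abs(pearson(profiles[g], profiles[h])) ** power
        for g in genes for h in genes if g != h
    }

# Made-up expression profiles across four samples.
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with A
    "D": [1.0, 3.0, 2.0, 4.0],  # correlation 0.8 with A
}
adjacency = soft_adjacency(profiles, power=4)
```

Raising correlations to a power shrinks weak correlations much faster than strong ones (0.8 becomes about 0.41, whereas 1.0 stays 1.0). This is how soft thresholding emphasizes strong co-expression without imposing a hard cutoff.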
FIGURE 6
Generation of a prognostic model for patients with IPF. (A) Venn diagram of the 239 EMT-related genes obtained from the intersection of the WGCNA results. (B) LASSO coefficient profiles of the 239 genes. (C) Selection of λ: the largest λ value (λ = 6) for which the mean square error is within one standard error. (D) Univariate Cox analysis of the six selected genes. All p-values from the univariate Cox analysis of the six genes are less than .000. IPF, idiopathic pulmonary fibrosis; LASSO, least absolute shrinkage and selection operator.
FIGURE 7
Evaluation and validation of prognostic models. (A) Nomogram of the model for 1-, 2-, and 3-year overall survival rates. (B) Calibration curves of the model based on 1-, 2-, and 3-year overall survival rates. (C) Time-dependent ROC curve based on the median risk score in the training set. The 1-year AUC is .727, the 2-year AUC is .905, and the 3-year AUC is .868. (D) Box plots showing a significant difference (Wilcoxon test, P = 6e-11, below the .05 threshold) between the groups defined by the median risk score in the training set. (E) Kaplan–Meier survival curve showing a clear difference between the groups defined by the median risk score in the training set [HR = 7.31, 95% CI: (4.24, 12.60), p < .001]. (F) Time-dependent ROC curve based on the validation set. The 1-year AUC is .814, the 2-year AUC is .814, and the 3-year AUC is .808. (G) Box plots presenting a significant difference (p = .0018) in the validation set. (H) Kaplan–Meier survival curve showing a clear difference in the validation set [HR = 2.85, 95% CI: (1.21, 6.74), p = .017]. ROC, receiver operating characteristic; AUC, area under the curve; HR, hazard ratio; CI, confidence interval.
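
For readers unfamiliar with the Kaplan–Meier curves in panels E and H, the following is a minimal sketch of the product-limit estimator on made-up follow-up data. The times and event flags are illustrative only and are not drawn from the study cohorts.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) pairs.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Made-up follow-up (years) for five patients; 0 marks a censored observation.
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
```

Computing one such curve per risk group (high versus low) gives the kind of comparison the panels show. The hazard ratios in the legend would come from a separate Cox regression, which this sketch does not attempt.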
