PLoS One. 2021 Oct 29;16(10):e0259475.
doi: 10.1371/journal.pone.0259475. eCollection 2021.

Screening of key biomarkers of tendinopathy based on bioinformatics and machine learning algorithms


Ya Xi Zhu et al. PLoS One.

Abstract

Tendinopathy is a complex, multifaceted tendon disorder that is often associated with overuse; its high prevalence results in significant health care costs. At present, the pathogenesis of tendinopathy and its effective treatment are still not sufficiently elucidated. The purpose of this research was to explore in depth the genes, functional pathways, and immune infiltration characteristics involved in the occurrence and development of tendinopathy. The gene expression profiles GSE106292, GSE26051, and GSE167226 were downloaded from the Gene Expression Omnibus (GEO), NCBI's comprehensive gene expression database. GSE106292 was analyzed with the WGCNA package in R, and the GSE26051 and GSE167226 datasets were combined to screen for differentially expressed genes (DEGs). We subsequently performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses and immune cell infiltration analysis. To identify early diagnostic genes, candidate genes were screened with a LASSO regression model, support vector machine recursive feature elimination (SVM-RFE), and Gaussian mixture models (GMMs). Combining WGCNA with DEG screening yielded a total of 171 DEGs. GO and KEGG enrichment analyses showed that these dysregulated genes were related to the mTOR, HIF-1, MAPK, NF-κB, and VEGF signaling pathways. Immune infiltration analysis showed significant infiltration of M1 macrophages, activated mast cells, and activated NK cells. After analysis with the LASSO, SVM-RFE, and GMM algorithms, we found that MACROD1 may be a gene for early diagnosis. Based on comprehensive bioinformatics analysis, we identified a potential approach to the early diagnosis of tendinopathy, centered on MACROD1, together with its key immune infiltration characteristics. These hub genes and functional pathways may serve as early biomarkers of tendon injury and as molecular therapeutic targets to guide drug development and basic research.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Flowchart of this study.
The following datasets were used to identify potential diagnostic genes and mechanisms associated with the development of tendinopathy: GSE106292, GSE26051, and GSE167226.
Fig 2. Differential gene expression analysis of the combined GSE26051 and GSE167226 datasets.
(A) Heat map of whole-gene expression in tendon tissue, with high expression in red and low expression in blue. (B) Volcano plot of the DEGs, with upregulated genes in red and downregulated genes in blue.
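The volcano plot in Fig 2B colors each gene by whether it passes both a fold-change cutoff and a significance cutoff. A minimal sketch of that classification follows, assuming Benjamini-Hochberg-adjusted P values, |log2 fold change| > 1, and adjusted P < 0.05; the source does not state the cutoffs or the software used, and the gene names in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending raw p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_genes(
    stats: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 1.0,    # assumed |log2 fold change| cutoff
    padj_cutoff: float = 0.05,  # assumed adjusted-P cutoff
) -> Dict[str, str]:
    """Label genes 'up', 'down' or 'ns' from (log2 fold change, raw p-value)."""
    genes = list(stats)
    padj = benjamini_hochberg([stats[g][1] for g in genes])
    labels = {}
    for gene, q in zip(genes, padj):
        lfc = stats[gene][0]
        if q < padj_cutoff and lfc > lfc_cutoff:
            labels[gene] = "up"    # drawn red in a volcano plot
        elif q < padj_cutoff and lfc < -lfc_cutoff:
            labels[gene] = "down"  # drawn blue in a volcano plot
        else:
            labels[gene] = "ns"    # not significant
    return labels


# Hypothetical gene names and statistics, for illustration only
example = {
    "GENE_A": (2.3, 0.0001),
    "GENE_B": (-1.8, 0.002),
    "GENE_C": (0.4, 0.0005),
    "GENE_D": (3.0, 0.2),
}
print(classify_genes(example))
# {'GENE_A': 'up', 'GENE_B': 'down', 'GENE_C': 'ns', 'GENE_D': 'ns'}
```

GENE_C is highly significant but fails the fold-change cutoff, and GENE_D has a large fold change but fails the significance cutoff, so both stay unlabeled; a volcano plot places log2 fold change on the horizontal axis and -log10 of the adjusted P value on the vertical axis.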
Fig 3. Screening criteria for WGCNA.
(A) Scale-free topology fit index R² (vertical axis) against the soft-thresholding power (horizontal axis). (B) Mean network connectivity (vertical axis) against the soft-thresholding power (horizontal axis). (C) Distribution of node connectivity k. (D) Correlation plot of k and p(k).
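Fig 3A and 3B are the usual diagnostics for choosing the WGCNA soft-thresholding power β. In an unsigned network the adjacency between genes i and j is |cor(x_i, x_j)|^β, and a gene's connectivity k is the sum of its adjacencies. The sketch below is a simplified scale-free fit index, assuming 10 equal-width connectivity bins, dropping empty bins, and reporting the signed R² (-sign(slope) × R²) as commonly plotted in WGCNA tutorials. It is not the WGCNA package's own code, and the helper `pick_soft_threshold` with its 0.85 cutoff is a hypothetical illustration.

```python
from typing import Iterable, Optional, Tuple

import numpy as np


def scale_free_fit(expr: np.ndarray, power: float, n_bins: int = 10) -> Tuple[float, float]:
    """Simplified scale-free topology fit for one soft-thresholding power.

    expr is a samples x genes matrix. Returns (signed R^2, mean connectivity).
    """
    # Unsigned adjacency: |Pearson correlation| ** power, no self-links
    adjacency = np.abs(np.corrcoef(expr, rowvar=False)) ** power
    np.fill_diagonal(adjacency, 0.0)
    k = adjacency.sum(axis=1)  # connectivity of each gene

    # Equal-width bins over k; p(k) is the fraction of genes in each bin
    edges = np.linspace(k.min(), k.max(), n_bins + 1)
    bin_index = np.digitize(k, edges[1:-1])
    counts = np.bincount(bin_index, minlength=n_bins)
    sums = np.bincount(bin_index, weights=k, minlength=n_bins)
    keep = counts > 0  # simplification: empty bins are dropped
    log_k = np.log10(sums[keep] / counts[keep])
    log_pk = np.log10(counts[keep] / k.size)

    # Linear fit of log10 p(k) on log10 k; a scale-free network gives a negative slope
    slope, _ = np.polyfit(log_k, log_pk, 1)
    r2 = np.corrcoef(log_k, log_pk)[0, 1] ** 2
    return float(-np.sign(slope) * r2), float(k.mean())


def pick_soft_threshold(expr: np.ndarray, powers: Iterable[int],
                        r2_cutoff: float = 0.85) -> Optional[int]:
    """Hypothetical helper: lowest power whose signed R^2 reaches the cutoff."""
    for power in powers:
        signed_r2, _ = scale_free_fit(expr, power)
        if signed_r2 >= r2_cutoff:
            return power
    return None
```

Raising β always lowers mean connectivity (Fig 3B) because |cor| < 1 off the diagonal, so the choice trades scale-free fit against keeping enough connectivity to form modules.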
Fig 4. WGCNA of the GSE106292 dataset.
(A) Clustering dendrogram of all genes based on the dissimilarity measure (1-TOM). (B) Heat map of correlations between module eigengenes and samples, with each cell containing the correlation coefficient and P value. (C) Expression heat map and eigengene bar plot of the purple module. (D) Expression heat map and eigengene bar plot of the skyblue2 module.
Fig 5. GO and KEGG enrichment analysis of the differentially expressed genes.
The horizontal axis shows the proportion of differentially expressed genes in each GO or KEGG term, and the vertical axis shows the enriched term. (A) GO enrichment of upregulated DEGs. (B) GO enrichment of downregulated DEGs. (C) KEGG enrichment of upregulated DEGs. (D) KEGG enrichment of downregulated DEGs.
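The horizontal axis of Fig 5 is the proportion of submitted DEGs that fall in each term (k/n below). GO and KEGG over-representation is commonly scored with a one-sided hypergeometric test; the source does not name the test or tool, so the sketch below is an assumption illustrating that standard calculation. The parameter names N, K, n, and k are illustrative.

```python
from math import comb


def enrichment_pvalue(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric P(X >= k) for over-representation.

    N: genes in the background, K: background genes annotated to the term,
    n: DEGs submitted, k: DEGs annotated to the term.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)  # exact big-integer arithmetic; only the ratio is a float


def gene_ratio(k: int, n: int) -> float:
    """Proportion of submitted DEGs that fall in the term."""
    return k / n


# Hypothetical numbers: 171 DEGs, 12 of them in a 150-gene term, 20,000-gene background
print(gene_ratio(12, 171), enrichment_pvalue(20000, 150, 171, 12))
```

In practice the per-term P values are then corrected for multiple testing across all GO or KEGG terms before terms are plotted.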
Fig 6. Infiltration patterns of immune cells in different groups.
(A) Relative percentages of 22 immune cell subsets in tendinopathy samples. (B) Heat map of immune cell infiltration in the tendinopathy and control groups; green represents the tendinopathy group and red the control group. (C) Infiltration levels of the 22 immune cell subsets in tendinopathy samples. (D) Box plot of differences in immune infiltration between the tendinopathy and control groups; green represents the tendinopathy group and red the control group.
Fig 7. Screening of potential key tendinopathy genes with the LASSO regression model.
In Fig 7A and 7B, the vertical axis shows the coefficient value, the lower horizontal axis shows log(λ), and the upper horizontal axis shows the number of non-zero coefficients in the model at each λ. (A) Selection of the optimal parameter λ. (B) LASSO coefficient profiles of the 18 differentially expressed genes selected at the optimal λ. (C) Comparison of ROC curves of the gene signature between the training and validation sets.
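The LASSO adds an L1 penalty λ‖β‖₁ that drives some coefficients exactly to zero, which is why Fig 7A and 7B track the number of non-zero coefficients as log(λ) changes. A minimal coordinate-descent sketch follows. It assumes a least-squares loss on 0/1 class labels with standardized features; the study's actual loss, software, and λ selection rule are not stated in the source (a logistic LASSO with cross-validated λ, for example via glmnet in R, is a common choice).

```python
import numpy as np


def standardize(X: np.ndarray, y: np.ndarray):
    """Scale features to mean 0 / variance 1 and centre the response."""
    return (X - X.mean(axis=0)) / X.std(axis=0), y - y.mean()


def lambda_max(X: np.ndarray, y: np.ndarray) -> float:
    """Smallest lambda at which every LASSO coefficient is zero."""
    return float(np.max(np.abs(X.T @ y)) / X.shape[0])


def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float,
             n_sweeps: int = 200, tol: float = 1e-8) -> np.ndarray:
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of feature j with the residual that excludes feature j
            rho = X[:, j] @ residual / n + col_sq[j] * old
            # Soft-thresholding: shrink towards zero, exactly zero if |rho| <= lam
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != old:
                residual -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def nonzero_path(X: np.ndarray, y: np.ndarray, lambdas) -> list:
    """Number of non-zero coefficients at each lambda along a path."""
    return [int(np.count_nonzero(lasso_cd(X, y, lam))) for lam in lambdas]
```

Scanning λ from `lambda_max` downward (for example `lambda_max(Xs, yc) * np.logspace(0, -2, 20)`) reproduces the shape of a coefficient-path plot: no genes at the top of the path, then genes entering one by one as the penalty weakens.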
Fig 8. Screening of key genes with the MSVM-RFE algorithm.
(A) Error rate of the SVM model. (B) Accuracy of the SVM model. (C) Venn diagram of the key genes identified by both algorithms.
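SVM-RFE ranks genes by repeatedly training a linear SVM and discarding the gene with the smallest squared weight w_j². The sketch below implements plain SVM-RFE with a linear SVM trained by full-batch subgradient descent on the L2-regularized hinge loss. It does not reproduce the multiple-SVM (MSVM-RFE) variant named in Fig 8, and the learning rate, regularization strength, and epoch count are assumptions.

```python
import numpy as np


def train_linear_svm(X: np.ndarray, y: np.ndarray, lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 500):
    """Linear SVM via full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (X w + b))), with y in {-1, +1}."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples inside or beyond the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def svm_rfe(X: np.ndarray, y: np.ndarray, names: list) -> list:
    """Rank features by recursive feature elimination, most important first."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while remaining:
        w, _ = train_linear_svm(X[:, remaining], y)
        # Drop the feature whose squared weight contributes least to the margin
        eliminated.append(remaining.pop(int(np.argmin(w ** 2))))
    return [names[j] for j in reversed(eliminated)]  # last removed ranks first
```

The shared genes in a Venn diagram like Fig 8C are simply the set intersection of the two selections, e.g. `set(lasso_genes) & set(svm_genes)` with hypothetical lists `lasso_genes` and `svm_genes`.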
Fig 9. AUC patterns of the 262,143 logistic regression models, classified with a Gaussian finite mixture model.
(A) Patterns of the logistic regression models according to AUC score, as determined by the Gaussian mixture model. (B) Waterfall plots of the 6 key genes.
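The number 262,143 equals 2^18 - 1, the count of non-empty subsets of 18 genes. Given the 18 LASSO-selected genes in Fig 7B, this suggests that one logistic regression model was fitted per gene combination; the source does not state this, so treat it as an inference. The sketch below shows two ingredients under that assumption: an AUC computed from the Mann-Whitney statistic, and a one-dimensional Gaussian mixture fitted by expectation-maximization to group AUC scores. The two-component default is an assumption; finite-mixture tools such as the R package mclust typically choose the number of components by BIC.

```python
import numpy as np

# 2**18 - 1 non-empty subsets of 18 genes (assumed link to the 262,143 models in Fig 9)
N_SUBSETS = 2 ** 18 - 1


def auc(scores, labels) -> float:
    """ROC AUC via the Mann-Whitney statistic; ties count as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]  # positive minus negative
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def gmm_1d(x, n_components: int = 2, n_iter: int = 200):
    """Fit a 1-D Gaussian mixture by expectation-maximisation.

    Returns (weights, means, variances, hard labels). Means start at evenly
    spaced quantiles of x so the result is deterministic.
    """
    x = np.asarray(x, dtype=float)
    means = np.quantile(x, np.linspace(0, 1, n_components + 2)[1:-1])
    variances = np.full(n_components, x.var())
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = (weights * np.exp(-((x[:, None] - means) ** 2) / (2 * variances))
                / np.sqrt(2 * np.pi * variances))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances from responsibilities
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-12
    return weights, means, variances, resp.argmax(axis=1)
```

For example, `auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75, since three of the four positive-negative pairs are ranked correctly. Densities are computed directly rather than in log space, which keeps the sketch short but can underflow for points far from every component.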
