Front Microbiol. 2023 Jan 27:14:1092143.
doi: 10.3389/fmicb.2023.1092143. eCollection 2023.

Gene differential co-expression analysis of male infertility patients based on statistical and machine learning methods

Xuan Jia et al. Front Microbiol. 2023.

Abstract

Male infertility has long been one of the important factors affecting infertility in couples of childbearing age. Factors contributing to male infertility include lifestyle habits and hereditary factors, among others. Identifying the genetic causes of male infertility can help us understand its biology and can inform diagnosis through genetic testing and the choice of clinical treatment options. While current research has made significant progress on the genes that cause sperm defects in men, genetic studies of sperm content defects are still lacking. This article is based on a gene expression dataset for the X chromosome in patients with azoospermia and with mild or severe oligospermia. Because patients differ in disease severity and possibly in genetic cause, classical clustering methods such as k-means and hierarchical clustering cannot effectively identify sample groups; that is, they cannot cluster samples and features simultaneously. In this paper, we combine machine learning, statistical methods such as the hypergeometric distribution, Gibbs sampling, and the Fisher test, and a gene interaction network to cluster gene expression data from male infertility patients; this approach has certain advantages over existing methods. Clusters were identified by differential co-expression analysis of the gene expression data, and the clusters recognized by the model were analyzed with multiple gene enrichment methods, showing varying degrees of enrichment in pathways related to enzyme activity, cancer, viruses, ATP and ADP production, and others. Because this paper is an unsupervised analysis of genetic factors in male infertility patients, we also constructed a simulated data set with known cluster assignments, which can be used to measure how well a model recovers them. In comparison with other methods, the proposed model identifies clusters more accurately.
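The abstract names the hypergeometric distribution and the Fisher test but does not say how they enter the model. As a generic illustration only (not the authors' procedure), the sketch below computes the upper-tail hypergeometric probability commonly used to ask whether a gene cluster overlaps a pathway more than chance would predict; this tail is the same quantity as a one-sided Fisher exact test on the corresponding 2x2 table. The function name, its parameters, and the toy numbers are assumptions made for the example.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, drawn, overlap):
    """Return P(X >= overlap) for a hypergeometric variable X.

    Hypothetical setup: `drawn` genes (a cluster) are sampled without
    replacement from `population` genes, of which `annotated` belong
    to the pathway of interest; `overlap` is the observed number of
    cluster genes that are in the pathway.
    """
    upper = min(annotated, drawn)      # largest possible overlap
    total = comb(population, drawn)    # number of equally likely clusters
    # Sum the probability mass from the observed overlap to the maximum.
    # math.comb returns 0 when k > n, so impossible terms drop out.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers (not from the paper): 10 genes, 3 in the pathway,
# a cluster of 3 genes that are all in the pathway.
p_value = hypergeom_enrichment_pvalue(10, 3, 3, 3)
```

A small p-value here only says that the overlap is unlikely under random sampling; in practice, many clusters and pathways are tested, so a multiple-testing correction would normally follow.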

Keywords: Fisher test; Gibbs sampling; HPV; gene interaction network; hypergeometric distribution; machine learning; male infertility.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Interaction network of some genes in GDS37948.
Figure 2
Introduction to the model process.
Figure 3
Enrichment circle plot of genes in clusters identified by our method in the male infertility data set. The cluster shown is id 1 in Table 1 (visualization of the relationship between genes and enrichment pathways).
Figure 4
The data are divided according to the difference in the number of genes in the clusters in the simulated data set. The clustering effect is measured by the Jaccard similarity coefficient and compared with other methods. COEXSML is the method proposed in this paper.
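
The Figure 4 caption scores clustering quality with the Jaccard similarity coefficient. A minimal sketch of that coefficient for two gene sets follows; how the paper matches recovered clusters to simulated ones is not stated here, so the function name, the empty-set convention, and the gene labels are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical gene labels: a planted cluster versus a recovered one.
true_cluster = {"geneA", "geneB", "geneC"}
found_cluster = {"geneB", "geneC", "geneD"}
score = jaccard(true_cluster, found_cluster)  # 2 shared / 4 total = 0.5
```

A score of 1 means the recovered cluster matches the planted one exactly, and 0 means the two share no genes.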
