2021 Jan 12;12(1):87.
doi: 10.3390/genes12010087.

K-Module Algorithm: An Additional Step to Improve the Clustering Results of WGCNA Co-Expression Networks

Jie Hou et al. Genes (Basel).

Abstract

Among biological networks, co-expression networks have been widely studied. One of the most commonly used pipelines for constructing co-expression networks is weighted gene co-expression network analysis (WGCNA), which identifies clusters of highly co-expressed genes (modules) using hierarchical clustering. The major drawback of hierarchical clustering is that once two objects are clustered together, the merge cannot be reversed, so an unsuitable assignment can never be corrected. In this paper, we calculate the similarity matrix for WGCNA with the distance correlation to construct a gene co-expression network, and present a new approach, the k-module algorithm, to improve the WGCNA clustering results. The method assigns each gene to the module with which it has the highest mean connectivity. It thereby re-adjusts the results of hierarchical clustering while retaining the advantages of the dynamic tree cut method. We verify the validity of the algorithm on six datasets from microarray and RNA-seq data. The k-module algorithm has fewer iterations, which leads to lower complexity. We also verify that the gene modules obtained by the k-module algorithm have high enrichment scores and strong stability. Our method improves upon hierarchical clustering and can be applied to general clustering algorithms based on a similarity matrix; it is not limited to gene co-expression network analysis.
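
The reassignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a precomputed symmetric adjacency (similarity) matrix and an initial labelling, for example from hierarchical clustering. The function name `k_module_reassign`, the `max_iter` cap, the exclusion of a gene's self-connection, and the simultaneous update of all genes are hypothetical choices made for illustration only.

```python
import numpy as np


def k_module_reassign(adjacency, labels, max_iter=100):
    """Move each gene to the module with which it has the highest mean connectivity.

    Illustrative sketch only: the update rule follows the one-sentence
    description in the abstract; all other details are assumptions.
    """
    adjacency = np.array(adjacency, dtype=float)  # np.array copies the input
    np.fill_diagonal(adjacency, 0.0)              # assumption: ignore self-connections
    labels = np.asarray(labels).copy()
    modules = np.unique(labels)
    n_genes = adjacency.shape[0]

    for iteration in range(1, max_iter + 1):
        # Column j holds every gene's mean adjacency to the members of modules[j].
        mean_conn = np.full((n_genes, modules.size), -np.inf)
        for j, module in enumerate(modules):
            members = labels == module
            if not members.any():
                continue  # an emptied module can no longer attract genes
            total = adjacency[:, members].sum(axis=1)
            # A gene does not count itself among the members of its own module.
            count = members.sum() - members.astype(int)
            mean_conn[:, j] = np.divide(
                total, count, out=np.zeros(n_genes), where=count > 0
            )
        new_labels = modules[np.argmax(mean_conn, axis=1)]
        if np.array_equal(new_labels, labels):
            return labels, iteration  # no gene changed module: converged
        labels = new_labels
    return labels, max_iter


# Toy data: two tight groups of three genes; gene 2 starts in the wrong module.
true_group = np.array([0, 0, 0, 1, 1, 1])
demo_adjacency = np.where(true_group[:, None] == true_group[None, :], 0.9, 0.1)
start_labels = np.array([0, 0, 1, 1, 1, 1])
final_labels, n_iter = k_module_reassign(demo_adjacency, start_labels)
print(final_labels, n_iter)
```

On this toy input the misplaced gene is moved into the module it is most strongly connected to, and the next pass confirms that no further gene changes module. Because all genes are updated at once, oscillation is possible in principle on less clean data, which is why the sketch caps the number of passes.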

Keywords: connectivity; distance correlation; enrichment analysis; gene co-expression networks.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Weighted gene co-expression network analysis (WGCNA) and k-module algorithm flow chart.
Figure 2. Silhouette coefficient score (a) and Dunn validity index (b) of WGCNA, k-eigengene, and k-module algorithms. The evaluation value obtained by the k-module algorithm was the highest in most of the datasets.
Figure 3. Average Database for Annotation, Visualization, and Integrated Discovery (DAVID) enrichment score of modules obtained by WGCNA, k-eigengene, and k-module algorithms. The enrichment score obtained by the k-module algorithm was the highest in most of the datasets.
Figure 4. The average proportion of variance captured by eigengenes obtained by the k-eigengene algorithm. The proportion of variance in the pancreatic cancer and Arabidopsis datasets was the highest.
Figure 5. Module preservation between even partitioning of the liver dataset for WGCNA (a), k-eigengene (b) and k-module (c). All cells with a color depth below log10(0.05) are shown as light blue, while the other cells maintain a color gradient from white to red. The modules with preservation significance greater than 50 have the numbers printed in bold and italic. The k-module algorithm has reasonable module preservation statistics.

