BMC Plant Biol. 2024 May 8;24(1):373.
doi: 10.1186/s12870-024-05086-5.

A method for mining condition-specific co-expressed genes in Camellia sinensis based on k-means clustering

Xinghai Zheng et al. BMC Plant Biol.

Abstract

Background: The tea plant (Camellia sinensis), one of the world's most important beverage crops, is renowned for its unique flavor and numerous beneficial secondary metabolites, which have led researchers to investigate how tea quality is formed. With the increasing availability of tea plant transcriptome data in public databases, large-scale co-expression analyses have become a feasible way to meet the demand for functional characterization of tea plant genes. However, as multidimensional noise increases, larger-scale co-expression analyses are not always effective. Analyzing subsets of samples obtained by downsampling and reorganizing the global sample set often yields more accurate co-expression results. Moreover, co-expression analyses based on the global sample set are more likely to overlook condition-specific gene interactions, which may be more important and more worthy of investigation.
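Co-expression analysis starts from gene-gene correlations computed across samples. As a minimal sketch, assuming an unsigned network and plain Pearson correlation (the abstract does not state which network type or correlation measure the authors used), the WGCNA-style soft-thresholded adjacency a_ij = |cor(x_i, x_j)|^β can be computed as below. The function names and the default β = 6 are illustrative choices, not values taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned soft-thresholded adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr: dict mapping gene ID -> expression values over the same samples.
    beta: soft-thresholding power (6 is an illustrative default here).
    Returns a symmetric dict keyed by (gene_i, gene_j) pairs.
    """
    genes = list(expr)
    adjacency = {}
    for i, g in enumerate(genes):
        for h in genes[i:]:
            # Raising |r| to a power keeps strong correlations and
            # shrinks weak, likely noisy ones toward zero.
            a = 1.0 if g == h else abs(pearson(expr[g], expr[h])) ** beta
            adjacency[(g, h)] = adjacency[(h, g)] = a
    return adjacency
```

For example, a gene whose profile is an exact multiple of another's gets adjacency 1.0 with it, while a gene with zero correlation gets 0.0; intermediate correlations are pushed down sharply by the power β.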

Results: We used k-means clustering to organize the global set of tea plant samples into clusters. We then annotated the metadata of the samples in each cluster to determine the "condition" that the cluster represents. Next, we performed weighted gene co-expression network analysis (WGCNA) separately on the global samples and on each sample cluster, obtaining global modules and cluster-specific modules. Comparing the two sets of modules showed that cluster-specific modules achieve higher accuracy in co-expression analysis. To measure the degree of condition specificity of genes within condition-specific clusters, we introduced the correlation difference value (CDV); incorporating the CDV into co-expression analyses makes it possible to assess the condition specificity of genes. This approach proved instrumental in identifying a series of high-CDV transcription factor-encoding genes upregulated during sustained cold treatment in Camellia sinensis leaves and buds, and in pinpointing a pair of genes that participate in the antioxidant defense system of tea plants under sustained cold stress.
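The sample-clustering step can be sketched with Lloyd's k-means algorithm. This is a minimal pure-Python illustration, not the authors' pipeline: it assumes each RNA-seq sample is represented by a vector of normalized (for example, log-transformed) expression values over a shared gene set, and the choice of k, the random initialization, and the function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(samples, k, n_iter=100, seed=0):
    """Lloyd's k-means over samples; returns (labels, centroids).

    samples: list of expression vectors, one per RNA-seq sample.
    k: number of clusters (hypothetical; the abstract does not give the paper's k).
    """
    rng = random.Random(seed)
    # Initialize centroids with k distinct samples chosen at random.
    centroids = [list(s) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids
```

Once each sample has a cluster label, its metadata (cultivar, tissue, treatment) can be tabulated per label to characterize the condition each cluster represents, and co-expression analysis can be rerun on each cluster's samples alone.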

Conclusions: In summary, downsampling and reorganizing the sample set improved the accuracy of co-expression analysis, and cluster-specific modules captured condition-specific gene interactions more accurately. The CDV made it possible to assess the condition specificity of genes in co-expression analyses, and with it we identified a series of high-CDV transcription factor-encoding genes related to sustained cold stress in Camellia sinensis. This study highlights the importance of considering condition specificity in co-expression analysis and provides insights into the regulation of the cold stress response in Camellia sinensis.

Keywords: Condition-specific gene interactions; Correlation difference value; Gene co-expression network analysis; K-means clustering; Sustained cold stress.
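The abstract does not give the formula for the correlation difference value. Purely to make the idea concrete, the sketch below assumes a hypothetical definition: for a given gene, the mean absolute difference between its Pearson correlations with every other gene computed on one cluster's samples and computed on all samples. Under this assumption, a high value means the gene's co-expression relationships behave differently under the cluster's condition than they do globally. The function name `hypothetical_cdv` and the definition itself are illustrative and should not be read as the published method.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def hypothetical_cdv(expr, gene, cluster_idx):
    """HYPOTHETICAL correlation difference value for one gene.

    expr: dict mapping gene ID -> expression values over ALL samples.
    cluster_idx: indices of the samples that belong to one k-means cluster.
    Assumed definition (not the published formula): mean over all other
    genes of |r_cluster - r_global|, where r_cluster uses only the cluster's
    samples and r_global uses every sample.
    """
    target = expr[gene]
    target_sub = [target[i] for i in cluster_idx]
    diffs = []
    for other, values in expr.items():
        if other == gene:
            continue
        r_global = pearson(target, values)
        r_cluster = pearson(target_sub, [values[i] for i in cluster_idx])
        diffs.append(abs(r_cluster - r_global))
    return sum(diffs) / len(diffs) if diffs else 0.0
```

For instance, two genes that are perfectly correlated within a cluster's samples but uncorrelated across all samples give a value of 1.0 under this assumed definition, whereas genes whose relationship is the same everywhere give 0.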

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Analysis of metadata for RNA-seq samples of Camellia sinensis. A Cultivar. B Tissue. C Experimental treatments
Fig. 2
K-means clustering of Camellia sinensis RNA-seq samples and comparative analysis of global vs. cluster-specific co-expression modules. (A) Pie chart showing the proportion of samples in each k-means cluster. (B) t-SNE scatter plot showing the distribution of all Camellia sinensis RNA-seq samples along Component 1 and Component 2. Clusters are distinguished by color, and each cluster keeps the same color in A and B. (C) Similarity analysis of global and cluster-specific co-expression modules; color intensity in the heatmap represents the Fowlkes-Mallows score (FMS). (D) Comparison of the gene-module consistency coefficient (GMC) of all genes between the global modules and the cluster-specific modules for each cluster.
Fig. 3
Relationship between correlation difference value (CDV) and condition specificity, and average CDV of different biological functions in different clusters. A Change in module similarity as the threshold increases from 0.3 to 0.9. B Effect of genes with different CDVs on the similarity between global and cluster-specific modules. C Heatmap of CDV for each biological function in each cluster. Cells marked with asterisks (*) indicate significant enrichment, and cell color represents the average CDV.
Fig. 4
Correlation difference value (CDV) and functional enrichment heatmap of biological functions for each co-expression module in Cluster 2. Cells marked with asterisks (*) indicate significant enrichment, and cell color represents the average CDV. Blank cells indicate that the module contains no genes annotated to that biological term, or only genes without CDV values.
Fig. 5
Gene regulatory network and comparison of expression profiles. A Gene regulatory network of genes in the “purple” module of Cluster 2. Edge color intensity represents the weight between two nodes, and node border color represents the correlation difference value (CDV). B Expression profile of gene CSS0042951.1 compared with the eigengene expression profiles of the Cluster 2 module and the global module. C Expression profile of gene CSS0047322.2 compared with the eigengene expression profiles of the Cluster 2 module and the global module. D Expression levels of the high-CDV transcription factor-encoding genes in the “purple” module of Cluster 2 under sustained low-temperature treatment in the first leaf (FL) and in two leaves and a bud (TAB).
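Fig. 2C compares global and cluster-specific modules with the Fowlkes-Mallows score. For two partitions of the same genes, FMS = TP / sqrt((TP + FP)(TP + FN)), where TP counts gene pairs placed in the same module by both partitions, and TP + FP and TP + FN count same-module pairs in each partition separately. A minimal pure-Python sketch follows; how the authors matched gene sets and handled unassigned genes is not stated in the abstract, so the function (a hypothetical helper) simply assumes both label lists cover the same genes in the same order. scikit-learn's `sklearn.metrics.fowlkes_mallows_score` computes the same quantity.

```python
from collections import Counter
from math import comb, sqrt


def fowlkes_mallows(labels_a, labels_b):
    """Fowlkes-Mallows score between two partitions of the same genes.

    labels_a, labels_b: module label of each gene, in the same gene order
    (e.g. global-module vs cluster-specific-module assignments).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same genes")
    # TP: pairs of genes that share a module in BOTH partitions.
    joint = Counter(zip(labels_a, labels_b))
    tp = sum(comb(n, 2) for n in joint.values())
    # Same-module pairs in each partition on its own.
    pairs_a = sum(comb(n, 2) for n in Counter(labels_a).values())  # TP + FP
    pairs_b = sum(comb(n, 2) for n in Counter(labels_b).values())  # TP + FN
    if pairs_a == 0 or pairs_b == 0:
        return 0.0
    return tp / sqrt(pairs_a * pairs_b)
```

Because the score counts gene pairs rather than comparing label names, renaming modules (for example, calling a module "purple" in one analysis and "blue" in another) does not change it; a score of 1.0 means the two partitions group genes identically.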
