PLoS One. 2017 Jan 5;12(1):e0169605.
doi: 10.1371/journal.pone.0169605. eCollection 2017.

Statistical Approaches for Gene Selection, Hub Gene Identification and Module Interaction in Gene Co-Expression Network Analysis: An Application to Aluminum Stress in Soybean (Glycine max L.)

Samarendra Das et al. PLoS One.

Abstract

Selection of informative genes is an important problem in gene expression studies. The small sample size and the large number of genes in gene expression data make the selection process complex. Further, the selected informative genes may act as a vital input for gene co-expression network analysis. Moreover, the identification of hub genes and module interactions in gene co-expression networks is yet to be fully explored. This paper presents a statistically sound gene selection technique, based on the support vector machine algorithm, for selecting informative genes from high-dimensional gene expression data. A statistical approach for identifying hub genes in the gene co-expression network is also proposed. In addition, a differential hub gene analysis approach has been developed to classify the identified hub genes into groups based on their gene connectivity in a case vs. control study. Based on this proposed approach, an R package, dhga (https://cran.r-project.org/web/packages/dhga), has been developed. The comparative performance of the proposed gene selection technique and of the hub gene identification approach was evaluated on three different crop microarray datasets. The proposed gene selection technique outperformed most of the existing techniques in selecting a robust set of informative genes. The proposed hub gene identification approach identified fewer hub genes than the existing approach, which is in accordance with the scale-free property of real networks. In this study, some key genes, along with their Arabidopsis orthologs, are reported; these can be used for engineering the aluminum toxic stress response in soybean. The functional analysis of various selected key genes revealed the underlying molecular mechanisms of the aluminum toxic stress response in soybean.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Gene selection plot for selection of informative genes for Al stress in soybean.
The horizontal axis represents the negative logarithm of the statistical significance values obtained from Boot-SVM-RFE. The vertical axis shows the negative logarithm of the statistical significance values from the t-test. Green dots indicate selected probes, i.e., those with -log(p-value) from Boot-SVM-RFE ≥ 2.5 and -log(p-value) from the t-test ≥ 4. Red stars indicate the selected probes that have Arabidopsis orthologs. Blue dots indicate unselected probes.
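The selection rule in this caption can be sketched as a two-threshold filter. This is a minimal illustration, not the authors' implementation: the probe identifiers and p-values are hypothetical, and the use of base-10 logarithms is an assumption, since the caption only says "log".

```python
import math

# Thresholds taken from the Fig 1 caption.
SVM_THRESHOLD = 2.5    # minimum -log(p-value) from Boot-SVM-RFE
TTEST_THRESHOLD = 4.0  # minimum -log(p-value) from the t-test


def neg_log10(p):
    """Return -log10(p). Base 10 is an assumption; the caption does not state the base."""
    return -math.log10(p)


def select_probes(p_values):
    """Keep probes that meet both thresholds.

    p_values maps a probe id to (Boot-SVM-RFE p-value, t-test p-value).
    """
    selected = []
    for probe, (p_svm, p_t) in p_values.items():
        if neg_log10(p_svm) >= SVM_THRESHOLD and neg_log10(p_t) >= TTEST_THRESHOLD:
            selected.append(probe)
    return selected


# Hypothetical probe ids and p-values, for illustration only.
example = {
    "probe_a": (1e-3, 1e-5),  # meets both thresholds
    "probe_b": (1e-3, 1e-3),  # fails the t-test threshold
    "probe_c": (1e-2, 1e-6),  # fails the Boot-SVM-RFE threshold
}

selected = select_probes(example)
print(selected)
```

Requiring both criteria at once corresponds to the upper-right region of the plot, where the green dots lie.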
Fig 2. Functional enrichment analysis of selected genes and hub genes under Al stress.
The GO term enrichment analysis of the 981 selected informative genes (A) and the hub genes (B) for the Al stress condition, performed using agriGO, is shown for the different gene ontology categories (CC, MF and BP). For (A), GO terms with p-values < 0.008 and false discovery rate (FDR) values < 0.6 are chosen. For (B), GO terms with p-values < 0.1 and FDR values < 0.8 are chosen.
Fig 3. Clustering dendrogram of selected genes and gene modules under Al stress and control condition.
The correspondence of the consensus modules (CM) with the modules under stress (SM) (A) and control (NM) (B) conditions is shown.
Fig 4. Module interaction network for gene modules under Al stress.
The network consists of 19 nodes and 70 edges (regulatory relations). To remove weak interactions among the modules, the threshold for the posterior probability is fixed at 0.2.
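The thresholding step described in this caption can be sketched as a simple edge filter. The module names and posterior probabilities below are hypothetical, and keeping edges whose probability is greater than or equal to the threshold (rather than strictly greater) is an assumption; the caption does not say how ties are handled.

```python
# Threshold taken from the Fig 4 caption.
POSTERIOR_THRESHOLD = 0.2


def filter_edges(edges, threshold=POSTERIOR_THRESHOLD):
    """Keep (source, target, posterior) edges with posterior >= threshold.

    The ">=" comparison is an assumption made for this sketch.
    """
    return [(src, dst, prob) for src, dst, prob in edges if prob >= threshold]


# Hypothetical module interactions, for illustration only.
edges = [
    ("M1", "M2", 0.85),
    ("M2", "M3", 0.15),  # weak interaction, dropped
    ("M3", "M1", 0.20),
]

strong_edges = filter_edges(edges)
# Modules that still take part in at least one retained interaction.
nodes = sorted({name for src, dst, _ in strong_edges for name in (src, dst)})
print(strong_edges)
print(nodes)
```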
Fig 5. Distribution of WGS in complete networks under stress and control conditions.
The distributions of WGS of genes in the gene co-expression networks (GCNs) are shown for Al stress (A) and control (B) conditions in soybean, and for salinity stress (C) and control (D) conditions in rice. In all four cases, the distributions are heavy-tailed.
Fig 6. Distribution of p-values under stress and control conditions.
The distributions of p-values of genes in GCNs for Al stress (A) and control (B) conditions in soybean are shown. The distributions of p-values of genes in GCNs for salinity stress (C) and control (D) conditions in rice are shown. Genes with low p-values represent highly interacting genes in the GCN.
Fig 7. Gene Co-expression Networks for two differential conditions in soybean.
The GCNs are constructed for Al stress (A) and control (B) conditions, respectively. Red nodes represent housekeeping hub genes, green nodes represent UHG, and blue nodes represent non-hub genes. (C) Venn diagram of the hub genes in the GCNs constructed under Al stress (A) and control (B) conditions in soybean. The number of orthologous genes found in Arabidopsis corresponding to the unique and common hub genes in soybean is also shown.
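The Venn diagram in panel (C) amounts to set operations on the two hub gene lists. The sketch below uses hypothetical gene identifiers and plain Python sets; it is not the interface of the dhga package. Equating the "common" group with the housekeeping hub genes, and the "unique" groups with UHG, is an assumption based on the wording of this caption.

```python
def classify_hubs(stress_hubs, control_hubs):
    """Split hub genes into common and condition-specific groups.

    Both arguments are iterables of gene identifiers that were already
    called hubs in the stress and control networks.
    """
    stress, control = set(stress_hubs), set(control_hubs)
    return {
        "common": stress & control,          # hubs in both networks
        "unique_stress": stress - control,   # hubs only under stress
        "unique_control": control - stress,  # hubs only under control
    }


# Hypothetical gene ids, for illustration only.
groups = classify_hubs(
    stress_hubs=["g1", "g2", "g3"],
    control_hubs=["g2", "g3", "g4"],
)
print({name: sorted(genes) for name, genes in groups.items()})
```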

