Biol Direct. 2016 Sep 29;11(1):50.
doi: 10.1186/s13062-016-0152-3.

Weighted-SAMGSR: combining significance analysis of microarray-gene set reduction algorithm with pathway topology-based weights to select relevant genes

Suyan Tian et al. Biol Direct. 2016.

Abstract

Background: It has been demonstrated that a pathway-based feature selection method that incorporates biological information within pathways during the process of feature selection usually outperforms a gene-based feature selection algorithm in terms of predictive accuracy and stability. Significance analysis of microarray-gene set reduction algorithm (SAMGSR), an extension to a gene set analysis method with further reduction of the selected pathways to their respective core subsets, can be regarded as a pathway-based feature selection method.

Methods: In SAMGSR, whether a gene is selected is determined mainly by its expression difference between the phenotypes, and partially by the number of pathways to which the gene belongs. The algorithm ignores the topology information among pathways. In this study, we propose a weighted version of the SAMGSR algorithm by constructing weights based on the connectivity among genes and then combining these weights with the test statistics.
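The idea of combining connectivity-based weights with test statistics can be sketched on a toy pathway. The abstract does not state the weight formula or the combination rule, so everything below is an assumption for illustration only: the weight is taken as 1 + the number of connected genes (the quantity named in the Fig. 2 caption), the combination is a simple product with the absolute test statistic, and the gene names, edges, and statistics are hypothetical.

```python
from collections import defaultdict


def connectivity_weights(edges, genes):
    """Assumed weight: 1 + number of distinct genes connected to each gene."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {g: 1 + len(neighbors[g]) for g in genes}


def weighted_statistics(stats, weights):
    """Assumed combination rule: |test statistic| multiplied by the weight."""
    return {g: abs(stats[g]) * weights[g] for g in stats}


# Hypothetical pathway in which gene A is a hub connected to B, C, and D.
genes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D")]
# Hypothetical per-gene test statistics (e.g. SAM-style d-scores).
stats = {"A": 1.5, "B": -2.0, "C": 0.5, "D": 1.0}

weights = connectivity_weights(edges, genes)
scores = weighted_statistics(stats, weights)

# Rank genes with and without the connectivity weights.
unweighted_ranking = sorted(genes, key=lambda g: abs(stats[g]), reverse=True)
weighted_ranking = sorted(genes, key=lambda g: scores[g], reverse=True)

print(unweighted_ranking)
print(weighted_ranking)
```

In this toy case the hub gene A moves ahead of gene B once the weights are applied, which is the kind of effect a topology-aware weighting is meant to produce. The actual weighting scheme in the paper may differ.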

Results: Using both simulated and real-world data, we evaluate the performance of the proposed SAMGSR extension and demonstrate that the weighted version outperforms the original version.

Conclusions: The additional gene connectivity information does facilitate feature selection.

Reviewers: This article was reviewed by Drs. Limsoon Wong, Lev Klebanov, and I. King Jordan.

Keywords: Multiple sclerosis (MS); Non-small cell lung cancer (NSCLC); Pathway knowledge; Pathway-based feature selection; Significance analysis of microarray (SAM); Weighted gene expression profiles.

Figures

Fig. 1
Diagrams illustrating the SAMGSR and weighted-SAMGSR algorithms
Fig. 2
Scatterplot showing the correlation between the number of gene sets in which a gene is involved and its connectivity. ρ is the estimated Spearman correlation coefficient between the number of gene sets involved and (1 + the number of connected genes)
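
For readers unfamiliar with the statistic in Fig. 2, the following is a minimal sketch of a Spearman correlation computed as the Pearson correlation of average ranks. The per-gene counts are made up for illustration and are not data from the paper; the helper names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-gene counts, not taken from the paper.
n_gene_sets = [1, 2, 3, 5, 8]   # number of gene sets each gene belongs to
n_connected = [0, 1, 3, 2, 9]   # number of genes each gene is connected to

rho = spearman(n_gene_sets, [1 + c for c in n_connected])
print(round(rho, 3))
```

Because Spearman's ρ depends only on ranks, adding 1 to every connectivity count does not change its value; the offset matters only if the quantity is later used as a multiplicative weight.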

