BMC Bioinformatics. 2018 Sep 21;19(1):335.
doi: 10.1186/s12859-018-2372-2.

Overlapping group screening for detection of gene-gene interactions: application to gene expression profiles with survival trait


Jie-Huei Wang et al. BMC Bioinformatics.

Abstract

Background: The development of a disease is a complex process that may result from the joint effects of multiple genes. In this article, we propose the overlapping group screening (OGS) approach for identifying active genes and gene-gene interactions by incorporating prior pathway information. The OGS method is designed to overcome two challenges in genome-wide data analysis: the number of genes and gene-gene interactions is far greater than the sample size, and pathways generally overlap with one another. We further apply the OGS method to predicting patients' survival from gene expression data.
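
To make the two challenges concrete, the following minimal Python sketch counts candidate pairwise interactions and shows how overlapping pathways place one gene in several groups. It is an illustration only, not the authors' implementation: the pathway names, gene indices, and helper functions are hypothetical, and restricting pairs to genes that share a pathway is just one assumed way prior pathway information could shrink the search space.

```python
from itertools import combinations
from math import comb

# Hypothetical pathway memberships (gene indices); genes 3 and 4 sit in two pathways.
pathways = {
    "pathway_A": [1, 2, 3, 4],
    "pathway_B": [3, 4, 5],
    "pathway_C": [6, 7],
}

def count_all_pairs(p):
    """Number of pairwise gene-gene interactions among p genes: p(p-1)/2."""
    return comb(p, 2)

def genes_to_groups(pathways):
    """Map each gene to the list of pathways (groups) that contain it."""
    membership = {}
    for name, genes in pathways.items():
        for g in genes:
            membership.setdefault(g, []).append(name)
    return membership

def within_pathway_pairs(pathways):
    """Unique gene pairs in which both genes share at least one pathway."""
    pairs = set()
    for genes in pathways.values():
        pairs.update(combinations(sorted(genes), 2))
    return pairs

membership = genes_to_groups(pathways)
# Genes that belong to more than one group, i.e. where the groups overlap.
overlapping = sorted(g for g, groups in membership.items() if len(groups) > 1)
pairs = within_pathway_pairs(pathways)

print("genes in overlapping groups:", overlapping)
print("within-pathway pairs:", len(pairs), "of", count_all_pairs(7), "possible")
print("pairs among 10,000 genes:", count_all_pairs(10_000))
```

Even in this toy setting, the number of all possible pairs grows quadratically with the number of genes, which is why the interaction count quickly dwarfs any realistic sample size.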

Results: Simulation studies demonstrate that the OGS approach performs well in identifying the true main and interaction effects, and that the survival prediction accuracy of OGS with the Lasso penalty is better than that of the ordinary Lasso method. In real data analysis, we identify several significant genes and/or epistatic interactions associated with clinical survival outcomes of diffuse large B-cell lymphoma (DLBCL) and non-small-cell lung cancer (NSCLC), using prior pathway information from the KEGG pathway and GO biological process databases, respectively.

Conclusions: The OGS approach is useful for selecting important genes and epistatic interactions in an ultra-high dimensional feature space. The prediction ability of OGS with the Lasso penalty is better than that of existing methods. The OGS approach is generally applicable to various types of outcome data (quantitative, qualitative, and censored event time data) and regression models (e.g., linear, logistic, and Cox regression models).

Keywords: Gene-gene interaction; Lasso; Overlapping group; Survival prediction.


Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1. The natural hierarchical structure of genes related to pathways with the clinical outcome
Fig. 2. The gene indices of the pathways considered in Simulation 1
Fig. 3. The gene indices of the pathways considered in Simulation 2
Fig. 4. Kaplan-Meier curves for the 207 subjects in the DLBCL testing data. Good (blue) and poor (red) groups are identified by the median of the PIs in the testing dataset
Fig. 5. Kaplan-Meier curves for the 62 subjects in the NSCLC testing data. Good (blue), medium (red) and poor (green) groups are identified by the tertiles of the PIs in the testing dataset
Fig. 6. Kaplan-Meier curves for the 62 subjects in the NSCLC testing data. Good (blue) and poor (red) groups are identified by the median of the PIs in the testing dataset
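
The grouping described in the captions for Figs. 4 to 6 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that PI denotes a per-subject prognostic index, that a lower PI means a better prognosis, and that ties at a cut point fall into the better group. The PI values and function names are hypothetical.

```python
import statistics

def split_by_median(pi):
    """Label each subject 'good' (PI <= median) or 'poor' (PI > median)."""
    med = statistics.median(pi)
    return ["good" if x <= med else "poor" for x in pi]

def split_by_tertile(pi):
    """Label each subject 'good', 'medium' or 'poor' using the two tertile cut points."""
    cut1, cut2 = statistics.quantiles(pi, n=3)
    labels = []
    for x in pi:
        if x <= cut1:
            labels.append("good")
        elif x <= cut2:
            labels.append("medium")
        else:
            labels.append("poor")
    return labels

# Hypothetical prognostic indices for six test subjects.
pi_values = [0.3, -1.2, 1.5, -0.5, 0.8, -0.1]

median_groups = split_by_median(pi_values)
tertile_groups = split_by_tertile(pi_values)
print(median_groups)
print(tertile_groups)
```

The resulting labels would then define the strata for which Kaplan-Meier curves are drawn.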
