BMC Bioinformatics. 2021 Oct 4;22(1):480.
doi: 10.1186/s12859-021-04395-y.

Spatial rank-based multifactor dimensionality reduction to detect gene-gene interactions for multivariate phenotypes

Mira Park et al. BMC Bioinformatics. 2021.

Abstract

Background: Identifying interaction effects between genes is one of the main tasks of genome-wide association studies that aim to shed light on the biological mechanisms underlying complex diseases. Multifactor dimensionality reduction (MDR) is a popular approach for detecting gene-gene interactions and has been extended in various forms to handle binary and continuous phenotypes. However, only a few multivariate MDR methods are available for multiple related phenotypes. Current approaches use Hotelling's T² statistic to evaluate interaction models, but Hotelling's T² statistic is well known to be highly sensitive to heavily skewed distributions and outliers.

Results: We propose a robust approach based on nonparametric statistics, namely spatial signs and ranks. The new multivariate rank-based MDR (MR-MDR) is mainly suitable for analyzing multiple continuous phenotypes and is less sensitive to skewed distributions and outliers. MR-MDR uses fuzzy k-means clustering to classify multi-locus genotypes into two groups. It then calculates a spatial rank-sum statistic as an evaluation measure and selects the interaction model with the largest statistic as the best model. Our novel idea lies in adopting nonparametric statistics as an evaluation measure for robust inference. We adopt tenfold cross-validation to avoid overfitting. Intensive simulation studies were conducted to compare the performance of MR-MDR with current methods. Application of MR-MDR to a real dataset from a Korean genome-wide association study demonstrated that it successfully identified genetic interactions associated with four phenotypes related to kidney function. The R code for conducting MR-MDR is available at https://github.com/statpark/MR-MDR.
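
The spatial signs and ranks that underlie the evaluation measure can be illustrated with a minimal sketch. It assumes the standard textbook definitions (the spatial sign of a nonzero vector is that vector scaled to unit length; the spatial rank of an observation is the average spatial sign of its differences from all observations in the sample). It is written in Python with NumPy for illustration only and is not the authors' R implementation; the function names `spatial_signs` and `spatial_ranks` and the simulated data are hypothetical.

```python
import numpy as np


def spatial_signs(x):
    """Scale each row to unit length; rows that are exactly zero stay zero."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    safe = np.where(norms > 0, norms, 1.0)  # avoid division by zero
    return np.where(norms > 0, x / safe, 0.0)


def spatial_ranks(x):
    """Spatial rank of row i: mean spatial sign of (x_i - x_j) over all j."""
    x = np.asarray(x, dtype=float)
    diffs = x[:, None, :] - x[None, :, :]      # shape (n, n, p)
    return spatial_signs(diffs).mean(axis=1)   # shape (n, p)


# Hypothetical bivariate phenotype sample with one gross outlier.
rng = np.random.default_rng(0)
y = rng.normal(size=(50, 2))
y[0] = [1000.0, 1000.0]

ranks = spatial_ranks(y)
rank_norms = np.linalg.norm(ranks, axis=1)

# The outlier drags the sample mean far from the bulk of the data,
# but its spatial rank still has length below 1.
print("sample mean:", y.mean(axis=0))
print("largest spatial rank length:", rank_norms.max())
```

Because every spatial rank lies inside the unit ball, a single extreme observation contributes only a bounded amount to any sum of ranks, whereas it can move a sample mean (and hence a mean-based statistic such as Hotelling's T²) arbitrarily far.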

Conclusions: Intensive simulation studies comparing MR-MDR with several current methods showed that the performance of MR-MDR was outstanding for skewed distributions. Additionally, for symmetric distributions, MR-MDR showed comparable power. Therefore, we conclude that MR-MDR is a useful multivariate non-parametric approach that can be used regardless of the phenotype distribution, the correlations between phenotypes, and sample size.

Keywords: Fuzzy clustering; Gene–gene interaction; Multifactor dimensionality reduction; Spatial rank statistic.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Examples of spatial signs and ranks for two bivariate distributions
Fig. 2
Overview of the MR-MDR algorithm for tenfold cross-validation and second-order interactions
Fig. 3
Hit ratios for a bivariate normal distribution
Fig. 4
Hit ratios for a bivariate gamma distribution
Fig. 5
Scatterplot and boxplot of four phenotypes after adjustment for sex, age and recruitment area. The numbers in the scatterplot are correlation coefficients
Fig. 6
Boxplots of four phenotypes after removing the noise cluster for the best SNP combination identified by MR-MDR ((i, j): ith genotype for rs1117105 and jth genotype for rs41476549, s creatinine, ALBU albumin)
