Review. Hum Genomics. 2006 Mar;2(5):318-28.
doi: 10.1186/1479-7364-2-5-318.

Multifactor dimensionality reduction: an analysis strategy for modelling and detecting gene-gene interactions in human genetics and pharmacogenomics studies

Alison A Motsinger et al. Hum Genomics. 2006 Mar.

Abstract

The detection of gene-gene and gene-environment interactions associated with complex human disease or pharmacogenomic endpoints is a difficult challenge for human geneticists. Unlike rare, Mendelian diseases that are associated with a single gene, most common diseases are caused by the non-linear interaction of numerous genetic and environmental variables. The dimensionality involved in the evaluation of combinations of many such variables quickly diminishes the usefulness of traditional, parametric statistical methods. Multifactor dimensionality reduction (MDR) is a novel and powerful statistical tool for detecting and modelling epistasis. MDR is a non-parametric and model-free approach that has been shown to have reasonable power to detect epistasis in both theoretical and empirical studies. MDR has detected interactions in diseases such as sporadic breast cancer, multiple sclerosis and essential hypertension. As this method is more frequently applied, and has gained acceptance in the study of human disease and pharmacogenomics, it is becoming increasingly important that the implementation of the MDR approach is properly understood. As with all statistical methods, MDR is only powerful and useful when implemented correctly. Careful attention to dataset structure, configuration parameters and the proper execution of permutation testing in reference to a particular dataset and configuration is essential to the method's effectiveness. The detection, characterisation and interpretation of gene-gene and gene-environment interactions are expected to improve the diagnosis, prevention and treatment of common human diseases. MDR can be a powerful tool in reaching these goals when used appropriately.

Figures

Figure 1
Summary of the general steps to implement the MDR method (adapted from Ritchie et al. [9]). In step one, the data are divided into a training set and an independent testing set for cross-validation. In step two, a set of n factors is selected from the pool of all factors. In step three, the n factors and their possible multifactor cells are represented in n-dimensional space. In step four, each multifactor cell in the n-dimensional space is labelled as high risk if the ratio of affected individuals to unaffected individuals exceeds a threshold of one, and low risk if the threshold is not exceeded. In steps five and six, the model with the lowest misclassification error is selected and the prediction error of the model is estimated using the independent test data. Steps one through six are repeated for each possible cross-validation interval. Bars represent hypothetical distributions of cases (left) and controls (right) with each multifactor combination. Dark-shaded cells represent high-risk genotype combinations, whereas light-shaded cells represent low-risk genotype combinations. White cells represent genotype combinations for which no data were observed.
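The steps summarised in Figure 1 can be illustrated with a minimal Python sketch of a single cross-validation interval. This is not the authors' software: the function names (`label_cells`, `error_rate`, `mdr_fold`), the data layout (one tuple of genotype codes per individual, with 1 for affected and 0 for unaffected) and the choice to treat cells unseen in training as low risk are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def label_cells(genotypes, status, loci, threshold=1.0):
    """Step four: label each observed multifactor cell as high risk (True)
    or low risk (False) using the affected:unaffected ratio."""
    cases, controls = Counter(), Counter()
    for row, affected in zip(genotypes, status):
        cell = tuple(row[i] for i in loci)
        (cases if affected else controls)[cell] += 1
    # cases > threshold * controls is the ratio test written without division,
    # so cells with zero controls do not raise an error.
    return {cell: cases[cell] > threshold * controls[cell]
            for cell in set(cases) | set(controls)}


def error_rate(genotypes, status, loci, high_risk):
    """Fraction of individuals whose status disagrees with their cell label."""
    wrong = 0
    for row, affected in zip(genotypes, status):
        cell = tuple(row[i] for i in loci)
        # Assumption: a cell never seen in training is treated as low risk.
        predicted = high_risk.get(cell, False)
        wrong += predicted != bool(affected)
    return wrong / len(status)


def mdr_fold(train_g, train_s, test_g, test_s, n_loci, n_factors):
    """Steps two to six for one training/testing split."""
    best = None
    for loci in combinations(range(n_loci), n_factors):
        cells = label_cells(train_g, train_s, loci)
        err = error_rate(train_g, train_s, loci, cells)
        if best is None or err < best[0]:
            best = (err, loci, cells)
    train_err, loci, cells = best
    # Prediction error is estimated on the independent testing set.
    return loci, train_err, error_rate(test_g, test_s, loci, cells)
```

In a full analysis this function would be called once per cross-validation interval, with a different portion of the data held out as the testing set each time.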
Figure 2
Example of the trend in classification error (2A) and prediction error (2B) as the number of loci in a model increases. The classification error continues to decrease, which indicates over-fitting. The prediction error will average around 50 per cent and will drop for the best model.
Figure 3
Flow chart of the multifactor dimensionality reduction (MDR) procedure. The flow chart outlines the thought process that must be completed for each data analysis with MDR. The steps in the analysis vary with different dataset structures and characteristics, and the flow chart guides the user through the decision-making process associated with any particular analysis.
Figure 4
Flow chart of permutation testing procedure. The flow chart carries the user through the process of permutation testing step by step, for any type of dataset analysis.
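The flow chart itself is not reproduced here, so the following is only a generic sketch of label-permutation testing, not the specific procedure shown in Figure 4. It assumes a hypothetical `statistic` callable that reruns the whole analysis on a dataset and returns an error where lower is better, and it uses the common add-one correction for the empirical p-value.

```python
import random


def permutation_p_value(genotypes, status, statistic,
                        n_permutations=1000, seed=0):
    """Empirical p-value for an error-type statistic (lower is better).

    `statistic(genotypes, status)` is a hypothetical callable that should
    rerun the full analysis, with the same configuration, on the data given.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = statistic(genotypes, status)
    labels = list(status)      # copy, so the caller's labels are untouched
    at_least_as_good = 0
    for _ in range(n_permutations):
        # Shuffling the case/control labels breaks any genotype-phenotype
        # association while keeping the case:control ratio fixed.
        rng.shuffle(labels)
        if statistic(genotypes, labels) <= observed:
            at_least_as_good += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (at_least_as_good + 1) / (n_permutations + 1)
```

Because the abstract stresses that permutation testing must match a particular dataset and configuration, the callable passed as `statistic` should repeat exactly the analysis that produced the observed result.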

References

    1. Moore JH. The ubiquitous nature of epistasis in determining susceptibility to common human diseases. Hum Hered. 2003;56:73–82. doi: 10.1159/000073735.
    2. Sing CF, Stengard JH, Kardia SL. Dynamic relationships between the genome and exposures to environments as causes of common human diseases. World Rev Nutr Diet. 2004;93:77–91.
    3. Thornton-Wells TA, Moore JH, Haines JL. Genetics, statistics and human disease: Analytical retooling for complexity. Trends Genet. 2004;20:640–647. doi: 10.1016/j.tig.2004.09.007.
    4. Wilke RA, Reif DM, Moore JH. Combinatorial pharmacogenetics. Nat Rev Drug Discov. 2005;4:911–918. doi: 10.1038/nrd1874.
    5. Bellman R. Adaptive Control Processes. Princeton University Press, Princeton, NJ; 1961.