Bioinformatics challenges for genome-wide association studies

Jason H Moore¹, Folkert W Asselbergs, Scott M Williams

PMID: 20053841
PMCID: PMC2820680
DOI: 10.1093/bioinformatics/btp713

Review

Jason H Moore et al. Bioinformatics. 2010 Feb 15;26(4):445-55.

doi: 10.1093/bioinformatics/btp713. Epub 2010 Jan 6.

Affiliation

¹ Department of Genetics, Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, NH 03756, USA. jason.h.moore@dartmouth.edu

Abstract

The sequencing of the human genome has made it possible to identify an informative set of >1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide association studies (GWASs). The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation and analysis issues including multiple testing. This work has been successful and has enabled the discovery of new associations that have been replicated in multiple studies. However, it is now recognized that most SNPs discovered via GWAS have small effects on disease susceptibility and thus may not be suitable for improving health care through genetic testing. One likely explanation for the mixed results of GWAS is that the current biostatistical analysis paradigm is by design agnostic or unbiased in that it ignores all prior knowledge about disease pathobiology. Further, the linear modeling framework that is employed in GWAS often considers only one SNP at a time, thus ignoring its genomic and environmental context. There is now a shift away from the biostatistical approach toward a more holistic approach that recognizes the complexity of the genotype-phenotype relationship, which is characterized by significant heterogeneity and by gene-gene and gene-environment interactions. We argue here that bioinformatics has an important role to play in addressing the complexity of the underlying genetic basis of common human diseases. The goal of this review is to identify and discuss those GWAS challenges that will require computational methods.

Figures

**Fig. 1.**
Overview of the RF algorithm summarized in Section 2.3. Adapted from Reif *et al.* (2006).

**Fig. 2.**
Summary of the constructive induction process for MDR. The left bars within each cell represent the number of cases while the right bars represent the number of controls. Dark-shaded cells are high risk while the light-shaded cells are low risk. Prediction using any classifier can be carried out using the final constructed attribute.
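
The pooling step described in the Fig. 2 caption can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `mdr_construct`, the integer genotype codes, the case:control ratio threshold of 1.0 and the treatment of ties as low risk are all assumptions made for the example.

```python
from collections import Counter

def mdr_construct(genotypes_a, genotypes_b, labels, threshold=1.0):
    """Pool two-SNP genotype cells into one binary high/low-risk attribute.

    labels: 1 = case, 0 = control (assumed coding).
    A cell is high risk (1) when cases > threshold * controls, else low risk (0).
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(genotypes_a, genotypes_b, labels):
        if y == 1:
            cases[(a, b)] += 1
        else:
            controls[(a, b)] += 1

    risk = {}
    for cell in set(cases) | set(controls):
        # Compare by multiplication so that cells without controls
        # never cause a division by zero.
        risk[cell] = 1 if cases[cell] > threshold * controls[cell] else 0

    # The constructed attribute: one high/low-risk value per individual.
    constructed = [risk[(a, b)] for a, b in zip(genotypes_a, genotypes_b)]
    return risk, constructed

# Hypothetical toy data with an XOR-like interaction between two markers.
snp_a = [0, 0, 1, 1, 0, 1, 0, 1]
snp_b = [0, 1, 0, 1, 0, 1, 1, 0]
status = [0, 1, 1, 0, 0, 0, 1, 1]
risk, constructed = mdr_construct(snp_a, snp_b, status)
print(constructed)  # [0, 1, 1, 0, 0, 0, 1, 1]
```

In this toy dataset neither marker separates cases from controls on its own, but the single constructed attribute does, and it can then be passed to any classifier.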

**Fig. 3.**
Summary of how Relief, ReliefF and SURF select neighbors. Each panel in this figure shows the genotypes at two markers for a dataset of cases and controls. For the purpose of this example, only these two markers are considered and both are continuous. When analyzing real data, the process of selecting neighbors is the same, but there will be thousands of discrete-valued markers (SNPs), each of which would be represented by one of thousands of dimensions. The individual for whom neighbors are being found is shown by the filled red circle. The neighbors that each approach uses for weighting are highlighted in blue. Panels A–C show how Relief, ReliefF and SURF, respectively, select the neighbors to be used in weighting. Relief selects the nearest individual of the same dichotomous class (blue circle) and the nearest individual of the other class (blue cross). ReliefF selects a user-specified number of individuals (two in this example) to be used for weighting. SURF, instead of using a fixed number of neighbors, uses all individuals within a distance threshold. The dotted line shows a hypothetical distance threshold.
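
The three neighbor-selection rules in Fig. 3 can be compared side by side with a short Python sketch. It follows the caption's two-marker continuous example, so Euclidean distance is assumed; the function names, the choice to take `k` neighbors from each class in ReliefF, and the threshold passed to `surf_neighbors` are illustrative assumptions, not details taken from the paper. `math.dist` requires Python 3.8 or later.

```python
import math

def ranked_distances(target_idx, points):
    """Return (distance, index) pairs for every other individual, nearest first."""
    target = points[target_idx]
    return sorted(
        (math.dist(target, p), i) for i, p in enumerate(points) if i != target_idx
    )

def relieff_neighbors(target_idx, points, labels, k=2):
    """ReliefF: the k nearest same-class (hits) and k nearest other-class (misses)."""
    ranked = ranked_distances(target_idx, points)
    hits = [i for _, i in ranked if labels[i] == labels[target_idx]][:k]
    misses = [i for _, i in ranked if labels[i] != labels[target_idx]][:k]
    return hits, misses

def relief_neighbors(target_idx, points, labels):
    """Relief: the single nearest hit and the single nearest miss."""
    return relieff_neighbors(target_idx, points, labels, k=1)

def surf_neighbors(target_idx, points, labels, threshold):
    """SURF: every individual within the distance threshold, split by class."""
    within = [i for d, i in ranked_distances(target_idx, points) if d <= threshold]
    hits = [i for i in within if labels[i] == labels[target_idx]]
    misses = [i for i in within if labels[i] != labels[target_idx]]
    return hits, misses

# Hypothetical data: two continuous markers per individual, 1 = case, 0 = control.
points = [(0, 0), (1, 0), (0, 2), (3, 0), (0, 4), (5, 5)]
labels = [1, 1, 0, 1, 0, 0]

print(relief_neighbors(0, points, labels))            # ([1], [2])
print(relieff_neighbors(0, points, labels, k=2))      # ([1, 3], [2, 4])
print(surf_neighbors(0, points, labels, 3.5))         # ([1, 3], [2])
```

For real SNP data the same functions would apply with a distance suited to discrete genotypes, such as the number of mismatching markers.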

**Fig. 4.**
Flowchart for a simple GP. The goal is to randomly generate an initial population of computer programs or solutions (e.g. genetic models), determine their fitness, select the best models, introduce variability and then iterate until the termination criteria are satisfied. This executes a parallel stochastic search using the principles of evolution by natural selection.
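
The loop in Fig. 4 can be sketched as a very small genetic programming run in Python. Everything specific here is an assumption for illustration: candidate models are expression trees over binary attributes built from AND, OR and XOR, fitness is classification accuracy, selection keeps the top half of the population, and variability comes from subtree mutation only (crossover is omitted for brevity). It is not the authors' implementation.

```python
import random

FUNCS = {"and": lambda a, b: a & b, "or": lambda a, b: a | b, "xor": lambda a, b: a ^ b}

def random_tree(n_vars, depth, rng):
    """Grow a random expression tree; a leaf is the index of an input attribute."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_vars)
    op = rng.choice(sorted(FUNCS))
    return (op, random_tree(n_vars, depth - 1, rng), random_tree(n_vars, depth - 1, rng))

def evaluate(tree, row):
    """Evaluate a tree on one individual's binary attribute values."""
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return FUNCS[op](evaluate(left, row), evaluate(right, row))

def fitness(tree, rows, labels):
    """Fraction of individuals whose label the tree predicts correctly."""
    return sum(evaluate(tree, r) == y for r, y in zip(rows, labels)) / len(rows)

def mutate(tree, n_vars, rng):
    """Introduce variability: replace a randomly chosen subtree with a new one."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(n_vars, 2, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_vars, rng), right)
    return (op, left, mutate(right, n_vars, rng))

def run_gp(rows, labels, pop_size=50, generations=30, seed=0):
    rng = random.Random(seed)
    n_vars = len(rows[0])
    score = lambda t: fitness(t, rows, labels)
    # Randomly generate the initial population of candidate models.
    population = [random_tree(n_vars, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)       # determine fitness
        if score(population[0]) == 1.0:                # termination criterion
            break
        parents = population[: pop_size // 2]          # select the best models
        children = [mutate(rng.choice(parents), n_vars, rng) for _ in range(pop_size - 1)]
        population = [population[0]] + children        # keep the current best (elitism)
    best = max(population, key=score)
    return best, score(best)

# Hypothetical toy data: the label is the XOR of the first two attributes.
rows = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
labels = [0, 1, 1, 0]
best_model, best_score = run_gp(rows, labels)
print(best_model, best_score)
```

Because the search is stochastic, the model returned depends on the seed; fixing `seed` only makes a given run repeatable.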

**Fig. 5.**
Flowchart for bioinformatics analyses of GWAS data. The use of filter and wrapper algorithms along with computational modeling approaches is recommended in addition to parametric statistical methods. Biological knowledge in public databases has a very important role to play at all levels of the analysis and interpretation.

Grants and funding

R01 LM009012/LM/NLM NIH HHS/United States
