Genome Res. 2007 Oct;17(10):1520-8.
doi: 10.1101/gr.6665407. Epub 2007 Sep 4.

Prediction of individual genetic risk to disease from genome-wide association studies


Naomi R Wray et al. Genome Res. 2007 Oct.

Abstract

Empirical studies suggest that the effect sizes of individual causal risk alleles underlying complex genetic diseases are small, with most genotype relative risks in the range of 1.1-2.0. Although the increased risk of disease for a carrier is small for any single locus, knowledge of multiple risk alleles throughout the genome could allow the identification of individuals who are at high risk. In this study, we investigate the number and effect size of risk loci that underlie complex disease constrained by the disease parameters of prevalence and heritability. Then we quantify the value of prediction of genetic risk to disease using a range of realistic combinations of the number, size, and distribution of risk effects that underlie complex diseases. We propose an approach to assess the genetic risk of a disease in healthy individuals, based on dense genome-wide SNP panels. We test this approach using simulation. When the number of loci contributing to the disease is >50, a large case-control study is needed to identify a set of risk loci for use in predicting the disease risk of healthy people not included in the case-control study. For diseases controlled by 1000 loci of mean relative risk of only 1.04, a case-control study with 10,000 cases and controls can lead to selection of approximately 75 loci that explain >50% of the genetic variance. The 5% of people with the highest predicted risk are three to seven times more likely to suffer the disease than the population average, depending on heritability and disease prevalence. Whether an individual with known genetic risk develops the disease depends on known and unknown environmental factors.
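
To make the "top 5% versus population average" comparison concrete, the following is a minimal sketch and not the authors' simulation. It assumes a purely multiplicative model in which every risk allele multiplies risk by the same factor, all loci share one allele frequency, loci are independent, and every locus is known without error; the abstract's own results depend on prevalence, heritability, and on which loci a case-control study actually detects. The function names, the seed, and the allele frequency of 0.5 are illustrative assumptions.

```python
import random


def simulate_relative_risks(n_people, n_loci, allele_freq, rr_per_allele, seed=0):
    """Return each person's risk relative to the simulated population mean.

    Assumptions (not taken from the paper): independent loci, one shared
    allele frequency, and a multiplicative effect of rr_per_allele for
    every risk allele carried.
    """
    rng = random.Random(seed)
    raw = []
    for _ in range(n_people):
        # Each person carries two alleles per locus; count the risk alleles.
        n_risk_alleles = sum(rng.random() < allele_freq for _ in range(2 * n_loci))
        raw.append(rr_per_allele ** n_risk_alleles)
    mean_raw = sum(raw) / n_people
    # Dividing by the mean expresses risk relative to the population average.
    return [r / mean_raw for r in raw]


def top_fraction_relative_risk(relative_risks, fraction=0.05):
    """Mean relative risk of the highest-risk `fraction` of the sample."""
    ranked = sorted(relative_risks, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return sum(ranked[:n_top]) / n_top


# Illustrative parameters: 1000 loci with relative risk 1.04 per allele.
risks = simulate_relative_risks(
    n_people=1000, n_loci=1000, allele_freq=0.5, rr_per_allele=1.04
)
print(top_fraction_relative_risk(risks))
```

Because the sketch treats all 1000 loci as known, the printed ratio is an upper bound under these assumptions on what a predictor built from a detected subset of loci could reach.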


Figures

Figure 1. Distribution of allele frequencies under the neutral and common-disease common-variant (CDCV) models from 10,000 simulated loci.
Figure 2. Relationship between the number of susceptibility or risk loci and their average relative risk (RR) for common disease; K is the population prevalence of the disease; h² is the heritability on the observed scale; λs is the RR for full-siblings based on the heritability and prevalence parameters. Distribution of effects of risk loci under neutral (A) and CDCV (B) models. The mean RR are the mean of 10,000 simulated samples.
Figure 3. Relationship between disease prevalence (K) and heritability (h²) on number of risk loci contributing to a disease, assuming a fixed frequency of risk alleles (p) and fixed RR of 1.1 (Equation 3). Based on results from Figure 1, p = 0.1 approximates to the neutral model and p = 0.5 approximates to the CDCV model.
Figure 4. Accuracy of risk prediction of disease risk in a population sample using a set of predictive SNPs selected after a genome-wide association study of N each of cases and controls. A CDCV disease model is assumed with population prevalence (K) and heritability (h²) of the disease. Results for the neutral model were similar. Mean of 100 simulation replicates. The legend lists the data series in their order at 1000 risk loci.
Figure 5. Relative risk of disease for the estimated top 5% of individuals at risk of a new sample of 1000 people following a case-control study with sample size of N each of cases and controls. A CDCV disease model is assumed with population prevalence (K) and heritability (h²) of the disease. Results for the neutral model were similar. Mean of 100 simulation replicates. The legend lists the data series in their order at 1000 risk loci.


