PLoS One. 2009 Dec 2;4(12):e7969.
doi: 10.1371/journal.pone.0007969.

How many genetic variants remain to be discovered?

Yudi Pawitan et al. PLoS One.

Abstract

A great majority of genetic markers discovered in recent genome-wide association studies have small effect sizes, and they explain only a small fraction of the genetic contribution to the diseases. How many more variants can we expect to discover, and what study sizes are needed? We derive the connection of the cumulative risk of the SNP variants to the latent genetic risk model and the heritability of the disease. We determine the sample size required for case-control studies in order to achieve a certain expected number of discoveries in a collection of the most significant SNPs. Assuming allele frequencies and effect sizes similar to those of the currently validated SNPs, a complex phenotype such as type-2 diabetes would need approximately 800 variants to explain its 40% heritability. Much smaller numbers of variants are needed if we assume rare-variant but higher-penetrance models. We estimate that up to 50,000 cases and an equal number of controls are needed to discover 800 common low-penetrance variants among the top 5000 SNPs. Under common and rare low-penetrance models, the very large studies required to discover the numerous variants are probably at the limit of practical feasibility. Under rare-variant models with medium to high penetrance (odds ratios between 1.6 and 4.0), studies comparable in size to many existing studies are adequate, provided the genotyping technology can interrogate more and rarer variants.
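
To give a feel for why small odds ratios demand such large studies, here is a minimal Python sketch of a textbook power approximation for a single SNP in a case-control study. It is not the authors' calculation, which concerns the expected number of discoveries among the top-ranked SNPs. The sketch assumes a two-sided Wald test on the allelic log odds ratio, treats the minor allele frequency (MAF) as the frequency in controls, and uses a genome-wide significance level of 5e-8. The function names and the example MAF and odds ratio are all illustrative choices.

```python
from math import log, sqrt
from statistics import NormalDist


def case_allele_freq(maf, odds_ratio):
    """Risk-allele frequency in cases implied by the control frequency and allelic OR."""
    return odds_ratio * maf / (1 - maf + odds_ratio * maf)


def allelic_test_power(n_cases, n_controls, maf, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided Wald test on the allelic log odds ratio.

    Assumptions (not taken from the paper): alleles are independent
    (2 per person), `maf` is the control allele frequency, and the
    normal approximation to the log-OR estimate holds.
    """
    p0 = maf
    p1 = case_allele_freq(maf, odds_ratio)
    # Large-sample standard error of the estimated log odds ratio.
    se = sqrt(1 / (2 * n_cases * p1 * (1 - p1))
              + 1 / (2 * n_controls * p0 * (1 - p0)))
    z_effect = log(odds_ratio) / se
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls in either rejection tail.
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


if __name__ == "__main__":
    # Hypothetical common, low-penetrance variant: MAF 0.2, OR 1.1.
    for n in (5_000, 20_000, 50_000):
        power = allelic_test_power(n, n, maf=0.2, odds_ratio=1.1)
        print(f"{n:>6} cases and {n:>6} controls: power = {power:.3f}")
```

Under these assumptions, a variant with an odds ratio near 1.1 is almost never detected with a few thousand cases but is detected with high probability at tens of thousands, which is in line with the scale of study the abstract describes for common low-penetrance variants.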

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Distribution of 383 ORs from 101 GWA studies listed in the Supplementary table (Table S1).
Figure 2. Distribution of latent genetic risk derived for the type-2 diabetes example, computed using (1) and (2).
Figure 3. The number of variants required to explain the corresponding heritability.
The labels A–F refer to the genetic models given in Table 2.
Figure 4. The expected number of discoveries of causal variants as a function of the number of cases in a case-control study, with an equal number of controls.
The models refer to those in Table 2 in terms of the range of MAFs and ORs of the risk alleles of non-null variants. For models A and B, we plot the expected number of discoveries among the top 1000 (solid), 2000 (dashed) and 5000 SNPs (dotted); for models D and E, among the top 100 (solid), 200 (dashed) and 500 SNPs (dotted).
