PLoS One. 2010 Nov 10;5(11):e13929.
doi: 10.1371/journal.pone.0013929.

Phenotypic complexity, measurement bias, and poor phenotypic resolution contribute to the missing heritability problem in genetic association studies

Sophie van der Sluis et al. PLoS One. 2010.

Abstract

Background: The variance explained by genetic variants as identified in (genome-wide) genetic association studies is typically small compared to family-based heritability estimates. Explanations of this 'missing heritability' have been mainly genetic, such as genetic heterogeneity and complex (epi-)genetic mechanisms.

Methodology: We used comprehensive simulation studies to show that three phenotypic measurement issues also provide viable explanations of the missing heritability: phenotypic complexity, measurement bias, and phenotypic resolution. We identify the circumstances in which the use of phenotypic sum-scores and the presence of measurement bias lower the power to detect genetic variants. In addition, we show how the differential resolution of psychometric instruments (i.e., whether the instrument includes items that resolve individual differences in the normal range or in the clinical range of a phenotype) affects the power to detect genetic variants.

Conclusion: We conclude that careful phenotypic data modelling can improve the genetic signal, and thus the statistical power to identify genetic variants, by 20-99%.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Uni- or multidimensionality in latent factor models.
Figure 1a shows a graphical representation of a unidimensional factor model: one latent factor affecting the scores on 6 items. The effect of the genetic variant (GV) on the item scores is indirect, running via the latent trait. Often, however, scores on a test are determined not by one but by multiple latent traits, or sub-dimensions of a latent trait, as depicted in Figure 1b, where the scores on the first two items are determined by dimension 1, the scores on the last two items by dimension 2, and the scores on the middle items by both dimensions of the latent trait. Genetic association studies are complicated by this multidimensionality, because it is unknown beforehand whether a genetic variant affects one or both dimensions.
Figure 2. Factor models used for simulation in Studies 1 and 2.
Study 1: Data were simulated according to a 2-dimensional (Figure 2a) or 3-dimensional (Figure 2b) latent factor model, with factorial correlations ρ ranging between .2 and .6, and factor loadings fixed to .7. Study 2: Data were simulated according to a 1-factor model (Figure 2c), with all items having either a weak or a strong relation to the latent factor (factor loadings of .3 or .7, respectively). The genetic variant (GV) affected the first item only.
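As a rough illustration of the Study 2 set-up in the caption above, the following sketch simulates standardized item scores from a 1-factor model in which the genetic variant acts on the first item only. It is not the authors' simulation code. The number of items (6), the minor allele frequency (0.5), the additive 0/1/2 genotype coding, and the function names `simulate_one_factor` and `pearson` are assumptions made for illustration; the loading of .7 comes from the caption, and the 1% item-level effect comes from the Figure 6 caption.

```python
import math
import random


def simulate_one_factor(n, n_items=6, loading=0.7, gv_r2=0.01, maf=0.5, seed=1):
    """Simulate unit-variance item scores under a 1-factor model.

    The genetic variant (GV) affects the first item directly and explains
    a fraction gv_r2 of that item's variance. n_items and maf are
    illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    gv_sd = math.sqrt(2 * maf * (1 - maf))            # SD of a 0/1/2 genotype
    beta = math.sqrt(gv_r2)                            # effect of standardized genotype
    resid_first = math.sqrt(1 - loading ** 2 - gv_r2)  # keeps item 1 at unit variance
    resid_other = math.sqrt(1 - loading ** 2)
    genotypes, items = [], []
    for _ in range(n):
        g = sum(rng.random() < maf for _ in range(2))  # two alleles per subject
        g_z = (g - 2 * maf) / gv_sd
        eta = rng.gauss(0, 1)                          # latent trait score
        row = [beta * g_z + loading * eta + resid_first * rng.gauss(0, 1)]
        row += [loading * eta + resid_other * rng.gauss(0, 1)
                for _ in range(n_items - 1)]
        genotypes.append(g)
        items.append(row)
    return genotypes, items


def pearson(x, y):
    """Plain Pearson correlation, used here only to inspect the simulated data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    genotypes, items = simulate_one_factor(20000)
    first = [row[0] for row in items]
    second = [row[1] for row in items]
    print("r(GV, item 1)     =", round(pearson(genotypes, first), 3))
    print("r(GV, item 2)     =", round(pearson(genotypes, second), 3))
    print("r(item 1, item 2) =", round(pearson(first, second), 3))
```

In expectation, the genotype correlates about .1 with the first item (1% of its variance) and not at all with the remaining items, while any two items correlate about .49 (the squared loading).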
Figure 3. Item characteristic curves in a 2-parameter Item Response Theory (IRT) model.
Figure 3 shows the item characteristic curves of two items, describing the probability of answering each item correctly (affirmatively) given one's latent trait score θ. The first item (left) has difficulty parameter b = −1, i.e., subjects with (standardized) latent trait score θ = −1 have a 50% probability of endorsing this item, while subjects with latent trait score θ = 2 endorse it with 95% probability. The second item (right) has difficulty parameter b = 2, i.e., subjects with latent trait score θ = 2 have a 50% probability of endorsing this item, while subjects with latent trait score θ = −1 have only a 5% chance. Both items have discrimination parameter a = 1 (i.e., equal slopes), which determines the degree to which a given item discriminates between subjects with different latent trait scores. In contrast to items with low discrimination parameters (flat slopes), items with high discrimination parameters (steep slopes) discriminate well between subjects whose latent trait scores lie close together within a narrow range. The 2-parameter logistic model, $P(X_{ij} = 1 \mid \theta_i) = \exp[a_j(\theta_i - b_j)] / (1 + \exp[a_j(\theta_i - b_j)])$, can be used to calculate for every subject i the probability of endorsing an item Xj given this item's discrimination parameter aj and difficulty parameter bj.
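The endorsement probabilities quoted in this caption can be checked with a small sketch of the 2-parameter logistic model in its standard form, P = 1 / (1 + exp(−a(θ − b))). The function name `p_endorse` is hypothetical, and the form without an additional scaling constant is assumed because it reproduces the 50%, 95%, and 5% values given in the caption.

```python
import math


def p_endorse(theta, a, b):
    """Probability of endorsing an item under the 2-parameter logistic model.

    theta: latent trait score, a: discrimination, b: difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # The two items from the caption: a = 1, with b = -1 and b = 2.
    for b in (-1.0, 2.0):
        for theta in (-1.0, 2.0):
            p = p_endorse(theta, a=1.0, b=b)
            print(f"b = {b:+.0f}, theta = {theta:+.0f}: P = {p:.3f}")
```

A subject whose latent trait score equals the item's difficulty always gets a probability of .5, regardless of the discrimination parameter; raising a only steepens the curve around that point.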
Figure 4. The power to detect genetic variants is lower if sum-scores are not sufficient statistics (results Study 1).
Figures 4a–c show the sample size required for a power of 80% to detect a genetic variant (GV) that explains 1% of the variance on the latent level, using either the sum-score model or the true latent factor model. The panels show the effects of violations of unidimensionality (Figure 4a), of equal factor loadings (Figure 4b), and of equal residual variances (Figure 4c).
Figure 5. The power to detect genetic variants is slightly affected by violations of measurement invariance with respect to sample (results Study 2).
In the case of continuous items, measurement invariance (MI) with respect to sample holds if 1) the factor structure is identical across samples (‘configural invariance’; Figure 5a), 2) the factor loadings relating the observed items to the latent trait(s) are identical across samples (‘metric invariance’; Figure 5b), 3) mean differences between samples on the individual items are attributable to mean differences at the latent level (‘strong factorial invariance’; Figure 5c), and 4) the variance in item scores not explained by the latent trait(s) is equal across samples (‘strict factorial invariance’; Figure 5d). We simulated these four types of violations of MI, and analyzed the data using either the sum-score model or the true latent factor model. Figures 5a–d show the sample size required for a power of 80% to detect a genetic variant that explains 1% of the variance on the latent level under these four different kinds of violations of MI.
Figure 6. The power to detect item-specific genetic variants greatly depends on the fitted phenotypic model (results Study 2).
When the effect of a genetic variant does not run via the latent factor but acts directly on, and is specific to, one of the items (as illustrated in Figure 2c), we speak of violations of measurement invariance with respect to the genetic variant itself. Figure 6 shows the sample size required for a power of 80% to detect such an item-specific genetic variant that explains 1% of the variance in the first item only.
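To give a feel for why the fitted phenotypic model matters for an item-specific variant, the sketch below compares approximate required sample sizes when the variant is tested against the affected item itself versus against an unweighted sum-score. This is a back-of-the-envelope normal approximation, not the authors' simulation procedure, and it will not reproduce the values in Figure 6. The significance level (0.05, two-sided), the number of items (6), the loading (.7), and the function names are assumptions for illustration.

```python
import math
from statistics import NormalDist


def required_n(r2, alpha=0.05, power=0.80):
    """Approximate N needed to detect an effect explaining a fraction r2
    of the variance in the analysed score (1-df test, normal approximation).
    alpha is an illustrative assumption; the source does not state one.
    """
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(z_total ** 2 * (1 - r2) / r2)


def r2_in_sum_score(item_r2, n_items, loading):
    """Fraction of sum-score variance explained by a variant that explains
    item_r2 of the variance in one unit-variance item only.
    Assumes a 1-factor model with equal loadings across items.
    """
    common = (n_items * loading) ** 2        # variance due to the latent factor
    unique = n_items * (1 - loading ** 2)    # item-specific variance (incl. the GV part)
    return item_r2 / (common + unique)


if __name__ == "__main__":
    item_r2 = 0.01                            # 1% of the first item's variance
    sum_r2 = r2_in_sum_score(item_r2, n_items=6, loading=0.7)
    print("N when testing the affected item:", required_n(item_r2))
    print("Explained variance in sum-score: ", round(sum_r2, 5))
    print("N when testing the sum-score:    ", required_n(sum_r2))
```

Under these assumptions the variant's contribution is heavily diluted in the sum-score, because the sum-score variance is dominated by the common factor, so the required sample size grows by more than an order of magnitude.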

