PLoS One. 2010 Nov 10;5(11):e13929.

doi: 10.1371/journal.pone.0013929.

Phenotypic complexity, measurement bias, and poor phenotypic resolution contribute to the missing heritability problem in genetic association studies

Sophie van der Sluis¹, Matthijs Verhage, Danielle Posthuma, Conor V Dolan

Affiliation

¹ Functional Genomics Section, Department of Clinical Genetics, Center for Neurogenomics and Cognitive Research, VU University and VU University Medical Center, Amsterdam, The Netherlands. sophie.van.der.sluis@cncr.vu.nl

PMID: 21085666
PMCID: PMC2978099
DOI: 10.1371/journal.pone.0013929

Sophie van der Sluis et al. PLoS One. 2010.

Abstract

Background: The variance explained by genetic variants as identified in (genome-wide) genetic association studies is typically small compared to family-based heritability estimates. Explanations of this 'missing heritability' have been mainly genetic, such as genetic heterogeneity and complex (epi-)genetic mechanisms.

Methodology: We used comprehensive simulation studies to show that three phenotypic measurement issues also provide viable explanations of the missing heritability: phenotypic complexity, measurement bias, and phenotypic resolution. We identify the circumstances in which the use of phenotypic sum-scores and the presence of measurement bias lower the power to detect genetic variants. In addition, we show how the differential resolution of psychometric instruments (i.e., whether the instrument includes items that resolve individual differences in the normal range or in the clinical range of a phenotype) affects the power to detect genetic variants.

Conclusion: We conclude that careful phenotypic data modelling can improve the genetic signal, and thus the statistical power to identify genetic variants by 20-99%.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Uni- or multidimensionality in latent factor models.**
Figure 1a shows a graphical representation of a unidimensional factor model: one latent factor affecting the scores on 6 items. The effect of the genetic variant (GV) on the item scores is indirect, running via the latent trait. Often, however, scores on a test are determined not by one but by multiple latent traits, or sub-dimensions of a latent trait, as depicted in Figure 1b, where the scores on the first two items are determined by dimension 1, the scores on the last two items by dimension 2, and the scores on the middle items by both dimensions of the latent trait. Genetic association studies are complicated by this multidimensionality, because it is unknown beforehand whether a genetic variant affects either or both dimensions.

**Figure 2. Factor models used for simulation in Studies 1 and 2.**
*Study 1:* Data were simulated according to a 2-dimensional (Figure 2a) or 3-dimensional (Figure 2b) latent factor model, with factorial correlations ρ ranging between .2 and .6, and factor loadings fixed to .7. *Study 2:* Data were simulated according to a 1-factor model (Figure 2c), with all items having either a weak or a strong relation to the latent factor (factor loadings of .3 or .7, respectively). The genetic variant (GV) affected the first item only.
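The Study 1 setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' simulation code: the number of items per factor, the standardized item variances, the function names `simulate_two_factor` and `corr`, and the seed are all hypothetical choices; only the loading of .7 and the factor correlation ρ come from the caption.

```python
import math
import random

def simulate_two_factor(n, rho, loading=0.7, items_per_factor=3, seed=0):
    """Simulate item scores from two correlated latent factors.

    Assumptions (not from the source): 3 items per factor, unit-variance
    factors, and residual variances chosen so each item has variance 1.
    """
    rng = random.Random(seed)
    resid_sd = math.sqrt(1.0 - loading ** 2)
    factors, items = [], []
    for _ in range(n):
        f1 = rng.gauss(0.0, 1.0)
        # Build f2 so that corr(f1, f2) = rho and var(f2) = 1.
        f2 = rho * f1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Items 0..2 load on f1, items 3..5 load on f2.
        row = [loading * f + resid_sd * rng.gauss(0.0, 1.0)
               for f in (f1, f2) for _ in range(items_per_factor)]
        factors.append((f1, f2))
        items.append(row)
    return factors, items

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

factors, items = simulate_two_factor(n=5000, rho=0.4, seed=1)
sum_scores = [sum(row) for row in items]  # the phenotypic sum-score
print(len(items), len(items[0]))
```

Under these assumptions, two items on the same factor correlate about .7 × .7 = .49, while items on different factors correlate about .49ρ, which is the structure a single sum-score ignores.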

**Figure 3. Item characteristic curves in a 2-parameter Item Response Theory (IRT) model.**
Figure 3 shows the item characteristic curves of two items, describing the probability of answering each item correctly (affirmatively) given one's latent trait score θ. The first item (left) has difficulty parameter b = −1, i.e., subjects with (standardized) latent trait score θ = −1 have a 50% probability of endorsing this item, while subjects with latent trait score θ = 2 endorse this item with 95% probability. The second item (right) has difficulty parameter b = 2, i.e., subjects with latent trait score θ = 2 have a 50% probability of endorsing this item, while subjects with latent trait score θ = −1 have only a 5% probability. Both items have discrimination parameter a = 1 (i.e., equal slopes), which determines the degree to which a given item discriminates between subjects with different latent trait scores. In contrast to items with low discrimination parameters (flat slopes), items with high discrimination parameters (steep slopes) discriminate well between subjects whose latent trait scores lie closely together within a narrow range. The 2-parameter logistic model, $P(X_{ij} = 1 \mid \theta_i) = \frac{\exp[a_j(\theta_i - b_j)]}{1 + \exp[a_j(\theta_i - b_j)]}$, can be used to calculate for every subject *i* the probability of endorsing an item X_j given this item's discrimination parameter *a_j* and difficulty parameter *b_j*.
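The probabilities quoted in the Figure 3 caption can be checked with a minimal Python sketch of the standard 2-parameter logistic function, P = 1 / (1 + exp(−a(θ − b))). The function name `p_endorse` and the dictionary layout are illustrative assumptions; the parameter values a = 1, b = −1, and b = 2 are taken from the caption.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item, given latent trait score theta,
    discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# The two items from the caption: equal slopes, different difficulties.
left_item = {"a": 1.0, "b": -1.0}
right_item = {"a": 1.0, "b": 2.0}

for theta in (-1.0, 2.0):
    print(theta,
          round(p_endorse(theta, **left_item), 3),
          round(p_endorse(theta, **right_item), 3))
# prints: -1.0 0.5 0.047
#         2.0 0.953 0.5
```

Raising `a` above 1 steepens the curve around `b`, which is what the caption means by an item discriminating well within a narrow range of latent trait scores.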

**Figure 4. The power to detect genetic variants is lower if sum-scores are not sufficient statistics (results Study 1).**
Figures 4a–c show the sample size required for a power of 80% to detect a genetic variant (GV) that explains 1% of the variance on the latent level, using either the sum-score model or the true latent factor model. Figures show the effects of violation of unidimensionality (Figure 4a), violations of equal factor loadings (Figure 4b), and of violations of equal residual variances (Figure 4c).
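To give a feel for why a weaker signal on the observed score raises the required sample size, the following Python sketch uses a textbook normal approximation for a 1-degree-of-freedom association test. It is not the simulation procedure used in the paper: the function name `required_n`, the significance level of 0.05, and the attenuation factor of 0.5 are assumptions made purely for illustration.

```python
import math
from statistics import NormalDist

def required_n(r2, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect an effect explaining a
    proportion r2 of the phenotypic variance (two-sided 1-df test).

    Normal approximation: required noncentrality = (z_alpha + z_power)^2,
    with noncentrality = N * r2 / (1 - r2).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    ncp = (z_alpha + z_power) ** 2
    return math.ceil(ncp * (1.0 - r2) / r2)

# GV explains 1% of the variance of the score being tested.
n_full = required_n(0.01)

# Hypothetical attenuation: if the analysed score carries only half of the
# GV signal, the variance explained drops to 0.5%.
n_attenuated = required_n(0.01 * 0.5)

print(n_full, n_attenuated)
```

Because the required sample size scales roughly with 1/R², halving the variance explained on the analysed score roughly doubles the sample size needed for the same power.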

**Figure 5. The power to detect genetic variants is slightly affected by violations of measurement invariance with respect to sample (results Study 2).**
In the case of continuous items, measurement invariance (MI) with respect to sample holds if 1) the factor structure is identical across samples (‘configural invariance’; Figure 5a), 2) the factor loadings relating the observed items to the latent trait(s) are identical across samples (‘metric invariance’; Figure 5b), 3) mean differences between samples on the individual items are attributable to mean differences at the latent level (‘strong factorial invariance’; Figure 5c), and 4) the variance in item scores not explained by the latent trait(s) is equal across samples (‘strict factorial invariance’; Figure 5d). We simulated these four types of violations of MI, and analyzed the data using either the sum-score model or the true latent factor model. Figures 5a–d show the sample size required for a power of 80% to detect a genetic variant that explains 1% of the variance on the latent level under these four different kinds of violations of MI.

**Figure 6. The power to detect item-specific genetic variants greatly depends on the fitted phenotypic model (results Study 2).**
When the effect of a genetic variant does not run via the latent factor but is directly on, and specific to, one of the items (as illustrated in Figure 2c), we speak of violations of measurement invariance with respect to the genetic variant itself. Figure 6 shows the sample size required for a power of 80% to detect such an item-specific genetic variant that explains 1% of the variance in the first item only.

Cited by

Grand challenge in behavioral and psychiatric genetics: quantitative challenges to keeping up with molecular advances.
Knopik VS. Front Genet. 2011 Mar 1;2:9. doi: 10.3389/fgene.2011.00009. eCollection 2011. PMID: 22303308 Free PMC article.
Alcohol-related genes show an enrichment of associations with a persistent externalizing factor.
Ashenhurst JR, Harden KP, Corbin WR, Fromme K. J Abnorm Psychol. 2016 Oct;125(7):933-945. doi: 10.1037/abn0000194. Epub 2016 Aug 8. PMID: 27505405 Free PMC article.
Nordic OCD & Related Disorders Consortium: Rationale, design, and methods.
Mataix-Cols D, Hansen B, Mattheisen M, Karlsson EK, Addington AM, Boberg J, Djurfeldt DR, Halvorsen M, Lichtenstein P, Solem S, Lindblad-Toh K; Nordic OCD and Related Disorders Consortium (NORDiC); Haavik J, Kvale G, Rück C, Crowley JJ. Am J Med Genet B Neuropsychiatr Genet. 2020 Jan;183(1):38-50. doi: 10.1002/ajmg.b.32756. Epub 2019 Aug 19. PMID: 31424634 Free PMC article.
Impact of measurement error on testing genetic association with quantitative traits.
Liao J, Li X, Wong TY, Wang JJ, Khor CC, Tai ES, Aung T, Teo YY, Cheng CY. PLoS One. 2014 Jan 24;9(1):e87044. doi: 10.1371/journal.pone.0087044. eCollection 2014. PMID: 24475218 Free PMC article.
Common SNPs explain some of the variation in the personality dimensions of neuroticism and extraversion.
Vinkhuyzen AA, Pedersen NL, Yang J, Lee SH, Magnusson PK, Iacono WG, McGue M, Madden PA, Heath AC, Luciano M, Payton A, Horan M, Ollier W, Pendleton N, Deary IJ, Montgomery GW, Martin NG, Visscher PM, Wray NR. Transl Psychiatry. 2012 Apr 17;2(4):e102. doi: 10.1038/tp.2012.27. PMID: 22832902 Free PMC article.
