Behav Genet. 2019 Jan;49(1):99-111.
doi: 10.1007/s10519-018-9942-y. Epub 2018 Dec 20.

Type I Error Rates and Parameter Bias in Multivariate Behavioral Genetic Models

Brad Verhulst et al. Behav Genet. 2019 Jan.

Abstract

For many multivariate twin models, the numerical Type I error rates are lower than theoretically expected rates using a likelihood ratio test (LRT), which implies that the significance threshold for statistical hypothesis tests is more conservative than most twin researchers realize. This makes the numerical Type II error rates higher than theoretically expected. Furthermore, the discrepancy between the observed and expected error rates increases as more variables are included in the analysis and can have profound implications for hypothesis testing and statistical inference. In two simulation studies, we examine the Type I error rates for the Cholesky decomposition and Correlated Factors models. Both show markedly lower than nominal Type I error rates under the null hypothesis, a discrepancy that increases with the number of variables in the model. In addition, we observe slightly biased parameter estimates for the Cholesky decomposition and Correlated Factors models. By contrast, if the variance-covariance matrices for variance components are estimated directly (without constraints), the numerical Type I error rates are consistent with theoretical expectations and there is no bias in the parameter estimates regardless of the number of variables analyzed. We call this the direct symmetric approach. It appears that each model-implied boundary, whether explicit or implicit, increases the discrepancy between the numerical and theoretical Type I error rates by truncating the sampling distributions of the variance components and inducing bias in the parameters. The direct symmetric approach has several advantages over other multivariate twin models as it corrects the Type I error rate and parameter bias issues, is easy to implement in current software, and has fewer optimization problems. Implications for past and future research, and potential limitations associated with direct estimation of genetic and environmental covariance matrices are discussed.
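
The boundary effect described in the abstract can be illustrated with a toy simulation. This is a minimal sketch, not the authors' simulation and not a twin model: it assumes a single normal mean with known unit variance as a stand-in for a variance component, and it compares a freely estimated parameter with one constrained to be non-negative (analogous to the implicit bound imposed when a variance is modeled as a squared path coefficient). The function name, sample sizes, and seed are hypothetical choices made for illustration.

```python
import numpy as np

# 95th percentile of a chi-square distribution with 1 degree of freedom,
# i.e. the nominal alpha = 0.05 threshold for a 1-df likelihood ratio test.
CHI2_1DF_CRIT_05 = 3.841458820694124


def simulate_type1_rates(n_reps=20000, n_obs=100, seed=0):
    """Empirical Type I error rates of a 1-df LRT under the null (mu = 0).

    Toy setup (an assumption of this sketch): x ~ Normal(mu, 1), testing mu = 0.
    Returns (rate with mu estimated freely, rate with mu constrained to mu >= 0).
    """
    rng = np.random.default_rng(seed)
    # Each row is one simulated data set generated under the null hypothesis.
    xbar = rng.normal(0.0, 1.0, size=(n_reps, n_obs)).mean(axis=1)

    # With known unit variance, the LRT statistic is n * mu_hat * (2*xbar - mu_hat).
    # Free estimate: mu_hat = xbar, so the statistic is n * xbar**2.
    lrt_free = n_obs * xbar**2
    # Bounded estimate: mu_hat = max(xbar, 0), so the statistic is 0 when xbar < 0.
    lrt_bounded = n_obs * np.maximum(xbar, 0.0) ** 2

    free_rate = float((lrt_free > CHI2_1DF_CRIT_05).mean())
    bounded_rate = float((lrt_bounded > CHI2_1DF_CRIT_05).mean())
    return free_rate, bounded_rate


if __name__ == "__main__":
    free_rate, bounded_rate = simulate_type1_rates()
    print(f"free estimate:    {free_rate:.4f}")
    print(f"bounded estimate: {bounded_rate:.4f}")
```

In this toy case the bound truncates the sampling distribution of the estimate at zero, so roughly half of the null replicates yield a test statistic of exactly zero, and the rejection rate against the 1-df chi-square threshold falls to about half the nominal level. The multivariate models in the paper involve several such boundaries at once, which this sketch does not attempt to reproduce.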

Keywords: Cholesky decomposition; Correlated factors model; Direct symmetrical matrix; Twin models; Type I error.

Conflict of interest statement

Brad Verhulst, Elizabeth Prom-Wormley, Matthew C. Keller, Sarah Medland, and Michael C. Neale declare that they have no conflict of interest.

Figures

Figure 1:
Alternative parameterizations of the univariate ACE model for a pair of twins for (A) the standard path specification and (B) the direct variance specification. Note: T1 and T2 represent the observed phenotypes for twin 1 and twin 2, respectively. The latent variables, depicted as circles, represent the effects of additive genetic (A), common environment (C) and specific environment (E) variation, which generate phenotypic variation. Path labels in blue are estimated parameters and path labels in red italics are fixed at the specified values. In Figure 1A, regression path coefficients a, c and e capture the relationship between the latent variables and the phenotypes; they are squared and summed to give the phenotypic variance in the twin model. In Figure 1B, variance components VA, VC and VE are specified as variances, which can be directly summed to give the phenotypic variance. The phenotypic means are represented by μ.
Figure 2:
Common Pathway model or Biometric Factor model for one twin. Note: The CPM twin model is an extension of the common factor model. The latent factor, F1, is caused by additive genetic (A), common (C) and specific (E) environmental factors. The four individual phenotypes P1-P4 are each caused by this latent factor and by residual variance components, which are also partitioned into additive genetic, common and specific environment components (As1-As4, Cs1-Cs4 and Es1-Es4). Path labels in blue are estimated parameters and path labels in red italics are fixed at the specified values. The variance components of the latent variable sum to 1 (the variance of the latent variable), while the residual variance components sum to the residual phenotypic variation. Only one twin is presented to simplify the schematic diagram.
Figure 3:
Independent Pathway model or Psychometric Factor model for one twin. Note: The IPM twin model is an extension of a three-factor confirmatory factor analysis. The latent factor F1 is exclusively caused by additive genetic (A) factors; the latent factor F2 is exclusively caused by common (C) environmental factors; and the latent factor F3 is exclusively caused by specific (E) environmental factors. The associations between F1, F2, and F3 and the phenotypes P1-P4 are a function of the additive genetic, common or specific environmental factors that contribute to F1, F2, and F3 and the respective factor loadings (e.g. VA × λa1), as well as the sum of the residual variance components. Path labels in blue are estimated parameters and path labels in red italics are fixed at the specified values. Only one twin is presented to simplify the schematic diagram.
Figure 4:
Density plots of the estimated common environmental and additive genetic variance components for the Cholesky, Correlated Factors, and Direct Symmetric estimation methods. Note: The density plots depict the common environmental and additive genetic variance components for the first variable in simulation study 2. The solid red lines indicate the observed mean of the distribution, while the dotted blue lines indicate the simulated value for the parameter. If the solid red line is to the right of the dotted blue line, the parameter is overestimated; if the solid red line is to the left of the dotted blue line, the parameter is underestimated.
Figure 5:
Pairwise comparisons between the −2 log-likelihoods of the Direct Symmetric, Cholesky and Correlated Factors methods of estimating variance components. Note: The top panels show the full range of the −2 log-likelihoods for each pair of estimation methods. The bottom panels depict a zoomed-in view of the region of the scatterplot marked by the red box in the panel above. The red line in the bottom panels indicates equality of the −2 log-likelihoods for the two parameterization methods. The data are taken from the 4-variable model from the first simulation study.
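
The two univariate parameterizations described in the Figure 1 caption can be sketched numerically. This is a minimal illustration under stated assumptions, not the authors' OpenMx code: the function names are hypothetical, and the genetic correlation between twins is assumed to be fixed at 1.0 for MZ pairs and 0.5 for DZ pairs, as in the classical twin design (the caption itself only says that some paths are fixed). The example values are invented for illustration.

```python
import numpy as np


def twin_cov_direct(va, vc, ve, r_a):
    """Expected 2x2 covariance matrix for a twin pair, direct variance form.

    va, vc, ve : variance components, summed directly to the phenotypic variance.
    r_a        : assumed genetic correlation (1.0 for MZ, 0.5 for DZ pairs).
    """
    total = va + vc + ve      # phenotypic variance of each twin
    cross = r_a * va + vc     # covariance between twin 1 and twin 2
    return np.array([[total, cross],
                     [cross, total]])


def twin_cov_path(a, c, e, r_a):
    """Same expectation in path form: coefficients are squared before summing."""
    return twin_cov_direct(a**2, c**2, e**2, r_a)


if __name__ == "__main__":
    # Path specification: a**2, c**2 and e**2 can never be negative.
    print(twin_cov_path(a=0.6, c=0.5, e=0.7, r_a=1.0))

    # Direct specification: a negative VC is representable. Here the implied
    # DZ correlation (0.3) is less than half the MZ correlation (0.7).
    print(twin_cov_direct(va=0.8, vc=-0.1, ve=0.3, r_a=1.0))
    print(twin_cov_direct(va=0.8, vc=-0.1, ve=0.3, r_a=0.5))
```

Because the path form squares its coefficients, no real value of c can reproduce the second pair of matrices; the best a bounded model can do is set c to zero. That implicit lower bound is the kind of model-implied boundary the abstract identifies as the source of truncated sampling distributions.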
