GW-SEM: A Statistical Package to Conduct Genome-Wide Structural Equation Modeling

Brad Verhulst¹, Hermine H Maes², Michael C Neale²

Affiliations

¹ Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA. brad.verhulst@gmail.com.
² Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA.

PMID: 28299468
PMCID: PMC5423544
DOI: 10.1007/s10519-017-9842-6

Behav Genet. 2017 May;47(3):345-359. doi: 10.1007/s10519-017-9842-6. Epub 2017 Mar 15.

Abstract

Improving the accuracy of phenotyping through the use of advanced psychometric tools will increase the power to find significant associations with genetic variants and expand the range of possible hypotheses that can be tested on a genome-wide scale. Multivariate methods, such as structural equation modeling (SEM), are valuable in the phenotypic analysis of psychiatric and substance use phenotypes, but these methods have not been integrated into standard genome-wide association analyses because fitting a SEM at each single nucleotide polymorphism (SNP) along the genome was hitherto considered to be too computationally demanding. By developing a method that can efficiently fit SEMs, it is possible to expand the set of models that can be tested. This is particularly necessary in psychiatric and behavioral genetics, where the statistical methods are often handicapped by phenotypes with large components of stochastic variance. Due to the enormous amount of data that genome-wide scans produce, the statistical methods used to analyze the data are relatively elementary and do not directly correspond with the rich theoretical development, and lack the potential to test more complex hypotheses about the measurement of, and interaction between, comorbid traits. In this paper, we present a method to test the association of a SNP with multiple phenotypes or a latent construct on a genome-wide basis using a diagonally weighted least squares (DWLS) estimator for four common SEMs: a one-factor model, a one-factor residuals model, a two-factor model, and a latent growth model. We demonstrate that the DWLS parameters and p-values strongly correspond with the more traditional full information maximum likelihood parameters and p-values. We also present the timing of simulations and power analyses and a comparison with an existing multivariate GWAS software package.

Keywords: DWLS; Diagonally weighted least squares; GWAS; Genetics; Genome-wide association study; SEM; Structural equation modeling.
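
For readers unfamiliar with the estimator named in the abstract, the general textbook form of a DWLS fit function can be sketched as follows. The symbols are assumed here for illustration and do not come from the abstract: $s$ is the vector of sample statistics (for example covariances or polychoric correlations), $\sigma(\theta)$ is the corresponding model-implied vector for parameters $\theta$, and $W$ is an estimate of the asymptotic covariance matrix of $s$.

```latex
F_{\mathrm{DWLS}}(\theta)
  = \bigl(s - \sigma(\theta)\bigr)^{\top}
    \bigl[\operatorname{diag}(W)\bigr]^{-1}
    \bigl(s - \sigma(\theta)\bigr)
```

Because only the diagonal of $W$ is inverted, each discrepancy is simply weighted by the reciprocal of its own sampling variance, which is generally much cheaper than inverting the full weight matrix.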

Conflict of interest statement

Conflict of Interest: Brad Verhulst declares that he has no conflict of interest. Hermine H. Maes declares that she has no conflict of interest. Michael C. Neale declares that he has no conflict of interest.

Figures

**Figure 1. Schematic Representations of the Structural Equation Models that can be fit using the GW-SEM package**
Fig. 1a presents the one-factor model, in which a latent factor (F₁) causes the observed items (*x_k*). The associations between the latent factor and the observed indicators are estimated by the factor loadings (*λ_k*). The residual variances (*δ_k*) indicate the variance in *x_k* that is not shared with the latent factor. The regression of the latent factor on the SNP (for all SNPs in the analysis) is depicted by *β_F*. Fig. 1b presents the residuals model, which has very similar parameters to the one-factor model, with the notable difference that the individual items are regressed on each SNP (*γ_k*). Fig. 1c presents the two-factor model. In this model, both latent factors (F₁ & F₂) are regressed on every SNP (*β_F*₁ & *β_F*₂) and the latent factors are allowed to correlate (ψ). Finally, Fig. 1d presents the latent growth model, where the factor loadings are fixed to specified values, and the means (*μ_F*), variances and covariances (Ψ) of the latent growth parameters are estimated. Each latent growth factor is then regressed on each SNP (*β_F*).
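
The one-factor and residuals models described in this caption can be written as a pair of equations. This is a minimal sketch that omits intercepts and covariates. The residual term $\varepsilon_k$ (with variance $\delta_k$) and the factor disturbance $\zeta$ are symbols assumed here for illustration; the remaining symbols follow the caption.

```latex
% Fig. 1a, one-factor model: the SNP acts on the latent factor
x_k = \lambda_k F_1 + \varepsilon_k, \qquad \operatorname{Var}(\varepsilon_k) = \delta_k
F_1 = \beta_F \,\mathrm{SNP} + \zeta

% Fig. 1b, residuals model: the SNP acts directly on each item
x_k = \lambda_k F_1 + \gamma_k \,\mathrm{SNP} + \varepsilon_k
```

Written this way, the contrast is visible at a glance: the one-factor model summarizes the SNP effect in a single coefficient $\beta_F$, while the residuals model estimates one coefficient $\gamma_k$ per item.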

**Figure 2. The average duration (in minutes) to estimate covariances between the SNPs, items, and covariates (error bars represent ± 1.96 standard deviations)**
Fig. 2a presents the mean number of minutes (and standard deviations) to estimate covariances between 1,000 SNPs and 5 items and 3 covariates for 2,500, 5,000 and 10,000 observations for the one-factor model. Fig. 2b presents the mean number of minutes (and standard deviations) to estimate covariances between 1,000 SNPs and 3 covariates and 3, 4, and 5 items for 2,500 observations for the one-factor model.

**Figure 3. Power to detect a genome-wide significant association with varying effect sizes and minor allele frequencies**
Fig. 3a–d present the power curves for the ability to detect genome-wide significant associations between a SNP and a latent factor for a one-factor model with 5 items for continuous and ordinal items and SNPs with a minor allele frequency of .25 or .05.
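
To give a feel for how effect size, minor allele frequency, and sample size combine in power curves like these, the sketch below uses a generic analytic approximation. It is not the simulation procedure behind Fig. 3. It assumes a single observed, standardized trait (variance 1), an additive SNP in Hardy-Weinberg equilibrium, a two-sided 1-df z-test, and the conventional genome-wide threshold of 5e-8. The function name `gwas_power` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def gwas_power(beta, maf, n, alpha=5e-8):
    """Approximate power of a two-sided 1-df z-test for an additive SNP effect.

    Assumptions (illustrative, not from the paper): the trait is observed and
    standardized to variance 1, genotypes are coded 0/1/2 and follow
    Hardy-Weinberg equilibrium, and beta is the per-allele effect.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)    # critical value for alpha
    snp_var = 2 * maf * (1 - maf)                 # genotype variance under HWE
    resid_sd = sqrt(1 - beta ** 2 * snp_var)      # residual SD of the trait
    ncp = beta * sqrt(n * snp_var) / resid_sd     # expected z statistic
    # Probability that the z statistic falls in either rejection tail.
    return (1 - std_normal.cdf(z_crit - ncp)) + std_normal.cdf(-z_crit - ncp)


if __name__ == "__main__":
    # Compare the two minor allele frequencies mentioned in the caption.
    for maf in (0.25, 0.05):
        for n in (2500, 5000, 10000):
            print(f"maf={maf:.2f} n={n:>5} power={gwas_power(0.1, maf, n):.4f}")
```

Under these assumptions the rarer variant has a smaller genotype variance, so the same per-allele effect yields a smaller expected test statistic and lower power at any given sample size.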

