Behav Genet. 2017 May;47(3):345-359.
doi: 10.1007/s10519-017-9842-6. Epub 2017 Mar 15.

GW-SEM: A Statistical Package to Conduct Genome-Wide Structural Equation Modeling

Brad Verhulst et al. Behav Genet. 2017 May.

Abstract

Improving the accuracy of phenotyping through the use of advanced psychometric tools will increase the power to find significant associations with genetic variants and expand the range of possible hypotheses that can be tested on a genome-wide scale. Multivariate methods, such as structural equation modeling (SEM), are valuable in the phenotypic analysis of psychiatric and substance use phenotypes, but these methods have not been integrated into standard genome-wide association analyses because fitting a SEM at each single nucleotide polymorphism (SNP) along the genome was hitherto considered to be too computationally demanding. By developing a method that can efficiently fit SEMs, it is possible to expand the set of models that can be tested. This is particularly necessary in psychiatric and behavioral genetics, where the statistical methods are often handicapped by phenotypes with large components of stochastic variance. Due to the enormous amount of data that genome-wide scans produce, the statistical methods used to analyze the data are relatively elementary, do not directly correspond with the rich theoretical development, and lack the potential to test more complex hypotheses about the measurement of, and interaction between, comorbid traits. In this paper, we present a method to test the association of a SNP with multiple phenotypes or a latent construct on a genome-wide basis using a diagonally weighted least squares (DWLS) estimator for four common SEMs: a one-factor model, a one-factor residuals model, a two-factor model, and a latent growth model. We demonstrate that the DWLS parameters and p-values strongly correspond with the more traditional full information maximum likelihood parameters and p-values. We also present the timing of simulations and power analyses and a comparison with an existing multivariate GWAS software package.

Keywords: DWLS; Diagonally weighted least squares; GWAS; Genetics; Genome-wide association study; SEM; Structural equation modeling.

Conflict of interest statement

Conflict of Interest: Brad Verhulst declares that he has no conflict of interest. Hermine H. Maes declares that she has no conflict of interest. Michael C. Neale declares that he has no conflict of interest.

Figures

Figure 1. Schematic Representations of the Structural Equation Models that can be fit using the GW-SEM package
Fig. 1a presents the one-factor model, in which a latent factor (F1) causes the observed items (xk). The associations between the latent factor and the observed indicators are estimated by the factor loadings (λk). The residual variances (δk) indicate the variance in xk that is not shared with the latent factor. The regression of the latent factor on the SNP (for all SNPs in the analysis) is depicted by βF. Fig. 1b presents the residuals model, which has very similar parameters to the one-factor model, with the notable difference that the individual items are regressed on each SNP (γk). Fig. 1c presents the two-factor model. In this model, both latent factors (F1 & F2) are regressed on every SNP (βF1 & βF2) and the latent factors are allowed to correlate (ψ). Finally, Fig. 1d presents the latent growth model, where the factor loadings are fixed to specified values, and the means (μF), variances and covariances (Ψ) of the latent growth parameters are estimated. Each latent growth factor is then regressed on each SNP (βF).
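
To make the one-factor model in Fig. 1a concrete, the following minimal Python sketch computes the covariances that the model implies among the items and between each item and the SNP. It is an illustration only and is not code from the GW-SEM package. The function and parameter names are hypothetical, the factor residual variance `psi` is an assumed extra parameter not named in the caption, and the SNP variance is assumed to follow Hardy-Weinberg equilibrium for an additively coded genotype.

```python
# Illustrative sketch only; not part of GW-SEM. All names are hypothetical.

def snp_variance(maf):
    """Variance of an additive 0/1/2 genotype, assuming Hardy-Weinberg equilibrium."""
    return 2.0 * maf * (1.0 - maf)


def implied_covariances(loadings, residual_vars, beta_f, psi, maf):
    """Model-implied covariances for a one-factor model with a SNP predictor.

    loadings      -- factor loadings (lambda_k), one per item
    residual_vars -- residual variances (delta_k), one per item
    beta_f        -- regression of the latent factor on the SNP
    psi           -- residual variance of the latent factor (assumed parameter)
    maf           -- minor allele frequency of the SNP
    """
    var_snp = snp_variance(maf)
    # Var(F) = beta_f^2 * Var(SNP) + psi
    var_factor = beta_f ** 2 * var_snp + psi
    k = len(loadings)
    # Cov(x_i, x_j) = lambda_i * lambda_j * Var(F), plus delta_i on the diagonal
    item_cov = [
        [
            loadings[i] * loadings[j] * var_factor
            + (residual_vars[i] if i == j else 0.0)
            for j in range(k)
        ]
        for i in range(k)
    ]
    # Cov(x_i, SNP) = lambda_i * beta_f * Var(SNP)
    item_snp_cov = [lam * beta_f * var_snp for lam in loadings]
    return item_cov, item_snp_cov


# Example call with made-up parameter values
example_item_cov, example_item_snp_cov = implied_covariances(
    loadings=[1.0, 0.5], residual_vars=[0.2, 0.3], beta_f=0.1, psi=1.0, maf=0.25
)
```

A least squares estimator such as DWLS would compare quantities like these with their sample counterparts. The sketch stops at the implied moments and does not attempt the estimation step.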
Figure 2. The average duration (in minutes) to estimate covariances between the SNPs, items, and covariates (error bars represent ± 1.96 standard deviations)
Fig. 2a presents the mean number of minutes (and standard deviations) to estimate covariances between 1,000 SNPs and 5 items and 3 covariates for 2,500, 5,000 and 10,000 observations for the one-factor model. Fig. 2b presents the mean number of minutes (and standard deviations) to estimate covariances between 1,000 SNPs and 3 covariates and 3, 4, and 5 items for 2,500 observations for the one-factor model.
Figure 3. Power to detect a genome-wide significant association with varying effect sizes and minor allele frequencies
Fig. 3a–d present the power curves for the ability to detect genome-wide significant associations between a SNP and a latent factor for a one-factor model with 5 items for continuous and ordinal items and SNPs with a minor allele frequency of .25 or .05.
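
As a rough illustration of how effect size and minor allele frequency drive power at a genome-wide threshold, the sketch below uses a closed-form normal approximation for a single SNP tested against one observed, standardized trait. This is a deliberate simplification and is not the procedure behind Fig. 3, which concerns a latent factor measured by 5 items. The function name, the threshold of 5e-8, and the Hardy-Weinberg SNP variance are all assumptions introduced here.

```python
# Illustrative sketch only; a simplified stand-in for the latent-factor setting.
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional threshold, assumed here


def power_single_snp(n, beta, maf, alpha=GENOME_WIDE_ALPHA):
    """Approximate two-sided power of a 1-df test for a SNP effect.

    Assumes a trait with unit variance, an additively coded SNP in
    Hardy-Weinberg equilibrium, and a normally distributed test statistic.
    """
    var_snp = 2.0 * maf * (1.0 - maf)
    explained = beta ** 2 * var_snp  # trait variance explained by the SNP
    if not 0.0 <= explained < 1.0:
        raise ValueError("beta and maf imply an explained variance outside [0, 1)")
    # Expected value of the z statistic under the alternative hypothesis
    shift = abs(beta) * (n * var_snp / (1.0 - explained)) ** 0.5
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    upper_tail = 1.0 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Example: the same effect size at the two minor allele frequencies used in Fig. 3
power_common = power_single_snp(n=10000, beta=0.1, maf=0.25)
power_rare = power_single_snp(n=10000, beta=0.1, maf=0.05)
```

Under these assumptions, the rarer variant has lower power at the same sample size and effect size because the genotype variance 2p(1-p) is smaller.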
