Int J Methods Psychiatr Res. 2018 Jun;27(2):e1608.
doi: 10.1002/mpr.1608. Epub 2018 Feb 27.

A tutorial on conducting genome-wide association studies: Quality control and statistical analysis

Andries T Marees et al. Int J Methods Psychiatr Res. 2018 Jun.

Abstract

Objectives: Genome-wide association studies (GWAS) have become an increasingly popular way to identify associations between single nucleotide polymorphisms (SNPs) and phenotypic traits, and the method is now commonly applied within the social sciences. However, the statistical analyses need to be conducted carefully and require dedicated genetics software. This tutorial aims to provide a guideline for conducting genetic analyses.

Methods: We discuss and explain key concepts and illustrate how to conduct GWAS using example scripts provided through GitHub (https://github.com/MareesAT/GWA_tutorial/). In addition to the illustration of standard GWAS, we will also show how to apply polygenic risk score (PRS) analysis. PRS does not aim to identify individual SNPs but aggregates information from SNPs across the genome in order to provide individual-level scores of genetic risk.

Results: The simulated data and scripts that will be illustrated in the current tutorial provide hands-on practice with genetic analyses. The scripts are based on PLINK, PRSice, and R, which are commonly used, freely available software tools that are accessible for novice users.

Conclusions: By providing theoretical background and hands-on experience, we aim to make GWAS more accessible to researchers without formal training in the field.

Keywords: GitHub; PLINK; genome-wide association study (GWAS); polygenic risk score (PRS); tutorial.


Figures

Figure 1
Overview of various commonly used PLINK files. SNP = single nucleotide polymorphism
Figure 2
Structure of the PLINK command line. *Not all shells will show this. **Provide the path to the directory where PLINK is installed if this is not in the current directory (e.g., /usr/local/bin/plink). Note that this example command was generated using PuTTY, a free SSH and Telnet client. When using other resources, there might be small graphical variations; however, the basic structure of a PLINK command will be identical
Figure 3
Multidimensional scaling (MDS) plot of the 1KG data against the CEU sample of the HapMap data (which can be regarded as your “own” data in this example, as it is used in the online tutorial at https://github.com/MareesAT/GWA_tutorial/). The black crosses (+ = “OWN”) in the upper left represent the first two MDS components of the individuals in the HapMap sample; the colored symbols represent the 1KG data (European, African, Admixed American, and Asian). The MDS components of the European samples are located in the upper left, the African samples in the upper right, the Admixed American samples near the intersection point of the dashed lines, and the Asian samples in the lower left
Figure 4
Working example of three single nucleotide polymorphisms (SNPs) aggregated into a single individual polygenic risk score (PRS). *The weight is either the beta or the log of the odds‐ratio, depending on whether a continuous or binary trait is analysed
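
The aggregation described in the Figure 4 caption can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's own PRSice or PLINK workflow: the function name `polygenic_risk_score`, the odds ratios, and the risk-allele counts are all hypothetical and are not taken from the figure. It assumes the common definition of a PRS as the sum, over SNPs, of the number of risk alleles an individual carries multiplied by that SNP's weight (the beta for a continuous trait, or the log of the odds ratio for a binary trait).

```python
import math


def polygenic_risk_score(allele_counts, weights):
    """Return the weighted sum of risk-allele counts for one individual.

    allele_counts: number of risk alleles per SNP (0, 1, or 2).
    weights: per-SNP effect sizes (beta, or log odds ratio for binary traits).
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    for count in allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("each allele count must be 0, 1, or 2")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Hypothetical odds ratios for three SNPs from a binary-trait GWAS
# (illustrative values only, not those shown in Figure 4).
odds_ratios = [1.5, 1.2, 0.8]

# For a binary trait the weight is the natural log of the odds ratio.
weights = [math.log(odds_ratio) for odds_ratio in odds_ratios]

# Hypothetical risk-allele counts for one individual at the same three SNPs.
allele_counts = [2, 1, 0]

prs = polygenic_risk_score(allele_counts, weights)
print(f"PRS for this individual: {prs:.4f}")
```

For a continuous trait, the same function would be called with the GWAS betas passed directly as `weights`, without the log transformation.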

