Front Genet. 2013 May 29;4:98. doi: 10.3389/fgene.2013.00098. eCollection 2013.

An overview of STRUCTURE: applications, parameter settings, and supporting software

Liliana Porras-Hurtado et al. Front Genet.

Abstract

Objectives: We present an up-to-date review of STRUCTURE, one of the most widely used population analysis tools, which allows researchers to assess patterns of genetic structure in a set of samples. STRUCTURE can identify subsets of the whole sample by detecting allele frequency differences within the data and can assign individuals to those sub-populations based on analysis of likelihoods. The review covers STRUCTURE's most commonly used ancestry and frequency models, plus an overview of the main applications of the software in human genetics, including case-control association studies (CCAS), population genetics, and forensic analysis. The review is accompanied by supplementary material providing a step-by-step guide to running STRUCTURE.
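The idea of assigning an individual by likelihood can be illustrated with a deliberately simplified sketch. It assumes the allele frequencies of each sub-population are already known, that loci are independent, and that genotypes follow Hardy-Weinberg proportions; STRUCTURE itself estimates frequencies and assignments jointly by MCMC, so this is not its algorithm. The population names, alleles, and frequencies below are invented for illustration.

```python
import math

def log_likelihood(genotype, freqs):
    """Log-likelihood of a diploid multilocus genotype under one population.

    genotype: list of (allele_a, allele_b) tuples, one per locus.
    freqs: list of dicts mapping allele -> frequency, one per locus.
    Assumes Hardy-Weinberg proportions, independent loci, nonzero frequencies.
    """
    total = 0.0
    for (a, b), locus_freqs in zip(genotype, freqs):
        p = locus_freqs[a] * locus_freqs[b]
        if a != b:
            p *= 2  # a heterozygote can arise in two parental orders
        total += math.log(p)
    return total

def assign(genotype, pop_freqs):
    """Return the population with the highest log-likelihood, plus all scores."""
    scores = {pop: log_likelihood(genotype, f) for pop, f in pop_freqs.items()}
    return max(scores, key=scores.get), scores

# Hypothetical allele frequencies at two loci for two sub-populations.
pop_freqs = {
    "pop1": [{"A": 0.9, "G": 0.1}, {"C": 0.8, "T": 0.2}],
    "pop2": [{"A": 0.2, "G": 0.8}, {"C": 0.3, "T": 0.7}],
}
best, scores = assign([("A", "A"), ("C", "T")], pop_freqs)
print(best, scores)
```

Here the genotype is more probable under the hypothetical "pop1" frequencies, so it is assigned there; with real data the frequencies are unknown and must be inferred along with the assignments.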

Methods: With reference to a worked example, we explore how changing the principal analysis parameters affects STRUCTURE results when analyzing a uniform set of human genetic data. We detail the use of the supporting software CLUMPP and distruct, and provide an overview and worked example of STRAT software, which is applicable to CCAS.

Conclusion: The guide offers a simplified view of how STRUCTURE, CLUMPP, distruct, and STRAT can be applied to provide researchers with an informed choice of parameter settings and supporting software when analyzing their own genetic data.

Keywords: CLUMPP; STRAT; STRUCTURE; case-control association studies; distruct; population structure; stratification.

Figures

Figure 1
STRUCTURE bar plots for K = 4 under the principal analysis parameter combinations available to the user. The plots were drawn with distruct after using CLUMPP to align the three replicates for K = 4 (all runs used a burnin period of 100,000 followed by 100,000 MCMC repeats). The exception was the POPINFO parameter sets, for which direct STRUCTURE bar plot outputs were used. Human genetic data comprised the genotypes listed in Table S1: 100 Africans (CEPH AFR), 158 Europeans (CEPH EUR), 165 East Asians (CEPH EAS), and 64 Native Americans (CEPH NAM) from the HGDP-CEPH human diversity panel. An artificial case-control group was created using HapMap Mexican and Puerto Rican samples, giving a total of 67 samples divided into Cases 1 (C1), Cases 2 (C2), and Controls (Ct). Markers were 9 AIM-SNPs (two triallelic), 3 phenotype-associated SNPs, and 5 AIM-SNPs on the X chromosome. The phenotype SNPs and the X-SNPs are linked, forming two distinct linkage disequilibrium groups; their genetic distance was used to define the linkage disequilibrium groups. Each parameter setting and the results obtained are described in detail in Supplementary Material 1.
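The run settings in this caption (K = 4, three replicates, 100,000 burnin and 100,000 MCMC repeats) could be scripted roughly as follows. This is a sketch, not the authors' procedure: the file names `mainparams` and `extraparams` and the output prefix are hypothetical, and the `-m`, `-e`, `-K`, and `-o` options and the `BURNIN` and `NUMREPS` definitions follow the command-line STRUCTURE documentation, so they should be checked against the installed version. The code only builds the commands and does not run them.

```python
K = 4                # number of assumed clusters, as in the caption
REPLICATES = 3       # replicate runs later aligned with CLUMPP
BURNIN = 100_000     # burnin period
NUMREPS = 100_000    # MCMC repeats after burnin

def mainparams_fragment(burnin, numreps):
    """Return the two run-length lines as they would appear in a mainparams file."""
    return f"#define BURNIN {burnin}\n#define NUMREPS {numreps}\n"

def replicate_commands(k, replicates, prefix="run"):
    """Build one STRUCTURE command line per replicate (file names are hypothetical)."""
    commands = []
    for rep in range(1, replicates + 1):
        commands.append([
            "structure",
            "-m", "mainparams",             # main parameter file
            "-e", "extraparams",            # extra parameter file
            "-K", str(k),                   # override MAXPOPS for this run
            "-o", f"{prefix}_K{k}_rep{rep}",  # separate output file per replicate
        ])
    return commands

fragment = mainparams_fragment(BURNIN, NUMREPS)
commands = replicate_commands(K, REPLICATES)
for cmd in commands:
    print(" ".join(cmd))
```

Writing each replicate to its own output file keeps the runs separate for the later CLUMPP alignment step.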
Figure 2
Example case-control sample analyses comparing scenarios with and without stratification. STRUCTURE bar plots and STRAT table results are shown. (A) Cases 1 (C1) samples are compared to the Control (Ct) samples. (B) Cases 2 (C2) samples are compared to the Control (Ct) samples. Details of these analyses are described in Supplementary Material 1.
