An overview of STRUCTURE: applications, parameter settings, and supporting software

Liliana Porras-Hurtado¹, Yarimar Ruiz, Carla Santos, Christopher Phillips, Angel Carracedo, Maria V Lareu

Affiliation

¹ Universidad Tecnológica de Pereira, Pereira, Colombia; Forensic Genetics Unit, Institute of Legal Medicine, University of Santiago de Compostela, Santiago de Compostela, Spain.

PMID: 23755071
PMCID: PMC3665925
DOI: 10.3389/fgene.2013.00098

Liliana Porras-Hurtado et al. Front Genet. 2013 May 29;4:98. doi: 10.3389/fgene.2013.00098. eCollection 2013.

Abstract

Objectives: We present an up-to-date review of STRUCTURE, one of the most widely used population analysis tools, which allows researchers to assess patterns of genetic structure in a set of samples. STRUCTURE can identify subsets of the whole sample by detecting allele frequency differences within the data and can assign individuals to those sub-populations based on analysis of likelihoods. The review covers STRUCTURE's most commonly used ancestry and frequency models, plus an overview of the main applications of the software in human genetics, including case-control association studies (CCAS), population genetics, and forensic analysis. The review is accompanied by supplementary material providing a step-by-step guide to running STRUCTURE.

Methods: With reference to a worked example, we explore the effects of changing the principal analysis parameters on STRUCTURE results when analyzing a uniform set of human genetic data. Use of the supporting software CLUMPP and distruct is detailed, and we provide an overview and worked example of STRAT software, which is applicable to CCAS.

Conclusion: The guide offers a simplified view of how STRUCTURE, CLUMPP, distruct, and STRAT can be applied to provide researchers with an informed choice of parameter settings and supporting software when analyzing their own genetic data.

Keywords: CLUMPP; STRAT; STRUCTURE; case-control association studies; distruct; population structure; stratification.

Figures

**Figure 1**
*STRUCTURE* bar plots representing K = 4 for the principal analysis parameter combinations available to the user. These graphics were produced with *distruct*, using *CLUMPP* to align the three replicates for K = 4 (all runs were performed with a burnin period of 100,000 and 100,000 MCMC repeats after burnin). The exception was the POPINFO parameter sets, for which direct *STRUCTURE* bar plot outputs were used. Human genetic data comprised the genotypes listed in Table S1, consisting of 100 Africans (CEPH AFR), 158 Europeans (CEPH EUR), 165 East Asians (CEPH EAS), and 64 Native Americans (CEPH NAM) from the HGDP-CEPH human diversity panel. An artificial case-control group was created using HapMap Mexican and Puerto Rican samples, giving a total of 67 samples divided into Cases 1 (C1), Cases 2 (C2), and Controls (Ct). The markers were 9 AIM-SNPs (two triallelic), 3 phenotype-associated SNPs, and 5 AIM-SNPs on the X chromosome. The phenotype-associated SNPs and the X-SNPs are linked, forming two distinct linkage disequilibrium groups; their genetic distance was used to define these groups. Each parameter setting and the results obtained are described in detail in Supplementary Material 1.

**Figure 2**
**Example case-control sample analyses comparing scenarios with the presence or absence of stratification**. *STRUCTURE* bar plots and *STRAT* table results are shown. **(A)** Cases 1 (C1) are compared to the Control (Ct) samples. **(B)** Cases 2 (C2) are compared to the Control (Ct) samples. Details of these analyses are described in Supplementary Material 1.
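
The run settings quoted in the Figure 1 caption (K = 4, a burnin of 100,000, 100,000 MCMC repeats, three replicates) can be written down as a short Python sketch. It is an illustration only, not the authors' procedure: the file names and output prefix are hypothetical, the `MAXPOPS`, `BURNIN`, and `NUMREPS` names and the `-K`, `-i`, `-o` flags follow our reading of the STRUCTURE documentation and should be checked against the installed version, and pooling the reference and case-control samples into one run is an assumption.

```python
# Settings quoted in the Figure 1 caption.
K = 4
BURNIN = 100_000
NUMREPS = 100_000
REPLICATES = 3

# Sample counts quoted in the Figure 1 caption.
SAMPLES = {
    "CEPH AFR": 100,
    "CEPH EUR": 158,
    "CEPH EAS": 165,
    "CEPH NAM": 64,
    "case-control (C1, C2, Ct)": 67,
}


def mainparams_fragment(k, burnin, numreps):
    """Return the run-length lines of a STRUCTURE mainparams file."""
    return "\n".join([
        f"#define MAXPOPS {k}",
        f"#define BURNIN {burnin}",
        f"#define NUMREPS {numreps}",
    ])


def replicate_commands(k, replicates, infile="genotypes.txt", prefix="run"):
    """Build one argument list per replicate (file names are hypothetical)."""
    return [
        ["structure", "-K", str(k), "-i", infile, "-o", f"{prefix}_K{k}_rep{r}"]
        for r in range(1, replicates + 1)
    ]


# Assumption: all groups are analyzed together in a single input file.
total_individuals = sum(SAMPLES.values())
fragment = mainparams_fragment(K, BURNIN, NUMREPS)
commands = replicate_commands(K, REPLICATES)

if __name__ == "__main__":
    print(fragment)
    for cmd in commands:
        print(" ".join(cmd))
    print("individuals:", total_individuals)
```

The three output files from such replicate runs are what a tool like *CLUMPP* would then align before plotting with *distruct*.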
