BMC Bioinformatics. 2021 Oct 18;22(1):506.
doi: 10.1186/s12859-021-04415-x.

AdmixSim 2: a forward-time simulator for modeling complex population admixture

Rui Zhang et al. BMC Bioinformatics. 2021.

Abstract

Background: Computer simulations have been widely applied in population genetics and evolutionary studies. A great deal of effort has been made over the past two decades in developing simulation tools. However, there are not many simulation tools suitable for studying population admixture.

Results: Here we developed a forward-time simulator, AdmixSim 2, an individual-based tool that can flexibly and efficiently simulate population genomics data under complex evolutionary scenarios. Unlike its previous version, AdmixSim 2 is based on the extended Wright-Fisher model, and it implements many common evolutionary parameters covering gene flow, natural selection, recombination, and mutation, which allow users to freely design and simulate any complex scenario involving population admixture. AdmixSim 2 can simulate data for dioecious or monoecious populations and for autosomes or sex chromosomes. To the best of our knowledge, no similar tools are available for simulating complex population admixture. Using empirical or previously simulated genomic data as input, AdmixSim 2 provides phased haplotype data for the convenience of further admixture-related analyses such as local ancestry inference, association studies, and other applications. We evaluated the performance of AdmixSim 2 on simulated data and validated its functions by comparing simulated data with empirical data from African American, Mexican, and Uyghur populations.

Conclusions: AdmixSim 2 is a flexible simulation tool expected to facilitate the study of complex population admixture in various situations.

Keywords: Admixture models; Evolutionary forces; Forward-time simulation; Multiple-wave admixture; Population admixture.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
General simulation flow chart of AdmixSim 2. Colors indicate different ancestral populations; squares represent males and circles represent females. Red spots represent mutations and red crosses represent recombinations. AdmixSim 2 requires four input files that record, respectively, ancestral haplotype data, individual information, SNV information, and the demographic model. During the simulation, each individual in the current generation undergoes mutation and is then sampled to be a parent of offspring in the next generation. The probability of being sampled is proportional to the individual’s fitness. Each parent contributes one gamete to the offspring after recombination. At the end of the simulation, there are six output files. The first three take nearly the same format as the corresponding input files, and they can be used for subsequent simulations with a new demographic model.
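The per-generation cycle described in this caption (mutation, fitness-proportional parent sampling, recombination into gametes) can be sketched as follows. This is a minimal illustration and not AdmixSim 2's implementation: it assumes a monoecious population of constant size, biallelic sites coded 0/1, and exactly one crossover per gamete, and the names `mutate`, `make_gamete`, and `next_generation` are hypothetical.

```python
import random

def mutate(hap, mu, rng):
    """Flip each site independently with probability mu (toy mutation model)."""
    return [1 - allele if rng.random() < mu else allele for allele in hap]

def make_gamete(individual, rng):
    """Recombine the two parental haplotypes at one uniformly chosen breakpoint.

    Assumption: exactly one crossover per gamete, which is a simplification.
    """
    first, second = individual
    if rng.random() < 0.5:          # pick which haplotype starts the gamete
        first, second = second, first
    breakpoint_ = rng.randrange(len(first) + 1)
    return first[:breakpoint_] + second[breakpoint_:]

def next_generation(pop, fitness, mu, rng):
    """Produce one new generation of the same size.

    pop     : list of (haplotype, haplotype) tuples
    fitness : callable mapping an individual to a positive weight
    """
    # 1. Every individual in the current generation undergoes mutation.
    mutated = [(mutate(a, mu, rng), mutate(b, mu, rng)) for a, b in pop]
    # 2. Parents are sampled with probability proportional to fitness.
    weights = [fitness(ind) for ind in mutated]
    offspring = []
    for _ in range(len(pop)):
        parent1, parent2 = rng.choices(mutated, weights=weights, k=2)
        # 3. Each parent contributes one recombined gamete.
        offspring.append((make_gamete(parent1, rng), make_gamete(parent2, rng)))
    return offspring

if __name__ == "__main__":
    rng = random.Random(42)
    population = [([0] * 8, [1] * 8) for _ in range(20)]
    # Hypothetical fitness: a derived allele at site 0 adds 10% per copy.
    fit = lambda ind: 1.0 + 0.1 * (ind[0][0] + ind[1][0])
    population = next_generation(population, fit, 1e-3, rng)
    print(len(population), population[0])
```

A dioecious version would sample one parent from each sex instead of drawing both from the same pool.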
Fig. 2
Simulation of African American admixture pattern. A PCA results. The left panel shows empirical data and the right panel shows simulated data. The two patterns are quite similar. B Segment length proportion. The proportion of each ancestry was calculated from the sum of the corresponding segment lengths. The average proportions of European and African ancestry were 0.225 and 0.775, in broad agreement with the proportions set in the admixture model (European: 0.246, African: 0.754). C Supervised admixture analysis results at K = 2. The left panel shows empirical data and the right panel shows simulated data. There is no marked difference between the two results. D Mutation number counts. The green histogram represents the simulated values and the red curve is derived from the theoretical Poisson distribution. The p-value was calculated using the chi-square goodness-of-fit test. The chromosome length was approximately 2.49 Morgan and the mutation rate was set to 10⁻⁸ per generation per site. Thus, after simulating 11 generations, the average number of mutations on each haplotype is about 27.
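The figure of about 27 mutations per haplotype in panel D can be checked with a short calculation. The sketch below assumes the chromosome spans roughly 249 million sites (that is, about 1 cM per Mb for a 2.49 Morgan chromosome); the caption gives only the genetic length, so `N_SITES` is an assumption, and the function names are hypothetical.

```python
import math

def expected_mutations(n_sites, mu, generations):
    """Expected number of new mutations on one haplotype lineage."""
    return n_sites * mu * generations

def poisson_pmf(k, lam):
    """Poisson probability of k events, computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

# Assumption: ~249 million sites, not stated in the caption.
N_SITES = 249_000_000
lam = expected_mutations(N_SITES, mu=1e-8, generations=11)
print(round(lam, 2))  # roughly 27, in line with the caption

# Theoretical curve against which a histogram of simulated counts is compared.
for k in (20, 27, 35):
    print(k, round(poisson_pmf(k, lam), 4))
```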
Fig. 3
Recombination and length distribution of segments of distinct ancestry in the analysis of X chromosome and autosomal data. A Recombination breakpoint counts. The histogram represents the simulated number of recombination breakpoints and the red curve is the theoretical Poisson distribution. Under identical simulation models and parameter settings, the average number of recombination breakpoints on the X chromosome is approximately two-thirds of that on autosomes. B Segment length distribution. The histogram represents the simulated segment length distribution and the red curve is the theoretical exponential distribution. The theoretical distribution fits the simulated one quite well. Moreover, X chromosomes have a lower effective recombination rate than autosomes.
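One way to see where the two-thirds factor in panel A comes from is to follow a single X chromosome lineage across generations. The sketch below assumes that the X recombines only in females, that a mother passes an X to a son or a daughter with equal probability, and that a father passes his X only to daughters; pseudoautosomal regions are ignored, and `female_fraction` is a hypothetical name.

```python
import random

def female_fraction(generations, rng):
    """Follow one X lineage forward in time.

    Returns the fraction of transmissions made by a female carrier, i.e. the
    fraction of meioses in which this X had the chance to recombine.
    """
    in_female = True
    female_meioses = 0
    for _ in range(generations):
        if in_female:
            female_meioses += 1
            # A mother passes this X to a daughter or a son with equal probability.
            in_female = rng.random() < 0.5
        else:
            # A father passes his X only to daughters.
            in_female = True
    return female_meioses / generations

if __name__ == "__main__":
    print(female_fraction(200_000, random.Random(1)))  # close to 2/3
```

An autosome recombines in every meiosis, so under these assumptions the X accumulates breakpoints at about two-thirds of the autosomal rate, and ancestry segments on the X are correspondingly longer on average.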
Fig. 4
Selection test of AdmixSim 2. A Trend of allele frequency over time. Each grey curve represents one repeat. The red curve is the average of 500 repeats and the blue curve is the theoretical value. The p-value was calculated using the Kolmogorov–Smirnov test, and the average of 500 simulations was nearly indistinguishable from the theoretical curve. The pie chart on the right summarizes the statistical comparison of each repeat with the theoretical value. More than 95% of tests did not show statistical significance. B Allele fixation time under different combinations of initial frequency and selection coefficient. As expected, the fixation time decreased as the selection coefficient increased. Moreover, the higher the initial frequency, the fewer generations were needed to reach fixation.
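A theoretical allele frequency curve of the kind shown in panel A can be produced with a deterministic recursion. The caption does not state which selection model underlies the blue curve, so the sketch below assumes simple genic selection in which the favored allele has relative fitness 1 + s; the function names and the 0.99 threshold used as a stand-in for fixation are hypothetical choices.

```python
def next_freq(p, s):
    """One generation of genic selection: favored allele has fitness 1 + s."""
    return p * (1 + s) / (1 + p * s)

def trajectory(p0, s, generations):
    """Expected allele frequency at generations 0..generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], s))
    return freqs

def generations_to_reach(p0, s, threshold=0.99):
    """Generations until the expected frequency reaches the threshold.

    Deterministic stand-in for fixation time; requires s > 0 and 0 < p0 < 1.
    """
    if s <= 0 or not 0 < p0 < 1:
        raise ValueError("need s > 0 and 0 < p0 < 1")
    p, t = p0, 0
    while p < threshold:
        p = next_freq(p, s)
        t += 1
    return t

if __name__ == "__main__":
    for p0 in (0.1, 0.5):
        for s in (0.05, 0.1):
            print(p0, s, generations_to_reach(p0, s))
```

Under this model the time to reach the threshold falls as either the selection coefficient or the initial frequency rises, which is the qualitative pattern described for panel B. A finite population adds drift around this expectation.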
Fig. 5
Performance evaluation of AdmixSim 2. A Varying chromosome length (centiMorgan). B Varying recombination rate (Morgan per base pair). C Varying mutation rate (per generation per site). D Varying population size. E Varying number of generations. F Varying number of loci under selection. The time cost increased linearly with each factor and remained relatively low.
Fig. 6
Performance comparison of AdmixSim 2 and SLiM 3.3. A Time cost with different numbers of simulated generations. B Memory cost with different numbers of simulated generations. C Time cost with different population sizes. D Memory cost with different population sizes. Here the memory cost is the maximum resident set size during the simulation. For both simulators, time and memory cost scale linearly with the number of generations or the population size. The runtime and memory cost of AdmixSim 2 are much lower than those of SLiM 3.3.
