Virus Evol. 2019 Mar 8;5(1):vez003.
doi: 10.1093/ve/vez003. eCollection 2019 Jan.

SANTA-SIM: simulating viral sequence evolution dynamics under selection and recombination

Abbas Jariani et al. Virus Evol. 2019.

Abstract

Simulations are widely used to provide expectations and predictive distributions under known conditions against which to compare empirical data. Such simulations are also invaluable for testing and comparing the behaviour and power of inference methods. We describe SANTA-SIM, a software package to simulate the evolution of a population of gene sequences forwards through time. It models the underlying biological processes as discrete components: replication, recombination, point mutations, insertion-deletions, and selection under various fitness models and population size dynamics. The software is designed to be intuitive to work with for a wide range of users and executable in a cross-platform manner.

Keywords: fitness; mutation; recombination; selection; simulation.

Figures

Figure 1.
Overview of the simulation process in SANTA-SIM. A cycle of two generations in a SANTA-SIM simulation, consisting of mutation, recombination, fitness evaluation, and selection. The circles on the left and right represent, respectively, the individuals of the first generation (parents) and the second generation (progeny). The size of each circle represents fitness, while the colour represents genotype. Parents with higher fitness are more likely to be selected to generate progeny, as shown by the number of arrows. Each progeny can be generated from one parent (clonal replication) or two parents (recombinant replication).
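The generation cycle in Figure 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not SANTA-SIM's implementation: the function names (`mutate`, `recombine`, `next_generation`), the single-breakpoint recombination, the uniform per-site mutation model, and the constant population size are all assumptions made for the example. The fitness function must return positive values, and sequences must be at least two bases long.

```python
import random

ALPHABET = "ACGT"


def mutate(seq, rate, rng):
    """Replace each base with a different one with probability `rate` (assumed model)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def recombine(a, b, rng):
    """Single-breakpoint recombinant of two equal-length parents (assumed model)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def next_generation(parents, fitness, mutation_rate, recombination_prob, rng):
    """Produce one progeny generation of the same size as `parents`."""
    # Fitness evaluation: one weight per parent.
    weights = [fitness(seq) for seq in parents]
    progeny = []
    for _ in range(len(parents)):
        if rng.random() < recombination_prob:
            # Recombinant replication: two parents drawn in proportion to fitness.
            p1, p2 = rng.choices(parents, weights=weights, k=2)
            child = recombine(p1, p2, rng)
        else:
            # Clonal replication: one parent drawn in proportion to fitness.
            child = rng.choices(parents, weights=weights, k=1)[0]
        progeny.append(mutate(child, mutation_rate, rng))
    return progeny
```

Calling `next_generation` repeatedly, feeding each output back in as the next set of parents, advances the sketch population forwards through time.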
Figure 2.
The simulation has two phases. In the first 3,000 generations, the only selective force is purifying selection. After this initial phase (vertical grey line), four particular mutations become beneficial: 50 T (yellow), 100 K (light blue), 150 A (green), and 200 G (dark blue). Mutation 100 K was present in the initial population at low prevalence (%). Prevalence on the y-axis is log10-transformed. Diversity drops during each selective sweep, in which a beneficial mutation appears and takes over. The simulation starts from a population with only one sequence at the first generation. Diversity was defined as the mean pairwise distance between all sequences, expressed as a percentage. At a given nucleotide position, two non-identical bases score one and two identical bases score zero. The distance between two sequences was calculated as the mean of these scores across all nucleotide positions. In this simulation, the alleles reaching fixation either appeared de novo or were selected from standing variation (100 K), as no recombination events were simulated.
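The diversity measure described in this caption can be written out directly. The sketch below follows the stated definition (per-position score of one for a mismatch and zero for a match, averaged over positions, then averaged over all sequence pairs); the function names are hypothetical, and the result is returned as a fraction rather than a percentage.

```python
from itertools import combinations


def pairwise_distance(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_diversity(seqs):
    """Mean pairwise distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0  # fewer than two sequences: no pairs, so no diversity
    return sum(pairwise_distance(a, b) for a, b in pairs) / len(pairs)
```

For example, for the three sequences `AAAA`, `AAAT`, and `AATT` the pairwise distances are 0.25, 0.5, and 0.25, so the diversity is 1/3. The number of pairs grows quadratically with the number of sequences, which is one reason to compute diversity on a sample rather than on the whole population.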
Figure 3.
Phylogenetic tree from the sequences sampled through multiple waves of selective sweep. From the simulation run of 10,000 generations, a sample of 500 sequences was collected at every 100th generation, and a tree was made with FastTree using default parameters (Price, Dehal, and Arkin, 2009). The tree is coloured by increasing generation (from red to blue), and the outer band denotes the consecutive selection of beneficial mutations (see Fig. 2 for mutation colours; the red section denotes absence of a mutation).
Figure 4.
Diversity trajectory through selective sweeps for different initial frequencies of the beneficial alleles. Two modes of selective sweep are simulated for a population of size 10,000: In one case a small fraction of the initial population carries the beneficial allele (2 in 10,000; blue line), whereas in the other case a higher fraction carries this allele (55 in 10,000; red line). The diversity is calculated as the mean pairwise distance between 1,000 sampled sequences from the population, similar to the previous simulation. The simulation is repeated 1,000 times. The mean and standard deviation of these replicates are shown as the thick line and shaded area, respectively.
Figure 5.
Phylogenetic trees after fixation of the beneficial allele, grouped by the two levels of initial prevalence of the allele. A population with two levels of initial prevalence of a beneficial allele is subjected to selective sweeps. Unrooted phylogenetic trees are sampled (100 sequences) after fixation of the beneficial allele, using the tree-sampling capability of SANTA-SIM. The tree on the left corresponds to the case where the starting population had a lower number of individuals with the beneficial allele (2 in 10,000), while the tree on the right corresponds to the case where the starting population had a higher number of individuals with the beneficial allele (55 in 10,000). Clades were collapsed (red dots) when the average branch length distance to the taxa was below an illustrative threshold.
Figure 6.
Simulation of selection dynamics in host-pathogen co-evolution. Simulation of the interplay between the appearance of an escape mutation in a pathogen and host adaptation to the resistance. The selection coefficient of the beneficial mutation was set to 0.05. The exposure-dependent fitness function was used to simulate the gradual decrease in fitness of the beneficial resistance allele in a pathogen as the host's immune system adapts. Three values of the exposure penalty parameter were used: 10^-7 for the green curves, 10^-6 for the orange curves, and 10^-5 for the purple curves. Two replicates are shown for each simulation.
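To illustrate how a penalty parameter of this kind can erode a fitness advantage, the sketch below assumes one plausible functional form: the advantage `s` decays exponentially with cumulative exposure, where exposure is taken to be the running total of individuals that have carried the allele. This form, the definition of exposure, and the function names are assumptions made for illustration; the caption does not give the function SANTA-SIM actually uses.

```python
import math


def exposure_dependent_fitness(selection_coefficient, penalty, cumulative_exposure):
    """Assumed form: fitness = 1 + s * exp(-penalty * exposure)."""
    return 1.0 + selection_coefficient * math.exp(-penalty * cumulative_exposure)


def fitness_trajectory(selection_coefficient, penalty, carriers_per_generation):
    """Fitness of the allele at each generation as exposure accumulates.

    `carriers_per_generation` is a hypothetical input: the number of
    individuals carrying the allele in each successive generation.
    """
    exposure = 0.0
    trajectory = []
    for carriers in carriers_per_generation:
        trajectory.append(
            exposure_dependent_fitness(selection_coefficient, penalty, exposure)
        )
        exposure += carriers  # exposure is the running total of carriers
    return trajectory
```

Under this assumed form, a larger penalty makes the advantage of the escape mutation fade after less exposure, which matches the qualitative description of host adaptation in the caption.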
Figure 7.
Performance benchmarking. Memory and run time of simulations with purifying selection for 10,000 generations with different population sizes and genome lengths.
