Mol Biol Evol. 2022 May 3;39(5):msac092.
doi: 10.1093/molbev/msac092.

AliSim: A Fast and Versatile Phylogenetic Sequence Simulator for the Genomic Era

Nhan Ly-Trong et al. Mol Biol Evol. 2022.

Abstract

Sequence simulators play an important role in phylogenetics. Simulated data has many applications, such as evaluating the performance of different methods, hypothesis testing with parametric bootstraps, and, more recently, generating data for training machine-learning applications. Many sequence simulation programmes exist, but the most feature-rich programmes tend to be rather slow, and the fastest programmes tend to be feature-poor. Here, we introduce AliSim, a new tool that can efficiently simulate biologically realistic alignments under a large range of complex evolutionary models. To achieve high performance across a wide range of simulation conditions, AliSim implements an adaptive approach that combines the commonly used rate matrix and probability matrix approaches. AliSim takes 1.4 h and 1.3 GB RAM to simulate alignments with one million sequences or sites, whereas popular software Seq-Gen, Dawg, and INDELible require 2-5 h and 50-500 GB of RAM. We provide AliSim as an extension of the IQ-TREE software version 2.2, freely available at www.iqtree.org, and a comprehensive user tutorial at http://www.iqtree.org/doc/AliSim.
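To make the "rate matrix" and "probability matrix" approaches concrete, here is a minimal Python sketch of both for a single branch. It is not AliSim's implementation. It assumes a Jukes-Cantor (JC69) model normalised to one expected substitution per site per unit of branch length, and the function names and the `switch_branch_length` threshold are hypothetical, chosen only for illustration. The probability matrix approach resamples every site from P(t), so its cost grows with sequence length. The rate matrix approach draws substitution events one at a time, so its cost grows with the number of events and it tends to be cheaper on short branches.

```python
import math
import random

STATES = "ACGT"


def evolve_probability_matrix(seq, t, rng):
    """Resample every site from the JC69 transition probabilities P(t)."""
    # Under normalised JC69, a site keeps its state with this probability.
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    out = []
    for s in seq:
        if rng.random() < p_same:
            out.append(s)
        else:
            # The three other states are equally likely under JC69.
            out.append(rng.choice([x for x in STATES if x != s]))
    return "".join(out)


def evolve_rate_matrix(seq, t, rng):
    """Draw substitution events one at a time along the branch (Gillespie-style)."""
    out = list(seq)
    if not out:
        return ""
    # Each site leaves its current state at rate 1, so the total rate is the length.
    total_rate = float(len(out))
    elapsed = rng.expovariate(total_rate)
    while elapsed < t:
        i = rng.randrange(len(out))
        out[i] = rng.choice([x for x in STATES if x != out[i]])
        elapsed += rng.expovariate(total_rate)
    return "".join(out)


def evolve_adaptive(seq, t, rng, switch_branch_length=0.1):
    """Pick an approach per branch; the threshold is an illustrative assumption."""
    if t < switch_branch_length:
        return evolve_rate_matrix(seq, t, rng)
    return evolve_probability_matrix(seq, t, rng)


rng = random.Random(42)
parent = "".join(rng.choice(STATES) for _ in range(40))
print(parent)
print(evolve_adaptive(parent, 0.02, rng))  # short branch: event-by-event
print(evolve_adaptive(parent, 0.80, rng))  # long branch: resample every site
```

Both functions sample from the same per-site distribution, so the choice between them affects only running time, not the statistical properties of the simulated sequences.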

Keywords: molecular evolution; phylogenetics; sequence simulation.


Figures

Fig. 1.
Sequence simulation process with two scenarios: (A) simulating a multiple sequence alignment (MSA) from a phylogenetic tree and a Markov substitution model, and (B) simulating an MSA that mimics the underlying evolutionary process of a user-provided MSA. In scenario B, the phylogenetic tree and the substitution model parameters are inferred internally from the user-provided MSA and then used to simulate a new MSA.
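Scenario A can be sketched in a few lines of Python. This is a minimal illustration, not AliSim's code. It assumes a JC69 substitution model, a random root sequence with uniform base frequencies, and no indels. The tree encoding (nested `(name, branch_length, children)` tuples), the function names, and the three-taxon tree are all hypothetical. The root sequence is evolved down each branch, and the sequences that reach the tips form the alignment.

```python
import math
import random

STATES = "ACGT"


def evolve_jc69(seq, t, rng):
    """Evolve a sequence along one branch of length t under JC69."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    return "".join(
        s if rng.random() < p_same else rng.choice([x for x in STATES if x != s])
        for s in seq
    )


def simulate_msa(tree, n_sites, rng):
    """Simulate tip sequences for a tree given as (name, branch_length, children)."""
    # Draw the root sequence from the JC69 stationary distribution (uniform).
    root_seq = "".join(rng.choice(STATES) for _ in range(n_sites))
    msa = {}
    stack = [(tree, root_seq)]
    while stack:
        (name, _, children), seq = stack.pop()
        if not children:
            msa[name] = seq  # tip: record the simulated sequence
        for child in children:
            # Evolve the parent's sequence along the branch leading to the child.
            stack.append((child, evolve_jc69(seq, child[1], rng)))
    return msa


# Hypothetical tree, equivalent to the Newick string ((A:0.1,B:0.2):0.05,C:0.3);
tree = ("root", 0.0, [
    ("AB", 0.05, [("A", 0.1, []), ("B", 0.2, [])]),
    ("C", 0.3, []),
])

msa = simulate_msa(tree, 60, random.Random(1))
for name in sorted(msa):
    print(name, msa[name])
```

Without indels, every tip sequence has the same length as the root sequence, so the output is already aligned.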
Fig. 2.
Runtimes and peak memory consumption of five programs (AliSim, Seq-Gen, Dawg, INDELible, and phastSim) for deep-data simulations without indels (varying number of sequences and 30K sites; sub-panels A and B), long-data simulations without indels (varying number of sites and 30K sequences; sub-panels C and D), and simulations with indels (varying number of sequences, with the root sequence length set to 30K sites; sub-panels E and F).
