Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2009 Aug;26(8):1879-88.
doi: 10.1093/molbev/msp098. Epub 2009 May 7.

INDELible: a flexible simulator of biological sequence evolution

Affiliations

INDELible: a flexible simulator of biological sequence evolution

William Fletcher et al. Mol Biol Evol. 2009 Aug.

Abstract

Many methods exist for reconstructing phylogenies from molecular sequence data, but few phylogenies are known and can be used to check their efficacy. Simulation remains the most important approach to testing the accuracy and robustness of phylogenetic inference methods. However, current simulation programs are limited, especially concerning realistic models for simulating insertions and deletions. We implement a portable and flexible application, named INDELible, for generating nucleotide, amino acid and codon sequence data by simulating insertions and deletions (indels) as well as substitutions. Indels are simulated under several models of indel-length distribution. The program implements a rich repertoire of substitution models, including the general unrestricted model and nonstationary nonhomogeneous models of nucleotide substitution, mixture, and partition models that account for heterogeneity among sites, and codon models that allow the nonsynonymous/synonymous substitution rate ratio to vary among sites and branches. With its many unique features, INDELible should be useful for evaluating the performance of many inference methods, including those for multiple sequence alignment, phylogenetic tree inference, and ancestral sequence, or genome reconstruction.

PubMed Disclaimer

Figures

F<sc>IG</sc>. 1.—
FIG. 1.—
The Lavalette distribution of indel length plotted for different values of the maximum indel length M, with a = 0.5 fixed (see eq. 5). Note that u can take integer values 1, 2, … , M only.
F<sc>IG</sc>. 2.—
FIG. 2.—
Speed comparison between DAWG and INDELible, with and without continuous gamma rate heterogeneity. The basic simulation model is specified by the settings in figure 3, whereas one factor is varied to see its impact in each plot. INDELible1 and INDELible2 refer to INDELible simulation under methods 1 and 2, respectively. The tests were carried out on a SunFire Opteron X4600M2 server running Linux.
F<sc>IG</sc>. 3.—
FIG. 3.—
An example input file for INDELible. The substitution model has been set to HKY + Γ with a transition–transversion rate ratio of κ = 2, stationary base frequencies of 0.4 (T), 0.3 (C), 0.2 (A), and 0.1 (G), and continuous gamma rate variation with shape parameter α = 1. Insertions and deletions have both been set to have an instantaneous rate of 0.1 (relative to an average substitution rate of 1) and the same geometric length distribution with a mean length of 4. Then, the phylogeny with branch lengths is specified. In the simulations for the speed tests, a 32-taxa, symmetric, strictly bifurcating tree with all branch lengths equal to 0.1 is used instead. This simulation creates 100 replicate data sets each containing one partition with a randomly created root sequence of 1,000 bases.

References

    1. Abascal F, Posada D, Zardoya R. MtArt: a new Model of amino acid replacement for Arthropoda. Mol Biol Evol. 2007;24:1–5. - PubMed
    1. Adachi J, Hasegawa M. MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood. Comput Sci Monogr. 1996;28:1–150.
    1. Adachi J, Waddell PJ, Martin W, Hasegawa M. Plastid genome phylogeny and a model of amino acid substitution for proteins encoded by chloroplast DNA. J Mol Evol. 2000;50:348–358. - PubMed
    1. Anisimova M, Kosiol C. Investigating protein-coding sequence evolution with probabilistic codon substitution models. Mol Biol Evol. 2009;26:255–271. - PubMed
    1. Arndt PF, Hwa T. Regional and time-resolved mutation patterns of the human genome. Bioinformatics. 2004;20:1482–1485. - PubMed

Publication types