Genome Biol. 2008 Oct 8;9(10):R147.
doi: 10.1186/gb-2008-9-10-r147.

Tools for simulating evolution of aligned genomic regions with integrated parameter estimation

Avinash Varadarajan et al. Genome Biol. 2008.

Abstract

Controlled simulations of genome evolution are useful for benchmarking tools. However, many simulators lack extensibility and cannot measure parameters directly from data. These issues are addressed by three new open-source programs: GSIMULATOR (for neutrally evolving DNA), SIMGRAM (for generic structured features) and SIMGENOME (for syntenic genome blocks). Each offers algorithms for parameter measurement and reconstruction of ancestral sequence. All three tools outperform the leading neutral DNA simulator (DAWG) in benchmarks. The programs are available at http://biowiki.org/SimulationTools.
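As a toy illustration of the simulate-then-measure loop described above, the sketch below evolves a sequence along one branch under the Jukes-Cantor (JC69) substitution model and then estimates the branch length back from the aligned pair. JC69 is a standard textbook model chosen here as an assumption; it is not the model used by GSIMULATOR, SIMGRAM or SIMGENOME, and it ignores indels and context-dependence entirely. All function names are hypothetical.

```python
import math
import random

BASES = "ACGT"


def jc69_change_prob(t: float) -> float:
    """Probability that a site differs after branch length t (expected
    substitutions per site) under JC69 (assumed toy model)."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def simulate_jc69(ancestor: str, t: float, rng: random.Random) -> str:
    """Evolve a sequence along one branch: substitutions only, no indels."""
    p = jc69_change_prob(t)
    out = []
    for base in ancestor:
        if rng.random() < p:
            # Under JC69 each of the three other bases is equally likely.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def estimate_jc69_distance(seq1: str, seq2: str) -> float:
    """Method-of-moments JC69 distance from the proportion of mismatched sites."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    mismatches = sum(a != b for a, b in zip(seq1, seq2))
    p = mismatches / len(seq1)
    if p >= 0.75:
        return math.inf  # saturated: distance is not identifiable
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    root = "".join(rng.choice(BASES) for _ in range(200_000))
    leaf = simulate_jc69(root, t=0.3, rng=rng)
    print(f"true t = 0.300, estimated t = {estimate_jc69_distance(root, leaf):.3f}")
```

With a long simulated alignment the estimate lands close to the true branch length, which is the kind of round-trip check that integrated parameter estimation makes possible.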

Figures

Figure 1
Receiver operating characteristic (ROC) curves for two non-coding RNA prediction algorithms, ClosingBp (Bradley RK, Uzilov AV, Skinner M, Bendaña YR, Barquist L and Holmes I, submitted) and EVOFOLD [39] (implemented using XRATE), using GSIMULATOR and SIMGENOME models to estimate the false positive rate (FPR). These curves illustrate the general principle that the more realistic a simulation model, the higher the estimated FPR. This trend is independent of the gene-prediction algorithm used. The upper panes show results for GSIMULATOR: more complex indel length distributions (N) and, in particular, context-dependence (K) both increase the FPR. The lower panes show results for SIMGENOME and its component models, where the FPR is increased by including gaps (which amplify fluctuations in information content, because they are typically treated as 'missing information') and genomic features (some of which evolve more slowly than neutral sequence). The asymptotic sensitivity is less than 1.0 because our benchmark used a sliding-window approach, predicting at most one non-coding RNA (ncRNA) in each window. Our set of real ncRNAs was taken from multi-genome Drosophila alignments produced by the PECAN program [50]; in each case, to ensure a fair comparison, we took a window of the PECAN alignment surrounding the annotated ncRNA, with the size of this window matching the size of the sliding window used on the simulated null data. Some of the positive ncRNAs in these PECAN-aligned windows score so poorly under the gene prediction model (for example, due to inaccuracies in the PECAN alignment of that window) that the predicted ncRNA is consistently placed in the wrong location within the window. These real ncRNAs are therefore never detected, no matter how low the scoring threshold, which sets an upper limit on the achievable sensitivity.
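The sensitivity ceiling described in this caption can be reproduced with a small sketch. It assumes (the caption does not say this exactly) that the FPR is the fraction of simulated null windows whose single best prediction scores at or above the threshold, and that a real ncRNA counts as detected only if its window's best prediction is correctly placed and clears the threshold. The type and function names, and the scores in the usage example, are invented for illustration and are not taken from the benchmark.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PositiveWindow:
    """A window of real alignment surrounding one annotated ncRNA (hypothetical type)."""
    score: float            # score of the single best prediction in the window
    correctly_placed: bool  # True if that prediction overlaps the annotated ncRNA


def roc_points(null_scores: List[float],
               positives: List[PositiveWindow]) -> List[Tuple[float, float]]:
    """Return (FPR, sensitivity) pairs, sweeping the threshold from high to low.

    Assumption: FPR = fraction of simulated null windows scoring >= threshold;
    sensitivity = fraction of positive windows whose best prediction is
    correctly placed AND scores >= threshold.
    """
    thresholds = sorted(set(null_scores) | {p.score for p in positives}, reverse=True)
    points = []
    for thr in thresholds:
        fpr = sum(s >= thr for s in null_scores) / len(null_scores)
        sens = sum(p.correctly_placed and p.score >= thr for p in positives) / len(positives)
        points.append((fpr, sens))
    return points


if __name__ == "__main__":
    # Invented toy scores: four simulated null windows, three real ncRNA windows.
    null = [0.1, 0.4, 0.35, 0.8]
    real = [PositiveWindow(0.9, True),
            PositiveWindow(0.5, True),
            PositiveWindow(0.7, False)]  # mislocated: never counts as detected
    for fpr, sens in roc_points(null, real):
        print(f"FPR={fpr:.2f}  sensitivity={sens:.2f}")
```

In this toy run the sensitivity never rises above 2/3, even at the lowest threshold, because the mislocated window can never be counted. A null model that produces higher-scoring null windows would push every point toward a higher FPR, which is the comparison Figures 1 and 2 make across simulators. The quadratic threshold sweep is chosen for readability; sorting once and accumulating counts would scale better.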
Figure 2
ROC curves for two non-coding RNA predictors, ClosingBp (Bradley RK, Uzilov AV, Skinner M, Bendaña YR, Barquist L and Holmes I, submitted) and EVOFOLD [39] (implemented using XRATE), comparing DAWG [10] to the richest GSIMULATOR and SIMGENOME models. The three curves for each gene predictor clearly illustrate that increased model richness (DAWG → GSIMULATOR → SIMGENOME) yields higher estimated FPR. See the caption to Figure 1 for an explanation of why the asymptotic sensitivity is less than 1.0.

References

    - Pedersen JS, Hein J. Gene finding with a hidden Markov model of genome structure and evolution. Bioinformatics. 2003;19:219–227.
    - Bais AS, Grossmann S, Vingron M. Incorporating evolution of transcription factor binding sites into annotated alignments. J Biosci. 2007;32:841–850.
    - Pollard DA, Bergman CM, Stoye J, Celniker SE, Eisen MB. Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics. 2004;5:6.
    - Evans J, Sheneman L, Foster J. Relaxed neighbor joining: a fast distance-based phylogenetic tree construction method. J Mol Evol. 2006;62:785–792.
    - Rasmussen MD, Kellis M. Accurate gene-tree reconstruction by learning gene- and species-specific substitution rates across multiple complete genomes. Genome Res. 2007;17:1932–1942.