Fast "coalescent" simulation

Paul Marjoram¹, Jeff D Wall

PMID: 16539698
PMCID: PMC1458357
DOI: 10.1186/1471-2156-7-16

Paul Marjoram et al. BMC Genet. 2006 Mar 15;7:16.

Affiliation

¹ Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089-9011, USA. pmarjora@usc.edu

Abstract

Background: The amount of genome-wide molecular data is increasing rapidly, as is interest in developing methods appropriate for such data. There is a consequent and growing need for methods that can simulate such data efficiently. In this paper we implement the sequentially Markovian coalescent (SMC) algorithm described by McVean and Cardin and present a further modification to that algorithm which slightly improves the closeness of the approximation to the full coalescent model. The algorithm ignores a class of recombination events known to affect the behavior of the genealogy of the sample, but which do not appear to affect the behavior of generated samples to any substantial degree.
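To make the difference between the two approximations concrete, here is a minimal sketch for the simplest case, a sample of two sequences (n = 2), written in Python. It is not the authors' software. The function names `simulate_pairwise` and `mean_tmrca`, the parameter `rho`, and the units (time in coalescent units, so a pair coalesces at rate 1; breakpoints at rate rho/2 per unit branch length per unit sequence) are all illustrative assumptions. In the SMC, the branch above a recombination point is discarded, so the detached lineage cannot rejoin it. In our reading of the SMC' modification, the old branch is kept until the lineage re-coalesces, so the lineage may rejoin its own branch and leave the tree unchanged.

```python
import random


def simulate_pairwise(rho, length, prime=False, seed=None):
    """Hypothetical sketch of the SMC (prime=False) or SMC' (prime=True) for n = 2.

    Returns contiguous (start, end, tmrca) segments along a sequence of the given length.
    Assumed units: time in coalescent units (two lineages coalesce at rate 1);
    breakpoints arise at rate rho / 2 per unit branch length per unit sequence.
    """
    rng = random.Random(seed)
    t = rng.expovariate(1.0)  # TMRCA at the left end, drawn from the coalescent
    pos, seg_start = 0.0, 0.0
    segments = []
    while True:
        # Total branch length is 2 * t, so the next breakpoint is Exp(rho * t) away.
        pos += rng.expovariate(rho * t)
        if pos >= length:
            break
        # Recombination point at a uniform height on the tree; for n = 2 the
        # choice of branch does not matter by symmetry.
        u = rng.uniform(0.0, t)
        if not prime:
            # SMC: the branch above u is discarded, so the detached lineage can
            # only join the other lineage, which it does at rate 1.
            new_t = u + rng.expovariate(1.0)
        else:
            # SMC': the old branch stays until re-coalescence, so between u and t
            # two lineages are available (total rate 2).
            w = u + rng.expovariate(2.0)
            if w < t:
                # With probability 1/2 it rejoins its own branch: tree unchanged.
                new_t = t if rng.random() < 0.5 else w
            else:
                # Above t only the single root lineage remains (rate 1).
                new_t = t + rng.expovariate(1.0)
        if new_t != t:  # start a new segment only when the genealogy changes
            segments.append((seg_start, pos, t))
            seg_start, t = pos, new_t
    segments.append((seg_start, length, t))
    return segments


def mean_tmrca(segments):
    """Length-weighted mean TMRCA across the simulated sequence."""
    total = sum(end - start for start, end, _ in segments)
    return sum((end - start) * t for start, end, t in segments) / total


if __name__ == "__main__":
    for prime in (False, True):
        segs = simulate_pairwise(rho=1.0, length=50000.0, prime=prime, seed=1)
        print("SMC'" if prime else "SMC", len(segs), round(mean_tmrca(segs), 3))
```

Under both variants the length-weighted mean TMRCA should come out close to 1, the coalescent expectation for a pair. SMC' should typically produce fewer segments, because some of its recombination events leave the tree unchanged.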

Results: We show that our software is able to simulate large chromosomal regions, such as those needed for genome-wide analyses, several orders of magnitude faster than existing coalescent algorithms.

Conclusion: This algorithm provides a useful resource for those needing to simulate large quantities of data for chromosomal-length regions using an approach that is much more efficient than traditional coalescent models.

Figures

**Figure 1**
**The various categories of recombination**. Illustration of the different types of recombination event. Ancestral material is shown as solid red lines and non-ancestral material as dotted red lines. The location of each recombination is shown below and to the left of the event, and its type is indicated by a blue numeral above the event.

**Figure 3**
**Decay of r²**. This figure shows how r² decays as a function of distance for the SMC and SMC' algorithms and for an exact coalescent model (simulated using ms). Data were simulated for a 2 Mb region with a sample size of n = 20.
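The following is a minimal Python sketch of how an r² decay curve like the one in Figure 3 might be computed from simulated haplotypes. It uses the standard definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B for two biallelic sites. The function names `r_squared` and `r2_decay`, the 0/1 allele coding, and the fixed-width distance bins are illustrative assumptions, not the procedure used to produce the figure.

```python
import math
from itertools import combinations


def r_squared(site_a, site_b):
    """r^2 between two biallelic sites, each given as 0/1 alleles per haplotype."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")  # undefined if a site is monomorphic


def r2_decay(positions, haplotypes, bin_width):
    """Mean r^2 over all site pairs, grouped into distance bins of width bin_width.

    haplotypes[i][j] is the 0/1 allele of haplotype i at site j (hypothetical layout).
    Returns {bin lower edge: mean r^2}.
    """
    sites = list(zip(*haplotypes))  # transpose to one tuple of alleles per site
    sums, counts = {}, {}
    for i, j in combinations(range(len(positions)), 2):
        r2 = r_squared(sites[i], sites[j])
        if math.isnan(r2):
            continue  # skip pairs involving a monomorphic site
        k = int(abs(positions[j] - positions[i]) // bin_width)
        sums[k] = sums.get(k, 0.0) + r2
        counts[k] = counts.get(k, 0) + 1
    return {k * bin_width: sums[k] / counts[k] for k in sorted(sums)}


haplotypes = [[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1]]
print(r2_decay([0, 100, 1500], haplotypes, 1000))  # {0: 0.0, 1000: 0.5}
```

Averaging within bins smooths the large pair-to-pair variance in r²; plotting the bin means against distance gives a decay curve that can be compared across simulators.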
