Comparative Study
Theor Popul Biol. 2018 Jul:122:57-66.
doi: 10.1016/j.tpb.2017.05.002. Epub 2017 Jul 11.

An efficient algorithm for generating the internal branches of a Kingman coalescent

M Reppell et al. Theor Popul Biol. 2018 Jul.

Abstract

Coalescent simulations are a widely used approach for simulating sample genealogies, but can become computationally burdensome in large samples. Methods exist to analytically calculate a sample's expected frequency spectrum without simulating full genealogies. However, statistics that rely on the distribution of the length of internal coalescent branches, such as the probability that two mutations of equal size arose on the same genealogical branch, have previously required full coalescent simulations to estimate. Here, we present a sampling method capable of efficiently generating limited portions of sample genealogies using a series of analytic equations that give probabilities for the number, start, and end of internal branches conditional on the number of final samples they subtend. These equations are independent of the coalescent waiting times and need only be calculated a single time, lending themselves to efficient computation. We compare our method with full coalescent simulations to show the resulting distribution of branch lengths and summary statistics are equivalent, but that for many conditions our method is at least 10 times faster.
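For orientation, the full-simulation baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of a standard Kingman coalescent, not the authors' topology free method or their code. The function name `simulate_branch_lengths` is hypothetical, and time is assumed to be measured in coalescent units, where k lineages wait an exponential time with rate k(k-1)/2 before the next merger. The size of a branch is the number of sampled leaves it subtends.

```python
import random
from collections import defaultdict


def simulate_branch_lengths(n, rng=None):
    """Simulate one Kingman genealogy for n samples (illustrative sketch).

    Returns a dict mapping branch size (number of sampled leaves the
    branch subtends) to the list of lengths of all branches of that size.
    Assumption: time is in coalescent units, so with k lineages the
    waiting time to the next merger is Exponential(rate=k*(k-1)/2).
    """
    rng = rng or random.Random()
    # Each active lineage is stored as (size, time at which it was created).
    lineages = [(1, 0.0) for _ in range(n)]
    lengths = defaultdict(list)
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)
        # Choose the two lineages that coalesce, uniformly at random.
        i, j = rng.sample(range(k), 2)
        merged_size = 0
        # Pop the higher index first so the lower index stays valid.
        for idx in sorted((i, j), reverse=True):
            size, born = lineages.pop(idx)
            lengths[size].append(t - born)  # this branch ends at time t
            merged_size += size
        lineages.append((merged_size, t))
    return dict(lengths)


if __name__ == "__main__":
    tree = simulate_branch_lengths(50, random.Random(1))
    for size in range(2, 11):
        print(size, len(tree.get(size, [])), sum(tree.get(size, [])))
```

Every call builds the whole genealogy even when only a few branch sizes are of interest, which is the cost the abstract says the analytic approach is designed to avoid.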

Keywords: Coalescent; Coalescent simulations; Genealogical topology.

Figures

Figure 1
Density functions for the distribution of genealogical branch lengths with sizes between 2 and 10 for a sample of size 50
Figure 2
The exact probabilities of observing a given number of branches of a given size in the genealogy of a sample with size 50.
Figure 3
Topology free sampling. After calculating and storing intermediate topological probabilities, it is possible to efficiently generate segments of genealogies, and relevant summary statistics, using Monte Carlo integration.
Figure 4
(A) Cumulative distribution of branch lengths for branches of size 3, 5, and 7 in a genealogy from a sample with size 1,000. The curves give the probability of observing a branch shorter in length than the value given along the x-axis, with each color representing a different branch size. The double-dashed lines represent lengths drawn from 50,000 full coalescent simulations, the dashed lines from 50,000 genealogies sampled according to the topology free approach presented in the methods section, and the solid lines from 50,000 genealogies sampled using expected rather than random waiting times. The results of the three approaches are so similar that they are indistinguishable in the plots. (B) The number of branches with a size of 3, 5, or 7 in a genealogy from a sample with size 1,000. The solid black lines give the analytical values from formula (10) and the red bars give the observed values from 50,000 genealogies realized with full coalescent simulations.
Figure 5
The sampling of individual branches allows us to calculate summary statistics with our topology free approach that are not available from the summed total length. (A) For sample sizes between 750 and 100 the variance between branch lengths was very similar between our topology free approach and full coalescent simulations. (B) In smaller sample sizes, larger size branches had higher inter-branch variance in coalescent simulations, due to the negative correlation between their lengths. (C) In larger samples the probability that two mutations of the same size arose on the same branch is nearly identical between coalescent simulations and topology free sampling. (D) In smaller sample sizes, the use of topology free sampling resulted in very similar, yet slightly lower, probability estimates than full coalescent simulations. This reflects the same underlying causes as in panel B: our assumption that the number of branches is independent of their lengths, together with allowing shared coalescent events, leads to lower inter-branch variance.
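The same-branch probability in panels C and D depends on individual branch lengths, not only on their sum. One way to compute it for a single genealogy is sketched below. It assumes that two mutations of a given size fall independently and uniformly along the combined length of all branches of that size; the paper's exact estimator may differ. The function name `same_branch_probability` is hypothetical.

```python
def same_branch_probability(lengths):
    """Probability that two mutations land on the same branch.

    `lengths` holds the lengths of all branches of one size in a single
    genealogy. Assumption (not taken from the paper): each mutation picks
    a branch with probability proportional to its length, independently
    of the other, so the probability is sum(l_i^2) / (sum(l_i))^2.
    """
    total = sum(lengths)
    if total <= 0:
        raise ValueError("need at least one branch with positive length")
    return sum(l * l for l in lengths) / (total * total)


if __name__ == "__main__":
    # Two equally long branches: the second mutation matches half the time.
    print(same_branch_probability([1.0, 1.0]))
    # Unequal branches concentrate mutations on the longer one.
    print(same_branch_probability([1.0, 3.0]))
```

Under this assumption the statistic grows with the spread of branch lengths for a fixed total, which is consistent with the caption linking lower inter-branch variance to slightly lower probability estimates.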
