The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation

Alan R Lemmon¹, Michel C Milinkovitch

Affiliations

Affiliation

¹ Laboratory of Evolutionary Genetics, Free University of Brussels (ULB), Cp 300, Institute of Molecular Biology and Medicine, Rue Jeener and Brachet 12, B-6041 Gosselies, Belgium.

PMID: 12142465
PMCID: PMC124960
DOI: 10.1073/pnas.162224399

The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation

Alan R Lemmon et al. Proc Natl Acad Sci U S A. 2002.

. 2002 Aug 6;99(16):10516-21.

doi: 10.1073/pnas.162224399. Epub 2002 Jul 25.

Authors

Alan R Lemmon¹, Michel C Milinkovitch

Affiliation

¹ Laboratory of Evolutionary Genetics, Free University of Brussels (ULB), Cp 300, Institute of Molecular Biology and Medicine, Rue Jeener and Brachet 12, B-6041 Gosselies, Belgium.

PMID: 12142465
PMCID: PMC124960
DOI: 10.1073/pnas.162224399

Abstract

Large phylogeny estimation is a combinatorial optimization problem that no future computer will ever be able to solve exactly in practical computing time. The difficulty of the problem is amplified by the need to use complex evolutionary models and large taxon samplings. Hence, many heuristic approaches have been developed, with varying degrees of success. Here, we report on a heuristic approach, the metapopulation genetic algorithm, involving several populations of trees that are forced to cooperate in the search for the optimal tree. Within each population, trees are subjected to evaluation, selection, and mutation events, which are directed by using inter-population consensus information. The method proves to be both very accurate and vastly faster than existing heuristics, such that data sets comprised of hundreds of taxa can be analyzed in practical computing times under complex models of maximum-likelihood evolution. Branch support values produced by the metapopulation genetic algorithm might closely approximate the posterior probabilities of the corresponding branches.

PubMed Disclaimer

Figures

**Fig 1.**
The principle of CP. Before a tree is subjected to mutation, its topology is compared with those of the best tree(s) from other populations; the consensus branches (indicated in bold red) define the partitions that can (green arrows) and cannot (red arrows) be affected by topological mutations; i.e., any operation moving a taxon across a consensus branch is prohibited. The method used to define a consensus branch depends on the CP option chosen by the user (see text for details). TXS, taxa swap. See supporting *Materials and Methods* for details.

**Fig 2.**
(a) Relative run times vs. number of taxa for the GA (one population), metaGA (with strict group consensus among four populations) and StepAdd algorithm (9) as well as for a single round of NNI, SPR, and TBR swapping (dotted lines), under the JC (blue) and HKY + rate heterogeneity (red) ML models. Standard errors (among 10 runs) are indicated for the GA and metaGA only. (b) Ratio between StepAdd and average metaGA run times (vs. number of taxa) under the JC (*Lower*) and HKY + rate heterogeneity (*Upper*) ML models.

**Fig 3.**
(a) Unoptimized (red dots) and optimized (green dots) score vs. time for GA (one population) and metaGA (strict CP with four populations of four individuals each) runs (JC model, 80 taxa). The asterisks indicate when the stopping rule has been reached. (b) Score of best tree (320 taxa) for each of four populations run in parallel (no CP). At the time indicated by the red arrow, one population (red) was allowed to use the consensus information from the three other populations (black). This process demonstrates that scores increase faster when CP is enabled. (c) Number of consensus partitions among four independent 80-taxa populations running under the GA (JC model) vs. number of consensus partitions among four 80-taxa populations interacting through CP during a metaGA run.

**Fig 4.**
(a) Run times (JC model, 160 taxa) for a one-population GA search (dotted circle; ± SE indicated) as well as for two-, three-, four-, six-, eight-, and 16-population metaGA searches under each of the CP options (10 runs each, SE too small for being visible). The vertical arrows delimit the range in number of populations for which a metaGA run (depending on the CP option) can take the same time than a one-population GA search. Run time increases polynomially with the number of populations; equation of the regression curve is indicated for the strict and ring options only. (b) Run time (JC model, 160 taxa) vs. percent error for two-, three-, four-, six-, eight-, and 16-population metaGA searches under each of the five CP options. Coordinates of the one-population GA run are indicated. Probability and strict consensus options find excellent topologies as soon as the metaGA is run with more than three populations. The probability option is therefore optimal as it is significantly faster than the strict consensus option.

**Fig 5.**
MetaGA branch support values (10 runs with probability CP among 10 populations; each dot therefore indicates the number of trees, among the 100 best trees produced, exhibiting a given partition) vs. (a) bootstrap values (100 replicates; random starting tree, SPR branch swapping) and (b) mrbayes posterior probabilities of partitions. A 55-taxa *rbc*L data set was analyzed under the JC model.

See this image and copyright information in PMC

References

1. Cook S., (1971) Proceedings of the 3rd Annual Symposium on the Theory of Computing (Association for Computing Machinery, New York), pp. 151–158.
1. Graham R. L. & Foulds, L. R. (1982) Math. Biosci. 60, 133-142.
1. Hillis D. M. (1996) Nature (London) 383, 130-131. - PubMed
1. Huelsenbeck J. P. (1995) Syst. Biol. 44, 17-48.
1. Felsenstein J. (1981) J. Mol. Evol. 17, 368-376. - PubMed

Publication types

Actions

MeSH terms

Actions
Actions
Actions
Actions

LinkOut - more resources

Full Text Sources
Other Literature Sources
- The Lens - Patent Citations Database

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation

Affiliation

The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation

Authors

Affiliation

Abstract

Figures

References

Publication types

MeSH terms

LinkOut - more resources

Full Text Sources

Other Literature Sources