Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2002 Aug 6;99(16):10516-21.
doi: 10.1073/pnas.162224399. Epub 2002 Jul 25.

The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation

Affiliations

The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation

Alan R Lemmon et al. Proc Natl Acad Sci U S A. .

Abstract

Large phylogeny estimation is a combinatorial optimization problem that no future computer will ever be able to solve exactly in practical computing time. The difficulty of the problem is amplified by the need to use complex evolutionary models and large taxon samplings. Hence, many heuristic approaches have been developed, with varying degrees of success. Here, we report on a heuristic approach, the metapopulation genetic algorithm, involving several populations of trees that are forced to cooperate in the search for the optimal tree. Within each population, trees are subjected to evaluation, selection, and mutation events, which are directed by using inter-population consensus information. The method proves to be both very accurate and vastly faster than existing heuristics, such that data sets comprised of hundreds of taxa can be analyzed in practical computing times under complex models of maximum-likelihood evolution. Branch support values produced by the metapopulation genetic algorithm might closely approximate the posterior probabilities of the corresponding branches.

PubMed Disclaimer

Figures

Fig 1.
Fig 1.
The principle of CP. Before a tree is subjected to mutation, its topology is compared with those of the best tree(s) from other populations; the consensus branches (indicated in bold red) define the partitions that can (green arrows) and cannot (red arrows) be affected by topological mutations; i.e., any operation moving a taxon across a consensus branch is prohibited. The method used to define a consensus branch depends on the CP option chosen by the user (see text for details). TXS, taxa swap. See supporting Materials and Methods for details.
Fig 2.
Fig 2.
(a) Relative run times vs. number of taxa for the GA (one population), metaGA (with strict group consensus among four populations) and StepAdd algorithm (9) as well as for a single round of NNI, SPR, and TBR swapping (dotted lines), under the JC (blue) and HKY + rate heterogeneity (red) ML models. Standard errors (among 10 runs) are indicated for the GA and metaGA only. (b) Ratio between StepAdd and average metaGA run times (vs. number of taxa) under the JC (Lower) and HKY + rate heterogeneity (Upper) ML models.
Fig 3.
Fig 3.
(a) Unoptimized (red dots) and optimized (green dots) score vs. time for GA (one population) and metaGA (strict CP with four populations of four individuals each) runs (JC model, 80 taxa). The asterisks indicate when the stopping rule has been reached. (b) Score of best tree (320 taxa) for each of four populations run in parallel (no CP). At the time indicated by the red arrow, one population (red) was allowed to use the consensus information from the three other populations (black). This process demonstrates that scores increase faster when CP is enabled. (c) Number of consensus partitions among four independent 80-taxa populations running under the GA (JC model) vs. number of consensus partitions among four 80-taxa populations interacting through CP during a metaGA run.
Fig 4.
Fig 4.
(a) Run times (JC model, 160 taxa) for a one-population GA search (dotted circle; ± SE indicated) as well as for two-, three-, four-, six-, eight-, and 16-population metaGA searches under each of the CP options (10 runs each, SE too small for being visible). The vertical arrows delimit the range in number of populations for which a metaGA run (depending on the CP option) can take the same time than a one-population GA search. Run time increases polynomially with the number of populations; equation of the regression curve is indicated for the strict and ring options only. (b) Run time (JC model, 160 taxa) vs. percent error for two-, three-, four-, six-, eight-, and 16-population metaGA searches under each of the five CP options. Coordinates of the one-population GA run are indicated. Probability and strict consensus options find excellent topologies as soon as the metaGA is run with more than three populations. The probability option is therefore optimal as it is significantly faster than the strict consensus option.
Fig 5.
Fig 5.
MetaGA branch support values (10 runs with probability CP among 10 populations; each dot therefore indicates the number of trees, among the 100 best trees produced, exhibiting a given partition) vs. (a) bootstrap values (100 replicates; random starting tree, SPR branch swapping) and (b) mrbayes posterior probabilities of partitions. A 55-taxa rbcL data set was analyzed under the JC model.

References

    1. Cook S., (1971) Proceedings of the 3rd Annual Symposium on the Theory of Computing (Association for Computing Machinery, New York), pp. 151–158.
    1. Graham R. L. & Foulds, L. R. (1982) Math. Biosci. 60, 133-142.
    1. Hillis D. M. (1996) Nature (London) 383, 130-131. - PubMed
    1. Huelsenbeck J. P. (1995) Syst. Biol. 44, 17-48.
    1. Felsenstein J. (1981) J. Mol. Evol. 17, 368-376. - PubMed

Publication types

LinkOut - more resources