Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 May 19;72(1):242-248.
doi: 10.1093/sysbio/syad002.

Lagrange-NG: The next generation of Lagrange

Affiliations

Lagrange-NG: The next generation of Lagrange

Ben Bettisworth et al. Syst Biol. .

Abstract

Computing ancestral ranges via the Dispersion Extinction and Cladogensis (DEC) model of biogeography is characterized by an exponential number of states relative to the number of regions considered. This is because the DEC model requires computing a large matrix exponential, which typically accounts for up to 80% of overall runtime. Therefore, the kinds of biogeographical analyses that can be conducted under the DEC model are limited by the number of regions under consideration. In this work, we present a completely redesigned efficient version of the popular tool Lagrange which is up to 49 times faster with multithreading enabled, and is also 26 times faster when using only one thread. We call this new version Lagrange-NG (Lagrange-Next Generation). The increased computational efficiency allows Lagrange-NG to analyze datasets with a large number of regions in a reasonable amount of time, up to 12 regions in approximately 18 min. We achieve these speedups using a relatively new method of computing the matrix exponential based on Krylov subspaces. In order to validate the correctness of Lagrange-NG, we also introduce a novel metric on range distributions for trees so that researchers can assess the difference between any two range inferences. Finally, Lagrange-NG exhibits substantially higher adherence to coding quality standards. It improves a respective software quality indicator as implemented in the SoftWipe tool from average (5.5; Lagrange) to high (7.8; Lagrange-NG). Lagrange-NG is freely available under GPL2. [Biogeography; Phylogenetics; DEC Model.].

PubMed Disclaimer

Figures

Figure 1.
Figure 1.
Comparison of runtimes between Lagrange (left) and Lagrange-NG (right) with sequential Lagrange-NG (top) and parallel Lagrange-NG using 8 cores (bot). Results were obtained by generating 100 random datasets. Note that the original Lagrange was not run with any multi-threading, as it does not support it. Instead, the data has been replicated for comparison’s sake. Times are in seconds. The figure was generated using seaborn (Waskom, 2021).
Figure 2.
Figure 2.
Runtimes for Lagrange-NG on a larger number of regions when using 8 cores. Results were obtained by generated random datasets with 100 taxa and 8, 9, 10, 11, or 12 regions. We generated 100 random datasets for the 8 region cases; for the 9 and 10 region cases we generated 30 datasets; and for the 11 and 12 region cases we generated 10 datasets.
Figure 3.
Figure 3.
Parallel efficiency plot for a datasets with 500 taxa and 6 regions. Please notice the log-log scaling. The actual values plotted are 1.7, 1.8, 2.0, 2.1 for 4, 8, 16, and 32 cores, respectively. The ratio of the realized speedup to the optimal speedup is 0.43, 0.24, 0.13, 0.06 for 4, 8, 16, and 32 cores, respectively.
Figure 4.
Figure 4.
Plot showing the pairwise Wasserstein metric between the “basis” distributions on three regions: A, B, and C. Here, the “basis” distributions are a set of distributions, one for each state, with 1.0 in the corresponding entry for that state. Distributions are labeled by which entry contains the 1.0. States are ordered by a Gray code for aesthetic reasons only.

References

    1. Evans S.N., Matsen F.A.. 2012. The phylogenetic Kantorovich–Rubinstein metric for environmental sequence samples. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74:569–592. - PMC - PubMed
    1. Intel Math Kernel Library. 2022. Developer Reference for Intel® oneAPI Math Kernel Library - C.
    1. Izquierdo-Carrasco F., Alachiotis N., Berger S., Flouri T., Pissis S. P., Stamatakis A.. 2013. A Generic Vectorization Scheme and a GPU Kernel for the Phylogenetic Likelihood Library. Pages 530–538in 2013 IEEE International Symposium on Parallel Distributed Processing, Workshops and Phd Forum.
    1. Johnson S. G. 2021. The nlopt nonlinear-optimization package. https://nlopt.readthedocs.io/en/latest/Citing_NLopt/.
    1. Landis M.J., Matzke N.J., Moore B.R., Huelsenbeck J.P.. 2013. Bayesian Analysis of Biogeography when the Number of Areas is Large. Syst. Biol. 62:789–804. - PMC - PubMed

Publication types