Optimal phylogenetic reconstruction of insertion and deletion events
- PMID: 38940131
- PMCID: PMC11211827
- DOI: 10.1093/bioinformatics/btae254
Abstract
Motivation: Insertions and deletions (indels) influence the genetic code in fundamentally distinct ways from substitutions, significantly impacting gene product structure and function. Despite their influence, the evolutionary history of indels is often neglected in phylogenetic tree inference and ancestral sequence reconstruction, hindering efforts to understand the determinants of biological diversity and to engineer variants for medical and industrial applications.
Results: We frame determining the optimal history of indel events as a single Mixed-Integer Programming (MIP) problem, across all branch points in a phylogenetic tree adhering to topological constraints, and all sites implied by a given set of aligned, extant sequences. By disentangling the impact on ancestral sequences at each branch point, this approach identifies the minimal set of indel events that jointly explain the diversity in sequences mapped to the tips of that tree. MIP can recover alternate optimal indel histories, if available. We evaluated MIP for indel inference on a dataset comprising 15 real phylogenetic trees associated with protein families ranging from 165 to 2000 extant sequences, and on 60 synthetic trees at comparable scales of data and reflecting realistic rates of mutation. Across relevant metrics, MIP outperformed alternative parsimony-based approaches and reported the fewest indel events, on par with or below their occurrence in synthetic datasets. MIP offers a rational justification for indel patterns in extant sequences; importantly, it uniquely identifies global optima on complex protein datasets without making unrealistic assumptions of independence or evolutionary underpinnings, promising a deeper understanding of molecular evolution and aiding novel protein design.
Availability and implementation: The implementation is available via GitHub at https://github.com/santule/indelmip.
© The Author(s) 2024. Published by Oxford University Press.
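The joint optimization described in the Results can be illustrated on a toy example. The sketch below is not the authors' MIP formulation and does not use their code: it replaces the MIP solver with exhaustive search over the same kind of binary decision (is a column present or absent at each ancestor?), and it assumes a simplified cost in which each maximal run of adjacent inserted or deleted columns along a branch counts as one indel event. The tree, the node names (N0, N1, A, B, C), and the presence patterns are hypothetical.

```python
from itertools import product

# Hypothetical toy input: presence (1) or absence (0) of each alignment
# column in three extant sequences (a gap in the alignment becomes 0).
leaves = {
    "A": (1, 1, 1, 1),
    "B": (1, 0, 0, 1),
    "C": (1, 0, 0, 1),
}
# Hypothetical rooted tree given as (parent, child) branches; N0 is the root.
branches = [("N0", "N1"), ("N0", "A"), ("N1", "B"), ("N1", "C")]
ancestors = ["N0", "N1"]


def indel_events(parent, child):
    """Assumed cost: one event per maximal run of adjacent columns that are
    all inserted, or all deleted, between parent and child."""
    events, prev = 0, 0
    for p, c in zip(parent, child):
        change = c - p  # +1 insertion, -1 deletion, 0 unchanged
        if change != 0 and change != prev:
            events += 1
        prev = change
    return events


def total_cost(assignment):
    """Sum the assumed event cost over every branch of the tree."""
    states = {**leaves, **assignment}
    return sum(indel_events(states[u], states[v]) for u, v in branches)


def optimal_histories():
    """Enumerate every joint ancestor assignment and keep all minima.

    Exhaustive search stands in for the MIP solver here; it is only
    feasible for toy inputs because it is exponential in columns and nodes.
    """
    n_cols = len(next(iter(leaves.values())))
    patterns = list(product((0, 1), repeat=n_cols))
    best, best_cost = [], None
    for combo in product(patterns, repeat=len(ancestors)):
        assignment = dict(zip(ancestors, combo))
        cost = total_cost(assignment)
        if best_cost is None or cost < best_cost:
            best, best_cost = [assignment], cost
        elif cost == best_cost:
            best.append(assignment)
    return best_cost, best


if __name__ == "__main__":
    cost, histories = optimal_histories()
    print(cost, histories)
```

On this toy input the minimum is a single event, and two joint assignments achieve it: either the root carries all four columns and one deletion occurs on the branch to N1, or the root already lacks the middle columns and one insertion occurs on the branch to A. This is the kind of tie that the abstract refers to as alternate optimal indel histories.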
Conflict of interest statement
No competing interest is declared.