Newly Developed Structure-Based Methods Do Not Outperform Standard Sequence-Based Methods for Large-Scale Phylogenomics
- PMID: 40580945
- PMCID: PMC12290511
- DOI: 10.1093/molbev/msaf149
Abstract
Recent developments in protein structure prediction have allowed the use of this previously limited source of information at genome-wide scales. It has been proposed that structural information may offer advantages over sequences in phylogenetic reconstruction, because structures evolve more slowly and correlate more directly with function. Here, we examined how recently developed methods for structure-based homology search and tree reconstruction compare with current state-of-the-art sequence-based methods in reconstructing genome-wide collections of gene phylogenies (i.e. phylomes). While structure-based methods can be useful in specific scenarios, we found that their current performance does not justify using them as a default choice in large-scale phylogenetic studies. On the one hand, the best-performing sequence-based tree reconstruction methods still outperform structure-based methods for this task. On the other hand, structure-based homology detection methods provide larger lists of candidate homologs, as previously reported. However, this comes at the expense of missing hits identified by sequence-based methods, and of yielding sets of homolog candidates with higher fractions of false positives. These insights help to guide the use of structural data in comparative genomics and highlight the need to continue improving structure-based approaches. Our pipeline is fully reproducible and has been implemented in a Snakemake workflow. This will facilitate continuous assessment of future improvements of structure-based tools in the AlphaFold era.
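The trade-off described above (larger candidate lists, but missed sequence-based hits and more false positives) can be illustrated with a simple set comparison. The following is a minimal sketch, not the authors' pipeline: the function names `false_positive_fraction` and `compare_hit_sets`, the toy identifiers, and the existence of a trusted reference set of true homologs are all assumptions made for illustration.

```python
def false_positive_fraction(hits, true_homologs):
    """Fraction of candidate hits that are absent from the reference set."""
    hits = set(hits)
    if not hits:
        return 0.0  # no candidates, so no false positives by convention
    return len(hits - set(true_homologs)) / len(hits)


def compare_hit_sets(sequence_hits, structure_hits, true_homologs):
    """Summarise two candidate-homolog sets against a reference set.

    All three arguments are iterables of protein identifiers (hypothetical
    inputs; a real analysis would parse them from search-tool output).
    """
    seq, struct = set(sequence_hits), set(structure_hits)
    return {
        "n_sequence": len(seq),
        "n_structure": len(struct),
        # hits found by the sequence-based search but not the structure-based one
        "missed_by_structure": sorted(seq - struct),
        "fp_fraction_sequence": false_positive_fraction(seq, true_homologs),
        "fp_fraction_structure": false_positive_fraction(struct, true_homologs),
    }


# Toy example with made-up identifiers.
summary = compare_hit_sets(
    sequence_hits=["A", "B", "C"],
    structure_hits=["A", "B", "D", "E", "F"],
    true_homologs=["A", "B", "C", "D"],
)
print(summary)
```

In this toy case the structure-based list is longer (five candidates against three), yet it misses hit `C` and contains a higher fraction of candidates outside the reference set, which mirrors the pattern the abstract reports.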
Keywords: homology; orthology; phylogenetics; phylome; structural phylogenetics.
© The Author(s) 2025. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
Grants and funding
- LCF/BQ/DI22/11940014/"Caixa" Foundation
- FJC2021-046869-I/"Caixa" Foundation
- MCIN/AEI/10.13039/501100011033/"Caixa" Foundation
- "European Union" NextGenerationEU/PRTR
- BP 2022, file number BP 00075/Beatriu de Pinós programme
- ID 100010434/"La Caixa" Foundation
- PID2021-126067NB-I00/Spanish Ministry of Science and Innovation
- CPP2021-008552/Spanish Ministry of Science and Innovation
- PCI2022-135066-2/Spanish Ministry of Science and Innovation
- PDC2022-133266-I00/Spanish Ministry of Science and Innovation
- SGR01551/Catalan Research Agency (AGAUR)
- ERC-2016-724173/European Union's Horizon 2020 research and innovation programme
- GBMF9742/Gordon and Betty Moore Foundation
- LCF/PR/HR21/00737/"La Caixa" Foundation
- IMP/00019/Instituto de Salud Carlos III
- CIBERINFEC CB21/13/00061-ISCIII-SGEFI/ERDF/Instituto de Salud Carlos III
