Phylogenomic Subsampling and the Search for Phylogenetically Reliable Loci
- PMID: 33983409
- PMCID: PMC8382905
- DOI: 10.1093/molbev/msab151
Abstract
Phylogenomic subsampling is a procedure by which small sets of loci are selected from large genome-scale data sets and used for phylogenetic inference. This step is often motivated either by computational limitations associated with the use of complex inference methods or by the goal of testing the robustness of phylogenetic results by discarding loci that are deemed potentially misleading. Although many alternative methods of phylogenomic subsampling have been proposed, little effort has gone into comparing their behavior across different data sets. Here, I calculate multiple gene properties for a range of phylogenomic data sets spanning animal, fungal, and plant clades, uncovering a remarkable predictability in their patterns of covariance. I also show how these patterns provide a means for ordering loci by both their rate of evolution and their relative phylogenetic usefulness. This method of retrieving phylogenetically useful loci is found to be among the top-performing approaches when compared with alternative subsampling protocols. Relatively common approaches, such as minimizing potential sources of systematic bias or increasing the clock-likeness of the data, are found to fare worse than selecting loci at random. Likewise, the general utility of rate-based subsampling is found to be limited: loci evolving at both low and high rates are among the least effective, and even those evolving at optimal rates can still differ widely in usefulness. This study shows that many common subsampling approaches introduce unintended effects in off-target gene properties and proposes an alternative multivariate method that simultaneously optimizes phylogenetic signal while controlling for known sources of bias.
Keywords: molecular evolution; phylogenetic inference; phylogenetic signal; phylogenomics; systematic biases.
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
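The idea of ordering loci along axes of covariance among gene properties can be illustrated with a minimal sketch. It assumes that the covariance structure is summarized with a principal component analysis, which the abstract itself does not state, and it uses a hypothetical property matrix with made-up column meanings. The function names (order_loci_by_component, top_k, random_k) and the orient_by parameter are illustrative only and are not taken from the article or its software.

```python
import numpy as np


def order_loci_by_component(properties, component=0, orient_by=None):
    """Rank loci along one principal component of their gene properties.

    properties: (n_loci, n_properties) array. Columns are hypothetical
        per-locus properties; none may be constant across loci.
    component: index of the principal component used for ordering.
    orient_by: optional column index. PCA axes have arbitrary sign, so the
        axis is flipped if needed to give this column a positive loading.
    Returns (order, scores): locus indices sorted by descending score,
    and the score of every locus on the chosen component.
    """
    X = np.asarray(properties, dtype=float)
    # Standardize columns so properties on different scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # PCA via SVD: rows of Vt are the component axes (loadings).
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    axis = Vt[component].copy()
    if orient_by is not None and axis[orient_by] < 0:
        axis = -axis
    scores = Z @ axis
    order = np.argsort(-scores)
    return order, scores


def top_k(order, k):
    """Keep the k highest-ranked loci."""
    return order[:k]


def random_k(n_loci, k, seed=0):
    """Random subsample of k loci, the baseline mentioned in the abstract."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_loci, size=k, replace=False)


# Synthetic demonstration data (not from the study): two properties share
# a latent factor with opposite signs, a third is unrelated noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=200)
demo = np.column_stack([
    latent + 0.1 * rng.normal(size=200),   # hypothetical signal proxy
    -latent + 0.1 * rng.normal(size=200),  # hypothetical bias proxy
    rng.normal(size=200),                  # unrelated property
])
demo_order, demo_scores = order_loci_by_component(demo, orient_by=0)
print(top_k(demo_order, 10))
print(random_k(len(demo), 10))
```

Because the sign of a principal component is arbitrary, the sketch needs an explicit orientation step before "higher score" can mean anything; which axis, if any, tracks rate or usefulness in a real data set has to be checked against the properties themselves.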