PeerJ. 2019 Feb 14;7:e6399.
doi: 10.7717/peerj.6399. eCollection 2019.

Embracing heterogeneity: coalescing the Tree of Life and the future of phylogenomics

Gustavo A Bravo et al. PeerJ.

Abstract

Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. We focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation or supermatrix approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which building phylogenies increasingly relies, and we suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, and a culture that values contributors at each step are essential for progress.

Keywords: Gene flow; Genome; Multispecies coalescent model; Retroelement; Speciation; Transcriptome.

Conflict of interest statement

Alexander Schliep and Scott V. Edwards are Academic Editors for PeerJ.

Figures

Figure 1. A posteriori marker selection from whole-genome alignments for phylogenomics and phylogeography.
Whole-genome analysis (A) permits researchers to choose different markers for specific purposes (B–D). By contrast, subsampling methods such as RAD-seq or hybrid capture, which dominate phylogenomics today, usually yield a specific set of markers that the researcher has chosen a priori. The generation of whole-genome alignments (WGA) thus greatly increases the use of genomic data in biological research, beyond the initial goals of the researcher producing those data. Here, we show how a hypothetical WGA that includes seven different loci (different colors) for four individuals allows researchers to extract sequence data to generate gene trees (B), identify SNPs to genotype individuals (C), and measure copy depth to infer copy number variants (CNVs) across genomic regions (D). Ultimately, these different kinds of data can be translated into species tree inferences (B–D). In the case of CNVs, only locus number 3 (orange) shows significant copy number variation. Because CNVs are measured as continuous characters (i.e., copy depth), the orange shading represents a hypothetical evolutionary scenario of copy number variation of genomic region number 3 within the inferred species tree, which is incongruent with the trees based on sequence and SNP data from other loci in the genome.
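As a rough illustration of panel C, the sketch below scans a toy alignment of four individuals and reports the columns where they differ. The function name `find_snp_columns`, the individual labels, and the sequences are all hypothetical and are not taken from the paper; real SNP calling on whole-genome alignments would also need to handle quality, ploidy, and missing data.

```python
def find_snp_columns(alignment):
    """Return 0-based alignment columns where individuals carry different bases.

    `alignment` maps an individual's label to its aligned sequence.
    Gap ("-") and unknown ("N") characters are ignored when comparing.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")

    snp_columns = []
    for col in range(lengths.pop()):
        bases = {seq[col].upper() for seq in alignment.values()} - {"-", "N"}
        if len(bases) > 1:  # more than one observed base means a variable site
            snp_columns.append(col)
    return snp_columns


# Hypothetical toy alignment: four individuals, eight aligned sites.
toy_alignment = {
    "ind1": "ACGTACGT",
    "ind2": "ACGTACGA",
    "ind3": "ACCTACGA",
    "ind4": "ACCT-CGT",
}
snp_columns = find_snp_columns(toy_alignment)
```

The same alignment could equally be cut into per-locus blocks for gene tree estimation (panel B), which is the sense in which marker choice becomes a posteriori.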
Figure 2. Trends in phylogenomic data sets since the emergence of HTS.
Based on a sample of 164 phylogenomic papers published since 2004 (see Table S1), we observed no increase in the number of species per data set over time (A). On the other hand, there is a significant increase in the number of loci (B), total alignment length (C), and total data set size, as measured by the product of species times locus number (Data set size 1, E) and species times total alignment length (Data set size 2, F). Moreover, the advent of HTS does not support the notion of a tradeoff between the number of species and the number of loci in phylogenomic studies (D).
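The two data set size measures named in this caption are simple products, sketched below. The `Study` class, its field names, and the example numbers are hypothetical illustrations and do not come from Table S1.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical record for one phylogenomic data set."""
    species: int
    loci: int
    alignment_length: int  # total aligned sites across all loci

    def size_by_loci(self) -> int:
        # "Data set size 1": number of species times number of loci.
        return self.species * self.loci

    def size_by_length(self) -> int:
        # "Data set size 2": number of species times total alignment length.
        return self.species * self.alignment_length


# Made-up example values, for illustration only.
example = Study(species=10, loci=200, alignment_length=150_000)
size1 = example.size_by_loci()
size2 = example.size_by_length()
```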
Figure 3. Some examples of violations of the multispecies coalescent.
In event A, there is gene flow; in event B there is homoploid hybridization; in event C, there is a gene duplication; and in event D, incomplete lineage sorting. All of these processes contribute to gene tree heterogeneity but fall outside the standard multispecies coalescent model. Importantly, all of these processes also yield strictly dichotomous gene trees, whereas recombination (not illustrated here) does not.
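To give a sense of how much gene tree heterogeneity incomplete lineage sorting alone can produce, the sketch below uses a standard coalescent result that is not stated in this caption: for a rooted three-species tree whose internal branch has length T in coalescent units, the gene tree matches the species tree with probability 1 - (2/3)exp(-T), and each of the two discordant topologies has probability (1/3)exp(-T). The function name is hypothetical.

```python
import math


def three_taxon_gene_tree_probs(internal_branch_length):
    """Gene tree topology probabilities for a rooted three-species tree.

    `internal_branch_length` is the internal branch length in coalescent
    units. Only incomplete lineage sorting is modeled here: no gene flow,
    hybridization, duplication, or recombination.
    """
    if internal_branch_length < 0:
        raise ValueError("branch length must be non-negative")
    discordant_each = math.exp(-internal_branch_length) / 3.0
    return {
        "concordant": 1.0 - 2.0 * discordant_each,
        "discordant_each": discordant_each,
    }


short_branch = three_taxon_gene_tree_probs(0.1)
long_branch = three_taxon_gene_tree_probs(5.0)
```

With a very short internal branch the three topologies approach equal frequency, whereas a long branch makes discordance rare; the other processes in the figure can shift these expectations.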
Figure 4. Gene duplication and loss (GDL) creates patterns that can mimic incomplete lineage sorting and other processes, leading to spurious inferences of the species history.
Genes and genomes of three species A, B, and C. Multi-colored bars show (parts of) their genomes with a number of loci indicated in different colors. The orange gene is duplicated in species A and it was lost in species B. The blue gene was duplicated before the divergence between species A and the ancestor of species B and C. However, one of these copies was lost in species A, whereas both copies were maintained in species B and C. Reconstruction of the orange gene tree based on extant diversity will yield a wrong inference of its history due to the absence of data for species B. On the other hand, a phylogenetic reconstruction of the blue gene is difficult to predict. Depending on which of the duplicates are sampled for species B and C, different outcomes can be expected regarding the relationship among the three species. The duplication and loss history of these two genes may cause serious issues for phylogenetic reconstruction because no specific pattern can be expected between them.
Figure 5. Complex patterns of gene lineages with polyploidization and interspecific gene flow.
Genes and genomes of four species A, B, C and D. Multi-colored bars show (parts of) genomes with a number of loci indicated in different colors. Two gene trees, one orange and one blue, evolve within the species network. Species B is an allopolyploid containing two genomes.
Figure 6. Gradual speciation, or isolation-with-migration.
After starting to split, gene flow between species decreases gradually. Such a gradual decrease in the extent of gene flow between species might present an especially useful extension of the standard multispecies coalescent model. Colors depict different gene pools and their gradual change along branches describes how species gradually differentiate despite the existence of migration over time. Thickness and color intensity of arrows show that gene flow becomes weaker as species gradually isolate.
Figure 7. Two possible species phylogenies producing similar observations at present time.
(A) Species tree with gene flow. (B) Species network with homoploid hybridization. Distinguishing two such scenarios usually requires simulations and comparison of observed and expected summary statistics.
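One very simple way to compare observed and simulated summary statistics is a rejection-style check: count how often simulations under each scenario land close to the observed value. The sketch below assumes a single scalar statistic; the function name, scenario labels, tolerance, and all numbers are hypothetical, and the simulated values stand in for the output of a coalescent simulator that is not shown here.

```python
def scenario_support(observed, simulated, tolerance):
    """Fraction of simulated statistics within `tolerance` of the observed value.

    `simulated` maps a scenario name to a list of summary statistic values
    obtained from simulations under that scenario.
    """
    support = {}
    for name, values in simulated.items():
        if not values:
            raise ValueError(f"no simulations supplied for scenario {name!r}")
        close = sum(1 for v in values if abs(v - observed) <= tolerance)
        support[name] = close / len(values)
    return support


# Made-up simulated values of one summary statistic under each scenario.
simulated_stats = {
    "tree_with_gene_flow": [0.02, 0.05, 0.04, 0.07, 0.03],
    "hybrid_network": [0.20, 0.25, 0.18, 0.22, 0.30],
}
support = scenario_support(observed=0.21, simulated=simulated_stats, tolerance=0.05)
```

In practice many more simulations and several statistics would be used, and the scenario with the higher support would be favored only tentatively.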
