Mol Biol Evol. 2018 Jan 1;35(1):80-93.
doi: 10.1093/molbev/msx268.

Signal, Uncertainty, and Conflict in Phylogenomic Data for a Diverse Lineage of Microbial Eukaryotes (Diatoms, Bacillariophyta)

Matthew B Parks et al. Mol Biol Evol. 2018.

Abstract

Diatoms (Bacillariophyta) are a species-rich group of eukaryotic microbes diverse in morphology, ecology, and metabolism. Previous reconstructions of the diatom phylogeny based on one or a few genes have resulted in inconsistent resolution or low support for critical nodes. We applied phylogenetic paralog pruning techniques to a data set of 94 diatom genomes and transcriptomes to infer perennially difficult species relationships, using concatenation and summary-coalescent methods to reconstruct species trees from data sets spanning a wide range of thresholds for taxon and column occupancy in gene alignments. Conflicts between gene and species trees decreased with both increasing taxon occupancy and bootstrap cutoffs applied to gene trees. Concordance between gene and species trees was lowest for short internodes and increased logarithmically with increasing edge length, suggesting that incomplete lineage sorting disproportionately affects species tree inference at short internodes, which are a common feature of the diatom phylogeny. Although species tree topologies were largely consistent across many data treatments, concatenation methods appeared to outperform summary-coalescent methods for sparse alignments. Our results underscore that approaches to species-tree inference based on few loci are likely to be misled by unrepresentative sampling of gene histories, particularly in lineages that may have diversified rapidly. In addition, phylogenomic studies of diatoms, and potentially other hyperdiverse groups, should maximize the number of gene trees with high taxon occupancy, though there is clearly a limit to how many of these genes will be available.
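The two filtering axes mentioned above (column occupancy within an alignment and taxon occupancy across genes) can be illustrated with a minimal Python sketch. It assumes an alignment stored as a dict mapping taxon name to aligned sequence, treats only "-" and "?" as missing data, keeps a column when the fraction of non-missing characters is at least the cutoff (the figure captions use cutoffs of 0.2, 0.5, and 0.8), and defines taxon occupancy as the fraction of the full taxon set present in a gene. The function names and these exact rules are illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict

# Characters treated as missing data (an assumption; real pipelines may differ).
MISSING = {"-", "?"}


def filter_columns(aln: Dict[str, str], min_col_occupancy: float) -> Dict[str, str]:
    """Keep only columns in which at least `min_col_occupancy` of the
    sequences have a non-missing character (hypothetical helper)."""
    seqs = list(aln.values())
    n_taxa = len(seqs)
    keep = [
        i for i in range(len(seqs[0]))
        if sum(s[i] not in MISSING for s in seqs) / n_taxa >= min_col_occupancy
    ]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in aln.items()}


def taxon_occupancy(aln: Dict[str, str], total_taxa: int) -> float:
    """Fraction of the full taxon set with a non-empty sequence in this gene."""
    present = sum(any(c not in MISSING for c in seq) for seq in aln.values())
    return present / total_taxa


# Toy 4-taxon gene alignment from a hypothetical 10-taxon study.
gene = {
    "taxonA": "ATG-C",
    "taxonB": "ATG--",
    "taxonC": "A-G--",
    "taxonD": "ATGC-",
}
print(filter_columns(gene, min_col_occupancy=0.8))
# {'taxonA': 'AG', 'taxonB': 'AG', 'taxonC': 'AG', 'taxonD': 'AG'}
print(taxon_occupancy(gene, total_taxa=10))  # 0.4
```

In this toy example a cutoff of 0.2 keeps all five columns, while 0.8 keeps only the two fully occupied ones, which shows the trade-off between retaining sites and limiting missing data per column.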

Keywords: Bacillariophyta; diatoms; incomplete lineage sorting; phylogenomics; phylotranscriptomics.

Figures

Fig. 1.
Clustering of species trees based on Robinson–Foulds symmetric distance for all data treatments (see table 1) using ASTRAL and ASTRAL-MLBS (circles) or IQ-TREE analysis of a concatenated matrix (squares). Most species trees fall within cluster 1, including the tree shown in figure 2. Cluster 1 also includes concatenation-based species trees with 10–20% taxon occupancy at alignment column occupancy cutoffs of 0.8 (a), 0.5 (b), and 0.2 (c). Outlying clusters 2 and 3 represent ASTRAL and ASTRAL-MLBS species trees for 10–20% taxon occupancy partitions and alignment column occupancy cutoffs of 0.2 and 0.5 (cluster 2; four trees total, two are topologically identical) and 0.8 (cluster 3; two topologically identical trees).
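Robinson–Foulds symmetric distance counts the bipartitions (splits) found in one unrooted tree but not in the other. The following is a minimal sketch, assuming both trees share the same taxon set and have already been reduced to their sets of nontrivial splits; Newick parsing and the clustering step itself are omitted (a phylogenetics library would normally handle parsing), and `canonical` and `rf_distance` are hypothetical helper names.

```python
from typing import FrozenSet, Iterable, Set

Split = FrozenSet[str]


def canonical(side: Iterable[str], taxa: FrozenSet[str]) -> Split:
    """Represent an unrooted bipartition by the side that excludes a fixed
    reference taxon (the alphabetically first), so A|B and B|A compare equal."""
    side = frozenset(side)
    ref = min(taxa)
    return taxa - side if ref in side else side


def rf_distance(splits1: Set[Split], splits2: Set[Split]) -> int:
    """Robinson–Foulds symmetric distance: splits present in exactly one tree."""
    return len(splits1 ^ splits2)


taxa = frozenset("ABCDE")
# ((A,B),C,(D,E)) has nontrivial splits AB|CDE and DE|ABC.
t1 = {canonical("AB", taxa), canonical("DE", taxa)}
# ((A,C),B,(D,E)) has nontrivial splits AC|BDE and DE|ABC.
t2 = {canonical("AC", taxa), canonical("DE", taxa)}
print(rf_distance(t1, t2))  # 2
```

Computing this distance for every pair of species trees yields the distance matrix that a clustering method can then group; the caption does not specify which clustering algorithm was applied.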
Fig. 2.
Concatenation-based cladogram with gene-tree concordance pie charts (left) and phylogram (right) using the 80–100% taxon occupancy and 0.8 alignment column occupancy data subset. Pie chart color coding: blue—fraction of gene trees supporting the shown split; green—fraction of gene trees supporting the second most common split; red—fraction of gene trees supporting all other alternative partitions; gray—fraction of gene trees with <33% bootstrap support at that node. Support values are only shown for nodes with less than full support from ASTRAL/ASTRAL-MLBS/IQ-TREE SH-aLRT/IQ-TREE ultrafast bootstrapping analyses. Asterisks (*) identify splits not supported by ASTRAL or ASTRAL-MLBS analyses. Nodes labeled A, B, C, and D varied topologically among data treatments and analyses and are discussed throughout the main text. Clades that showed variable phylogenetic placements are also identified.
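The four pie-chart categories can be sketched for a single species-tree node as a tally over gene trees. This Python sketch assumes each gene tree has already been summarized, for the node in question, as one (split, bootstrap %) pair; deciding which gene-tree split corresponds to a species-tree node is simplified away. It also assumes "second most common split" means the most frequent split that differs from the one shown, with ties broken by first occurrence. `pie_fractions` and its input format are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Split = FrozenSet[str]


def pie_fractions(focal: Split, gene_calls: List[Tuple[Split, float]],
                  min_support: float = 33.0) -> Dict[str, float]:
    """Pie-chart fractions for one species-tree node.

    gene_calls holds one (split, bootstrap %) pair per gene tree: the split
    that gene tree recovers at this node (a hypothetical pre-computed summary).
    """
    n = len(gene_calls)
    # Gray: gene trees whose split at this node has < min_support bootstrap.
    low = sum(1 for _, bs in gene_calls if bs < min_support)
    supported = [s for s, bs in gene_calls if bs >= min_support]
    # Blue: gene trees recovering the split shown in the species tree.
    shown = sum(1 for s in supported if s == focal)
    alternatives = Counter(s for s in supported if s != focal)
    # Green: the most frequent alternative split (ties: first encountered).
    second = alternatives.most_common(1)[0][1] if alternatives else 0
    # Red: every remaining alternative split.
    others = sum(alternatives.values()) - second
    return {"blue": shown / n, "green": second / n,
            "red": others / n, "gray": low / n}


AB, AC, BC = frozenset("AB"), frozenset("AC"), frozenset("BC")
calls = [(AB, 95.0), (AB, 80.0), (AC, 70.0), (AC, 60.0), (BC, 50.0), (AB, 20.0)]
print(pie_fractions(AB, calls))
```

In this toy input, two of six gene trees fall in each of blue and green, and one each in red and gray; the four fractions always sum to 1.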
Fig. 3.
The three most commonly recovered topologies for major polar centric diatom clades. The four large boxes correspond to the four different taxon occupancy treatments, and each column represents a different column occupancy treatment (table 1). Each data treatment was analyzed with ASTRAL (top row), ASTRAL-MLBS (middle row), or IQ-TREE with a concatenated matrix (bottom row). The recovered topology for each analysis is identified by the shading or stippling according to the top panel. Empty boxes correspond to three other minority topologies (see supplementary file 2, Supplementary Material online, for full results).
Fig. 4.
Gene tree concordance and discordance across all nodes in the species tree depicted in figure 2 in relation to bootstrap support and branch length. (a) Box plots summarizing the proportion of gene trees still resolved as supporting or conflicting nodes within the species tree, shown as points, when the bootstrap support cutoff for gene trees is increased from 33% to 70% (results shown for 0.8 column occupancy data sets). Shared numbers above whiskers indicate no significant difference at P < 0.05. (b) IQ-TREE branch length versus proportion of gene trees that support or conflict with a node (results shown for 80–100% taxon occupancy and 0.8 column occupancy data set). Each pair of points (triangle + circle) represents the proportion of gene trees supporting or conflicting, respectively, with a node on the species tree. A split in a gene tree was considered concordant if it was shared with the species tree and had ≥70% bootstrap support in the gene tree. Splits in a gene tree were considered discordant if they had ≥70% bootstrap support and were not shared with the species tree. The inset shows the nodes labeled in figure 2, with other data points removed for clarity; shaded areas delimit 95% confidence intervals.
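The abstract reports that gene-tree concordance increased logarithmically with edge length. One way to express that relationship is proportion ≈ a + b·ln(branch length), fitted by ordinary least squares on log-transformed lengths. The sketch below assumes that model form (the caption does not state the exact regression used), `fit_log_curve` is a hypothetical name, and the input values are synthetic rather than data from the study.

```python
import math
from typing import List, Tuple


def fit_log_curve(lengths: List[float], props: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of props = a + b * ln(length).

    Branch lengths must be strictly positive (ln is undefined at 0).
    """
    xs = [math.log(x) for x in lengths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(props) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, props))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic data generated from a = 0.9, b = 0.1 (not values from the study).
lengths = [0.001, 0.01, 0.1, 1.0]
props = [0.9 + 0.1 * math.log(x) for x in lengths]
a, b = fit_log_curve(lengths, props)
print(round(a, 6), round(b, 6))  # 0.9 0.1
```

With b = 0.1, each tenfold increase in branch length raises the predicted proportion by 0.1 × ln 10 ≈ 0.23, so the curve rises steeply among short internodes and flattens for long ones. This simple model does not keep predictions within [0, 1].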
Fig. 5.
Relationships between taxon occupancy and proportion of gene trees that were identified as concordant or discordant with the species tree shown in figure 2. (a) High-taxon-occupancy data sets have a greater proportion of gene trees that are concordant with the species tree. (b) The proportion of gene trees in conflict with the species tree is relatively invariable across varying levels of taxon occupancy. For both panels, each line corresponds to a split in the species tree at different levels of taxon occupancy. The four focal nodes identified in figure 2 are likewise identified in panel (b), but all of them have uniformly low gene-tree support and so fall below the dashed line in panel (a). A split in a gene tree was considered concordant if it was shared with the species tree and had ≥70% bootstrap support in the gene tree. Splits in a gene tree were considered discordant if they had ≥70% bootstrap support and were not shared with the species tree. Shared numbers above whiskers indicate no significant difference in mean values at P < 0.05.
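The concordance rule in the caption above (a shared split with ≥70% bootstrap support is concordant; a strongly supported split that is not shared is discordant) can be expressed for a single species-tree node. This Python sketch makes several interpretive assumptions. A gene tree "conflicts" with a node when it contains a strongly supported split incompatible with that node. The species-tree split is restricted to the taxa sampled in each gene, since genes at lower taxon occupancy lack some taxa. The denominator is all gene trees. `compatible` and `node_support` are hypothetical helper names.

```python
from typing import FrozenSet, List, Tuple

Split = FrozenSet[str]  # one side of a bipartition
# (taxa sampled in the gene, [(split, bootstrap %), ...]) for one gene tree
GeneTree = Tuple[FrozenSet[str], List[Tuple[Split, float]]]


def compatible(a: Split, b: Split, taxa: FrozenSet[str]) -> bool:
    """Bipartitions a|taxa-a and b|taxa-b can coexist in one tree iff at
    least one of the four pairwise intersections of their sides is empty."""
    a2, b2 = taxa - a, taxa - b
    return not (a & b and a & b2 and a2 & b and a2 & b2)


def node_support(species_split: Split, genes: List[GeneTree],
                 cutoff: float = 70.0) -> Tuple[float, float]:
    """Proportions of gene trees concordant / discordant with one node."""
    concordant = discordant = 0
    for taxa, splits in genes:
        focal = species_split & taxa  # restrict the node to this gene's taxa
        rest = taxa - focal
        if len(focal) < 2 or len(rest) < 2:
            continue  # trivial split here: the gene is uninformative
        strong = [s for s, bs in splits if bs >= cutoff]
        if any(s == focal or s == rest for s in strong):
            concordant += 1
        elif any(not compatible(s, focal, taxa) for s in strong):
            discordant += 1
    n = len(genes)
    return concordant / n, discordant / n


AB, AC, DE = frozenset("AB"), frozenset("AC"), frozenset("DE")
full = frozenset("ABCDE")
genes = [
    (full, [(AB, 95.0), (DE, 90.0)]),    # recovers AB|CDE strongly
    (full, [(AC, 85.0), (DE, 90.0)]),    # AC|BDE conflicts with AB|CDE
    (full, [(AC, 50.0)]),                # conflict, but below the cutoff
    (frozenset("ABCD"), [(AB, 100.0)]),  # E missing; AB|CD still informative
]
print(node_support(AB, genes))  # (0.5, 0.25)
```

Lowering `cutoff` to 33 in this toy example moves the third gene from neither category to discordant, giving (0.5, 0.5), which illustrates why conflict counts depend on the bootstrap threshold applied to gene trees.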
Fig. 6.
Genewise log-likelihood differences for competing relationships of polar centric diatom clades and for the placement of Attheya. (a) Most commonly recovered versus the second most recovered polar centric topology; (b) most commonly recovered versus the third most recovered polar centric topology; and (c) (pennates + Attheya) versus (polar centrics + Attheya).
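Genewise comparisons like those in figure 6 are commonly built from the per-gene log-likelihood of each alignment under two competing topologies, taking the difference lnL(T1) − lnL(T2). The sketch below assumes those per-gene log-likelihoods have already been computed by a phylogenetics program. The gene names and values are hypothetical, and tallying with a tolerance is an illustrative choice rather than the study's criterion.

```python
from typing import Dict, Tuple


def genewise_delta(lnl_t1: Dict[str, float], lnl_t2: Dict[str, float]) -> Dict[str, float]:
    """Per-gene difference lnL(T1) - lnL(T2); positive values favor T1."""
    return {g: lnl_t1[g] - lnl_t2[g] for g in lnl_t1}


def tally(deltas: Dict[str, float], tol: float = 0.0) -> Tuple[int, int, int]:
    """Count genes favoring T1, favoring T2, or tied within `tol` lnL units."""
    favor1 = sum(d > tol for d in deltas.values())
    favor2 = sum(d < -tol for d in deltas.values())
    return favor1, favor2, len(deltas) - favor1 - favor2


# Hypothetical per-gene log-likelihoods under two competing topologies.
lnl_t1 = {"gene1": -1000.0, "gene2": -2500.5, "gene3": -800.0}
lnl_t2 = {"gene1": -1004.0, "gene2": -2498.5, "gene3": -800.0}
deltas = genewise_delta(lnl_t1, lnl_t2)
print(deltas)         # {'gene1': 4.0, 'gene2': -2.0, 'gene3': 0.0}
print(tally(deltas))  # (1, 1, 1)
```

Sorting the deltas and plotting them gene by gene shows whether support for one topology comes from many genes with small differences or from a few genes with large ones.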
