Nature. 2024 May;629(8013):851-860.

doi: 10.1038/s41586-024-07323-1. Epub 2024 Apr 1.

Complexity of avian evolution revealed by family-level genomes

Josefin Stiller¹, Shaohong Feng^{2,3,4}, Al-Aabid Chowdhury⁵, Iker Rivas-González⁶, David A Duchêne⁷, Qi Fang⁸, Yuan Deng^{8,9}, Alexey Kozlov¹⁰, Alexandros Stamatakis^{10,11,12}, Santiago Claramunt^{13,14}, Jacqueline M T Nguyen^{15,16}, Simon Y W Ho⁵, Brant C Faircloth¹⁷, Julia Haag¹⁰, Peter Houde¹⁸, Joel Cracraft¹⁹, Metin Balaban²⁰, Uyen Mai²¹, Guangji Chen^{9,22}, Rongsheng Gao^{9,22}, Chengran Zhou⁹, Yulong Xie², Zijian Huang², Zhen Cao²³, Zhi Yan²³, Huw A Ogilvie²³, Luay Nakhleh²³, Bent Lindow²⁴, Benoit Morel^{10,11}, Jon Fjeldså²⁴, Peter A Hosner^{24,25}, Rute R da Fonseca²⁵, Bent Petersen^{7,26}, Joseph A Tobias²⁷, Tamás Székely^{28,29}, Jonathan David Kennedy³⁰, Andrew Hart Reeve²⁴, Andras Liker^{31,32}, Martin Stervander³³, Agostinho Antunes^{34,35}, Dieter Thomas Tietze³⁶, Mads F Bertelsen³⁷, Fumin Lei^{38,39}, Carsten Rahbek^{25,30,40,41}, Gary R Graves^{30,42}, Mikkel H Schierup⁶, Tandy Warnow⁴³, Edward L Braun⁴⁴, M Thomas P Gilbert^{7,45}, Erich D Jarvis^{46,47}, Siavash Mirarab⁴⁸, Guojie Zhang^{49,50,51,52}

Affiliations

¹ Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark. josefin.stiller@bio.ku.dk.
² Center for Evolutionary & Organismal Biology, Liangzhu Laboratory & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, China.
³ Department of General Surgery, Sir Run-Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China.
⁴ Innovation Center of Yangtze River Delta, Zhejiang University, Jiashan, China.
⁵ School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia.
⁶ Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark.
⁷ Center for Evolutionary Hologenomics, The Globe Institute, University of Copenhagen, Copenhagen, Denmark.
⁸ BGI Research, Shenzhen, China.
⁹ BGI Research, Wuhan, China.
¹⁰ Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
¹¹ Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece.
¹² Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany.
¹³ Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada.
¹⁴ Department of Natural History, Royal Ontario Museum, Toronto, Ontario, Canada.
¹⁵ College of Science and Engineering, Flinders University, Adelaide, South Australia, Australia.
¹⁶ Australian Museum Research Institute, Sydney, New South Wales, Australia.
¹⁷ Department of Biological Sciences and Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA.
¹⁸ Department of Biology, New Mexico State University, Las Cruces, NM, USA.
¹⁹ Department of Ornithology, American Museum of Natural History, New York, NY, USA.
²⁰ Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA.
²¹ Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
²² College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China.
²³ Department of Computer Science, Rice University, Houston, TX, USA.
²⁴ Natural History Museum Denmark, University of Copenhagen, Copenhagen, Denmark.
²⁵ Center for Global Mountain Biodiversity, Globe Institute, University of Copenhagen, Copenhagen, Denmark.
²⁶ Centre of Excellence for Omics-Driven Computational Biodiscovery, Faculty of Applied Sciences, AIMST University, Bedong, Malaysia.
²⁷ Department of Life Sciences, Imperial College London, Silwood Park, Ascot, UK.
²⁸ Milner Centre for Evolution, University of Bath, Bath, UK.
²⁹ ELKH-DE Reproductive Strategies Research Group, University of Debrecen, Debrecen, Hungary.
³⁰ Center for Macroecology, Evolution, and Climate, The Globe Institute, University of Copenhagen, Copenhagen, Denmark.
³¹ HUN-REN-PE Evolutionary Ecology Research Group, University of Pannonia, Veszprém, Hungary.
³² Behavioural Ecology Research Group, Center for Natural Sciences, University of Pannonia, Veszprém, Hungary.
³³ Bird Group, Natural History Museum, Tring, UK.
³⁴ CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal.
³⁵ Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal.
³⁶ NABU, Berlin, Germany.
³⁷ Centre for Zoo and Wild Animal Health, Copenhagen Zoo, Frederiksberg, Denmark.
³⁸ Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
³⁹ College of Life Science, University of Chinese Academy of Sciences, Beijing, China.
⁴⁰ Institute of Ecology, Peking University, Beijing, China.
⁴¹ Danish Institute for Advanced Study, University of Southern Denmark, Odense, Denmark.
⁴² Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA.
⁴³ University of Illinois Urbana-Champaign, Champaign, IL, USA.
⁴⁴ Department of Biology, University of Florida, Gainesville, FL, USA.
⁴⁵ University Museum, NTNU, Trondheim, Norway.
⁴⁶ Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA.
⁴⁷ Howard Hughes Medical Institute, Durham, NC, USA.
⁴⁸ University of California, San Diego, San Diego, CA, USA. smirarabbaygi@ucsd.edu.
⁴⁹ Center for Evolutionary & Organismal Biology, Liangzhu Laboratory & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, China. guojiezhang@zju.edu.cn.
⁵⁰ Innovation Center of Yangtze River Delta, Zhejiang University, Jiashan, China. guojiezhang@zju.edu.cn.
⁵¹ BGI Research, Wuhan, China. guojiezhang@zju.edu.cn.
⁵² Villum Center for Biodiversity Genomics, Department of Biology, University of Copenhagen, Copenhagen, Denmark. guojiezhang@zju.edu.cn.

PMID: 38560995
PMCID: PMC11111414
DOI: 10.1038/s41586-024-07323-1

Josefin Stiller et al. Nature. 2024 May.

Abstract

Despite tremendous efforts in the past decades, relationships among main avian lineages remain heavily debated without a clear resolution. Discrepancies have been attributed to diversity of species sampled, phylogenetic method and the choice of genomic regions¹⁻³. Here we address these issues by analysing the genomes of 363 bird species⁴ (218 taxonomic families, 92% of total). Using intergenic regions and coalescent methods, we present a well-supported tree but also a marked degree of discordance. The tree confirms that Neoaves experienced rapid radiation at or near the Cretaceous-Palaeogene boundary. Sufficient loci rather than extensive taxon sampling were more effective in resolving difficult nodes. Remaining recalcitrant nodes involve species that are a challenge to model due to either extreme DNA composition, variable substitution rates, incomplete lineage sorting or complex evolutionary events such as ancient hybridization. Assessment of the effects of different genomic partitions showed high heterogeneity across the genome. We discovered sharp increases in effective population size, substitution rates and relative brain size following the Cretaceous-Palaeogene extinction event, supporting the hypothesis that emerging ecological opportunities catalysed the diversification of modern birds. The resulting phylogenetic estimate offers fresh insights into the rapid radiation of modern birds and provides a taxon-rich backbone tree for future comparative studies.

Conflict of interest statement

M.T.P.G. serves on the Science Advisory Board of Colossal Laboratories & Biosciences. All other authors declare no competing interests.

Figures

**Fig. 1. Relationships and divergence times for 363 bird species based on 63,430 intergenic loci.**
a, Topology simplified to orders with higher clade names following ref. . Numbers on branches represent local posterior probability if below 1. b, Time tree of all species. Grey bars represent 95% credible intervals for age estimation; dots indicate nodes with fossil calibrations; asterisks mark the three branches lacking full support. A tree with tip labels is shown in Extended Data Figs. 2 and 3.

**Fig. 2. Explaining difficult placements.**
a, Gene tree discordance across the backbone of the main tree. Node colours and numbers represent the bar plots of quartet frequencies for three possible resolutions around each branch. b, Uncertainty at the base of Elementaves. Phaethontimorphae + Aequornithes had high local posterior probability (LocalPP), but global bootstrap resampling (GlobalBS) showed support for an alternative placement. Violin plots (points for the species-poor Phaethontiformes) show higher root–tip distances of Phaethontiformes, and particularly for Eurypygiformes, than Aequornithes, which may cause attraction to the long-branched Telluraves. Further, the placement of Opisthocomiformes is the only branch where a null hypothesis (H₀) of a polytomy cannot be refuted. c, Addition of taxa occasionally affects topology and support. Across 41,918 GTs with at least one species from each group, the alternative placement of Afroaves + Accipitriformes had higher quartet support when only a few species were sampled but declined with increasing taxon sampling (left), particularly of Passeriformes. The main topology dominated when 138 or more passerines were sampled (middle, arrow). Support for Telluraves + Elementaves decreased with increasing taxon sampling (right).

**Fig. 3. Effect of increasing data quantity.**
a–c, Species trees were reconstructed from subsets of GTs (1,000, 2,000, ..., 32,000) of the 63,430 intergenic regions in 50 replicates. a, The addition of loci increases similarity to the main tree (left) and increases the proportion of highly supported nodes (right). b, The main tree, with branches coloured according to the difficulty involved in consistently recovering the clade across subsets. Most branches were consistently obtained with only 1,000 GTs (grey); the remaining 40 branches required more loci. c, Increasing the number of loci decreases the number of possible sister groups. We recorded the number of unique sister groups for each node across subsets. Colours correspond to the difficulty (from b), and shading and number show the frequency with which the main topology was obtained. The top row illustrates examples of easy nodes, in which the same sister group was consistently recovered with 2,000, 4,000 and 16,000 loci, respectively. The remaining plots show the most difficult nodes, in which multiple sister groups were supported even when 32,000 loci were subsampled. d, Ten selected species trees, data types used in each and the support for all challenging branches (labelled in b). Asterisks indicate relationships in Passeriformes that differ from previous studies. MNO, Malaconotoidea + Neosittidae + Orioloidea; MMNO, Mohouidae + MNO; PP, posterior probability; Q, quartiles.

**Fig. 4. Phylogenetic signal across the genome.**
a, Protein-coding regions yield more varied species trees when they are subsampled. Each heatmap cell shows the average Robinson–Foulds distance between 1,250 (diagonal, 1,225) pairs of species trees, each built from 2,000 GTs of different data types. Values in parentheses give the same metrics for 8,000 GTs, omitting UCEs with fewer loci. b, Effect of subsetting loci by data type and different metrics. The y axis represents the number of differences to the main tree; the x axis shows two metrics split into four quartiles, from low to high. Phylogenetic informativeness is the proportion of parsimony-informative sites. Clocklikeness is the coefficient of variation in root–tip distances, a measure of branch length heterogeneity. Extended Data Figure 8g shows other metrics. c, Patterns of phylogenomic incongruence along the genome. Using the 94,402 loci binned approximately every 500 kb, lines show Robinson–Foulds (RF) distances to the main tree (top), variance in GC content (middle) and recombination rate (bottom). Horizontal lines indicate genome-wide averages.
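
The two locus metrics defined in b can be illustrated with a minimal Python sketch. It assumes an alignment is supplied as a list of equal-length strings, counts only unambiguous A, C, G and T characters, and uses the population standard deviation for the coefficient of variation. These choices and the function names are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from statistics import mean, pstdev


def parsimony_informative_fraction(alignment):
    """Proportion of alignment columns that are parsimony-informative.

    A column is counted as informative when at least two different
    nucleotides each occur in at least two sequences. Gaps and ambiguity
    codes are ignored (an assumption made for this sketch).
    """
    n_sites = len(alignment[0])
    informative = 0
    for i in range(n_sites):
        counts = Counter(seq[i] for seq in alignment if seq[i] in "ACGT")
        states_seen_twice = sum(1 for c in counts.values() if c >= 2)
        if states_seen_twice >= 2:
            informative += 1
    return informative / n_sites


def clocklikeness(root_tip_distances):
    """Coefficient of variation of root-to-tip distances.

    Lower values indicate a more clock-like gene tree. The population
    standard deviation is used here; a sample estimate would also be
    a reasonable choice.
    """
    return pstdev(root_tip_distances) / mean(root_tip_distances)


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "AGGA", "AGTT"]  # hypothetical data
    print(parsimony_informative_fraction(toy_alignment))
    print(clocklikeness([1.0, 3.0]))
```

Splitting loci into quartiles by either value, as described in the caption, would then amount to sorting loci on the returned number and cutting the sorted list into four equal parts.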

**Fig. 5. Biological implications of the new time tree.**
a, The main tree fits morphological traits well. We measured phylogenetic signal (Pagel’s lambda) for nine traits over 100 replicates and compared the fit based on (1) the main tree, (2) the ref. topology and (3) the main tree with random species sampling to match the sample size used in ref. (one-sided t-test with Bonferroni correction). b, The K–Pg and Palaeogene–Neogene transitions were associated with increased effective population sizes of some lineages. Shown are the midpoint ages of each branch compared with the ratio between its length in time units and in coalescent units, which is proportional to the effective population size of that branch and its generation time. Numbers correspond to selected nodes from Fig. 2a. c, Variations in body mass and relative brain size over time changed in different directions following the K–Pg event. Solid lines indicate mean values and ribbons mark 95% confidence intervals. The dashed parts of the reconstruction (from 25 Ma) indicate possible uncertainty due to the lack of within-family sampling (Extended Data Fig. 11g). d, Substitution rates increased around the K–Pg boundary. Estimated molecular rates for the intergenic regions are plotted against the midpoint age of each branch.
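
The proportionality stated in b follows from the standard coalescent scaling for a diploid population. As a sketch, assume a constant effective population size $N_e$ and generation time $g$ along a branch, with $T$ the branch length in years and $\tau$ its length in coalescent units. These symbols are introduced here for illustration and do not appear in the original caption.

```latex
\tau = \frac{T}{2 N_e\, g}
\quad\Longrightarrow\quad
\frac{T}{\tau} = 2 N_e\, g
```

Under these assumptions, a larger ratio for a branch indicates a larger product of effective population size and generation time, so the two quantities cannot be separated from the ratio alone.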

**Extended Data Fig. 1. Overview of the phylogenomic dataset.**
a, Overview of the datasets by different data types in terms of number of loci and base pairs analyzed. b, Comparison of dataset size to previous studies focused on avian relationships. c, Schematic overview of the extraction of different genomic data types (intergenic regions, exons, UCEs, introns). d, Choice of the length of intergenic loci. To evaluate the impact of locus length of intergenic regions, we used 500 alignments of 10 kb length and extracted subregions of increasing length (0.25 kb to 5 kb) to build gene trees for each. We then calculated the number of well-supported nodes of each locus compared to the next shorter version of the locus. We found that gene tree support increased up to 1 kb length for most loci indicating that phylogenetic signal increased. At lengths greater than 1 kb an increasing number of gene trees had fewer well-supported nodes than at shorter locus lengths (values below 0 in the plot), perhaps due to increasing propensity to include recombinations in a locus. We therefore chose 1 kb as the locus length for our analyses to balance high signal and reduced chance of recombination.

**Extended Data Fig. 2. The main dated tree with tip labels for all groups except Passeriformes.**
Taxonomic orders are annotated to the right of the tree. Colors of the branches follow those used in Fig. 1. The Passeriformes portion of the tree is shown in Extended Data Fig. 3.

**Extended Data Fig. 3. The main dated tree with tip labels for Passeriformes.**
Taxonomic family names are given on the branches. Major clades as discussed in the text are annotated to the right following.

**Extended Data Fig. 4. Overview of topologies for the species trees obtained for different data types.**
Each tree is simplified to taxonomic orders, colors follow those used in Fig. 1. All analyses are coalescent-based species trees obtained from ASTRAL with support being local posterior probabilities, with the exception of the values on the panel showing the topology obtained from concatenated analysis using RAxML-NG with support values resulting from bootstrapping. Poorly supported branches (bootstrap<0.8, local posterior probabilities<0.9) are dashed.

**Extended Data Fig. 5. Comparison of the main tree with previous studies simplified to taxonomic orders.**
Top, comparison to Jarvis et al. ‘TENT’ on the right. Bottom, comparison with Prum et al. on the right. Bands connect the same tips, dashed branches on the right tree indicate nodes not present in the main tree.

**Extended Data Fig. 6. Comparison of inferred ages to previous studies and across alternative analyses.**
a, Age estimates in comparison to previous studies for major clades and orders (left) and for families (right). Shown are median age estimates (points) and 95% credible intervals (whiskers) derived from MCMC sampling for clades that were present in at least two studies. The dashed line is the K–Pg boundary. **b-e**, Comparison of age estimates between the main analysis and alternative analyses. Red arrows indicate the amount of displacement in the date estimates from the main analysis compared with each alternative analysis. For a description of each analysis, refer to the Methods.

**Extended Data Fig. 7. Exploration of difficult nodes.**
a, Removing species one by one from Columbea and Otidimorphae (rows, heatmap) changed the support for Columbea in the gene trees as measured by the difference between the quartet score of the tree placing Columbea or Mirandornithes at the base. Columbea was not recovered unless all but one Columbiformes or Cuculiformes was removed. Large differences between mean (blue; n = 63,430; shown with s.e.m.) and median (green) show the impact of outlier genes: While the mean score (akin to what is used by ASTRAL) favored Columbea in some cases, the median never favored it. b, Genome-wide scan for the competing topologies for Phaethontimorphae. The main (blue) and the alternative (brown) topology had a normalized quartet score difference of 0.000537%. Chromosomes with <100 windows were excluded. The y axis shows the quartet support for a bipartition in each gene tree minus the mean support for that topology across all gene trees, calculated as a moving average over 100 loci. If a genomic region was strongly in favor of either topology, the two lines would be diverging, but this was not observed. c, The two competing positions (colors as in b) for Phaethontimorphae were responsive to selecting subsets of the intergenic regions that targeted long branches (panels with gray background). Species trees were generated from gene trees split into four quartiles according to their values for seven metrics. For each resulting species tree, the position of Phaethontimorphae is shown (posterior probability=1 throughout). d, Comparison of root-to-tip distances across 21,154,875 gene tree tips as an indicator of susceptibility to long-branch attraction. The violin plots show distributions grouped by orders as well as mean (dots) and three quartiles (horizontal lines). e, Comparison of GC content outliers across birds. For each species grouped by orders, the number of loci that were outliers (defined using the interquartile range) in their GC s.d. from the remaining taxa is shown. 
The outliers were counted across 159,205 loci from all data types. Rheiformes and Tinamiformes had many loci with a different GC content compared to the remaining birds, which may artificially attract these two taxa. f, Effect of taxon sampling on topology. We sampled 1–10 taxa for each order and investigated the effect on specific nodes, given as the most recent common ancestor (MRCA) of two taxa. Colors indicate the number of replicates that recovered the clade. Most clades were supported irrespective of the number of taxa sampled (yellow), while Columbaves (Mesitornithiformes, Cuculiformes) was only found across all replicates when at least 3 taxa were sampled per order. The MRCA of Phaethontiformes + Strisores was only found when at least 10 taxa were sampled. Strigiformes and Accipitriformes were only recovered as a clade when more than 10 taxa were sampled (discussed in the main text). g, GC-content similarities between Tinamiformes and Rheiformes cause topological changes in gene trees. Positive values of the relative GC similarity indicate that Tinamiformes and Rheiformes are similar to each other but not to Apterygiformes and Casuariiformes, and negative values indicate the opposite. Using this quantity, we divided loci into bins and calculated the quartet score for each bin.

**Extended Data Fig. 8. Comparisons between different data types.**
Colors are the same for each data type across panels. In panels **a–c**, 50 subsets were drawn and summarized into species trees for each data type and each subset of n loci. Boxplot components are the same as in c. a, Greater dataset size resulted in increased similarity to the main tree across all data types. b, Greater dataset size resulted in an increased proportion of highly supported nodes of the resulting species tree across all data types. c, Response to increasing dataset size in comparison to different reference species trees. Each panel compares the same subsets of the 63,430 dataset to the reference trees (obtained from summarizing all loci of a data type), showing that increasing gene tree sampling consistently improved similarity. The increase in similarity to the species tree from concatenation and from analyzing exons is less pronounced, indicating more sustained differences despite large numbers of loci. **d-f**, Density distribution of phylogenetic signal measured as d, the percentage of branches in each gene tree with more than 95% posterior probability support, e, the number of parsimony informative sites (PIS) in a locus, f, the predicted difficulty of each alignment using Pythia. Exons have the lowest signal and are more difficult. UCEs are longer than intergenic regions and thus have more PIS and slightly higher support on average, while the predicted difficulty of estimating trees for both is similar. Introns are heterogenous, ranging from easy to difficult. g, For each data type, loci were sorted according to their magnitude in seven metrics and split into four quantiles. The gene trees of each quantile were summarized into a species tree and compared to the main tree. Exons generally responded the strongest to subsetting, while effects were less pronounced but present in the other data types.

**Extended Data Fig. 9. The number of potential sister groups decreases with increasing number of loci.**
Only those nodes that still had multiple sister group proposals at 8,000 loci are shown. Points show the number of different sister group proposals obtained across 50 subsets of n loci. Shading of the nodes and orange numbers indicate the proportion with which the main topology was obtained.

**Extended Data Fig. 10. Comparison of different chromosomes and chromosomal categories.**
a, Discordance across chromosomes. Mean ± s.e.m. of percent normalized Robinson-Foulds (RF) distance for gene trees from the 80,047 locus set derived from individual chromosomes (circles, left y-axis) and absolute RF distance to species trees (diamonds, right y-axis). Dashed line: mean gene tree distance across all chromosomes. Chromosomes with fewer than 1,000 gene trees were not used to construct species trees. b, Mean ± s.e.m. of the GC s.d. of gene trees from the 80,047 locus set for each chromosome, showing a general increase in GC s.d. in shorter chromosomes. Dashed line: mean across all chromosomes. c, Density plot for distribution of GC s.d. for alignments, showing higher deviation for microchromosomes. d, Pearson correlation of mean normalized RF distance and recombination rate for loci of different chromosome types binned over 500 kb. No adjustments for multiple comparisons were made.
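
For readers unfamiliar with the metric in a, the following is a minimal sketch of the absolute and normalized Robinson-Foulds distance between two trees on the same taxa. It assumes trees are written as nested Python tuples of taxon names and treated as unrooted. The representation and function names are hypothetical and chosen for illustration; the software used in the study is not specified in this caption.

```python
def bipartitions(tree, taxa):
    """Return the set of non-trivial bipartitions of a nested-tuple tree.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always has the same representation.
    """
    taxa = frozenset(taxa)
    ref = min(taxa)  # arbitrary but fixed reference taxon
    splits = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        # Non-trivial split: at least two taxa on each side.
        if 1 < len(below) < len(taxa) - 1:
            splits.add(below if ref not in below else taxa - below)
        return below

    leaves_below(tree)
    return splits


def rf_distance(tree_a, tree_b, taxa):
    """Absolute RF distance: number of bipartitions found in only one tree."""
    return len(bipartitions(tree_a, taxa) ^ bipartitions(tree_b, taxa))


def normalized_rf(tree_a, tree_b, taxa):
    """RF distance divided by its maximum, 2 * (n - 3), for n taxa."""
    return rf_distance(tree_a, tree_b, taxa) / (2 * (len(taxa) - 3))


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D", "E"]  # toy taxon names
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(rf_distance(t1, t2, taxa), normalized_rf(t1, t2, taxa))
```

The normalization by 2(n − 3) assumes both trees are fully resolved; multiplying the result by 100 gives a percentage as reported in a.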

**Extended Data Fig. 11. Trait evolution.**
a, Simulations on inferred Pagel’s lambda (λ) values. To simulate topological error (left), continuous traits were simulated and an increasing proportion of species were randomly misplaced in the phylogeny (n = 100). To simulate the effect of convergence in trait values (right), continuous traits were simulated on a phylogeny and an increasing proportion of species pairs were randomly given the same trait value to simulate the action of convergence (n = 100). Compared to the effects of topological inaccuracies, the influence of convergently similar trait values on λ estimates was weaker. b, Reconstruction of rate changes in body mass evolution (log-transformed). Branches are colored by estimates of the mean rate (log-transformed); rate changes can occur in both directions, either an increase or a decrease. c, Reconstruction of rate changes in relative brain size evolution (residual). Branch colors as in b. Taxa with pronounced rate changes as mentioned in the main text are annotated. d, Model comparisons between variable-rate and single-process models (BM: Brownian motion, EB: early burst, OU: Ornstein–Uhlenbeck) for body size. e, Model comparisons as in d for relative brain size. f, Impact of taxon sampling on ancestral reconstruction of body size. The solid purple line is the result of the ancestral reconstruction of the full dataset. The gray lines are ancestral reconstructions from analyses in which each species’ trait values were randomly drawn from the range of values across their family (n = 100). The chosen values did not impact the reconstructions at deep timescales but estimates diverged more from 25 million years ago to the present, indicating that increased taxon sampling within families may lead to a different trajectory in more recent times. g, Impact of imputation on ancestral reconstructions of relative brain size. 
The non-imputed dataset contained only values based on the literature, while the imputed dataset included some values inferred using phylogenetic information. Solid lines indicate mean values and ribbons mark 95% confidence intervals. The two ancestral reconstructions are almost indistinguishable.

References

1. Jarvis ED, et al. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science. 2014;346:1320–1331. doi: 10.1126/science.1253451.
2. Prum RO, et al. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature. 2015;526:569–573. doi: 10.1038/nature15697.
3. Kuhl H, et al. An unbiased molecular approach using 3’-UTRs resolves the avian family-level Tree of Life. Mol. Biol. Evol. 2021;38:108–127. doi: 10.1093/molbev/msaa191.
4. Feng S, et al. Dense sampling of bird diversity increases power of comparative genomics. Nature. 2020;587:252–257. doi: 10.1038/s41586-020-2873-9.
5. Hinchliff CE, et al. Synthesis of phylogeny and taxonomy into a comprehensive tree of life. Proc. Natl Acad. Sci. USA. 2015;112:12764–12769. doi: 10.1073/pnas.1423041112.
