Nat Commun. 2021 Jun 9;12(1):3498.
doi: 10.1038/s41467-021-23665-0.

Whole-genome microsynteny-based phylogeny of angiosperms

Tao Zhao et al. Nat Commun. 2021.

Abstract

Plant genomes vary greatly in size, organization, and architecture. Such structural differences may be highly relevant for inference of genome evolution dynamics and phylogeny. Indeed, microsynteny, the conservation of local gene content and order, is recognized as a valuable source of phylogenetic information, but its use for the inference of large phylogenies has been limited. Here, by combining synteny network analysis, matrix representation, and maximum likelihood phylogenetic inference, we provide a way to reconstruct phylogenies based on microsynteny information. Both simulations and use of empirical data sets show our method to be accurate, consistent, and widely applicable. As an example, we focus on the analysis of a large-scale whole-genome data set for angiosperms, including more than 120 available high-quality genomes, representing more than 50 different plant families and 30 orders. Our 'microsynteny-based' tree is largely congruent with phylogenies proposed based on more traditional sequence alignment-based methods and current phylogenetic classifications but differs for some long-contested and controversial relationships. For instance, our synteny-based tree finds Vitales as early diverging eudicots, Saxifragales within superasterids, and magnoliids as sister to monocots. We discuss how synteny-based phylogenetic inference can complement traditional methods and could provide additional insights into some long-standing controversial phylogenetic relationships.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Whole-genome microsynteny-based species tree inference.
a Whole-genome data sets with all predicted genes are used for phylogeny reconstruction. b The synteny network approach first conducts all pairwise reciprocal genome comparisons, followed by synteny block detection. Syntenic anchor pairs from all syntenic blocks constitute the synteny network database (see Methods for details). c We analyzed all synteny clusters after clustering the entire network database. Synteny clusters vary in size and node compositions. Shared genomic rearrangements are reflected by cluster compositions. Specific anchor pairs shared by a lineage/species form specific clusters (e.g. Clusters 4–6). We account for the presence or absence of the same recurring anchors for multiple blocks derived from whole-genome or segmental duplications (e.g. for Species 2 and 3 in Clusters 2, 3, 5, and 8). d The phylogenomic profiling of all clusters constructs a binary matrix, where rows represent species and columns represent clusters. The synteny matrix comprehensively represents phylogenomic gene order dynamics. It transforms the concept of synteny comparisons from analyzing massive parallel coordinates plots into analyzing profiles of individual clusters/networks. Each cluster stands for a shared homologous ‘context’. For example, TE activity can cause genes to be transposed as insertions into new contexts or be lost from the original context (e.g. genes in Clusters 4–6). As long as such transpositions are shared by different genomes (e.g. genes in Clusters 4 and 6) or within the same genome because of whole-genome duplication (e.g. genes in Cluster 5), specific clusters will emerge and corresponding signals will be added to the matrix. This synteny matrix is used as the input for species tree inference by maximum likelihood (referred to as Syn-MRL).
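The matrix construction described in panel d can be illustrated with a minimal sketch. It assumes syntenic anchor pairs are already available as gene-to-gene edges, uses plain connected components as a stand-in for the clustering step (the caption does not specify the clustering algorithm), and relies on a hypothetical `species|gene` naming scheme to recover the species of each node.

```python
from collections import defaultdict

def connected_components(edges):
    """Group genes into clusters with union-find (a stand-in for network clustering)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    # Sort for a deterministic column order.
    return sorted(sorted(members) for members in clusters.values())

def binary_matrix(clusters, species):
    """Rows are species, columns are clusters; 1 if the species has a gene in the cluster."""
    matrix = {sp: [] for sp in species}
    for cluster in clusters:
        present = {gene.split("|")[0] for gene in cluster}  # hypothetical ID scheme
        for sp in species:
            matrix[sp].append(1 if sp in present else 0)
    return matrix

# Toy anchor pairs: one cluster shared by all three species, one specific to sp2 and sp3.
edges = [("sp1|a", "sp2|a"), ("sp2|a", "sp3|a"), ("sp2|b", "sp3|b")]
species = ["sp1", "sp2", "sp3"]
clusters = connected_components(edges)
matrix = binary_matrix(clusters, species)
for sp in species:
    print(sp, "".join(str(v) for v in matrix[sp]))
```

Each row of the printed matrix is a binary character string for one species, which is the general shape of input that maximum likelihood software accepts for binary data.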
Fig. 2. Results of Syn-MRL on simulations.
a Simulation process. An evolutionary scenario for a single gene family is illustrated, involving gene duplication, loss, and rearrangement. Extant gene names are labeled at the bottom (left panel). Nodes and edges of synteny clusters are simulated in stages under the listed parameters; ‘snapshots’ of the evolution of the associated gene family synteny network are shown at three time points, each right after a speciation event (right panel). For example, at t2, three connected components can be formed: a ‘conserved’ four-node cluster, and two ‘specific’ two-node clusters, which were formed as a result of the gene duplication or rearrangement events that happened between t1 and t2. b Proportions of inferred trees with a particular RF distance to the true 15 monocot species tree. Simulations were conducted for different numbers of total gene families, with 1000 replicates for each simulation. c Distribution of RF distances of the inferred trees to the true 62 plant species tree. A total of 1000 simulation replicates of 1000 gene families were conducted. The 10 most frequently recovered topologies are labeled within the corresponding bar of the histogram.
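The RF (Robinson-Foulds) distance used in panels b and c counts the bipartitions found in one tree but not the other. The sketch below is an illustration only, not the authors' pipeline: it assumes trees are given as nested tuples of leaf names over the same leaf set, and the function names are hypothetical.

```python
def leaf_set(tree):
    """Return all leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def bipartitions(tree):
    """Collect the non-trivial splits induced by the internal edges of a tree."""
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)  # each split is stored as the side without this leaf
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if ref in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return clade

    visit(tree)
    return splits

def rf_distance(tree_a, tree_b):
    """Unnormalized RF distance: size of the symmetric difference of the split sets."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))

true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(true_tree, true_tree))      # 0
print(rf_distance(true_tree, inferred_tree))  # 2
```

An RF distance of 0 means the inferred topology matches the true tree exactly; each misplaced internal edge in a binary tree typically contributes 2.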
Fig. 3. Results of Syn-MRL on empirical data sets.
a Distribution of the support values of all the nodes of the inferred trees for the four data sets under different settings of Amin (the number of required anchor pairs for regions to be called syntenic). The boxplots indicate the minimum, maximum, median (the middle hinge with red dots), first quartile (the lower hinge), and third quartile (the upper hinge) in the data sets (n = 16, 9, 10, and 18 for the data sets of YGOB, Drosophila, vertebrates, and yeast). The whiskers extend 1.5 times the inter-quartile range (IQR) from the hinges, and outliers are shown as individual points. b–e Pairwise RF distances of the inferred trees, sequence alignment-based tree, and the reported tree of the data sets of YGOB, Drosophila, vertebrates, and yeast, respectively. Note that for the yeast data set, the species ‘Yarrowia lipolytica’ (used as the root of the tree) was absent from the data matrices at A11 and A13 (no synteny detected), and we therefore manually added this species (as the root) to the inferred tree in order to calculate the RF distances. f–i Tree comparisons between the reported tree and the Syn-MRL tree of YGOB, Drosophila, vertebrates, and yeast data sets, at parameter setting values of A5G25, A5G25, A5G6, and A2G25, respectively. Matrices of syntenic percentages of pairwise genome comparisons are aligned to the corresponding species. Each cell of the matrix represents an overall syntenic percentage of a genome comparison, which is calculated using the total number of syntenic genes relative to the total number of genes between two genomes. The color indicates the values and goes from low (blue) to high (red). Different branching patterns are highlighted in red, and genome duplication events are labeled as blue dots. Branch lengths are not meaningful. Support values for certain nodes of the vertebrate and yeast trees are from Drillon et al.
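The syntenic percentage in each matrix cell can be read as a simple ratio. The sketch below is one plausible reading of the caption, assuming each gene is counted once even if it anchors several blocks and that the denominator is the sum of gene counts of both genomes; the function name and inputs are hypothetical.

```python
def syntenic_percentage(anchor_pairs, n_genes_a, n_genes_b):
    """Percentage of genes in two genomes that take part in at least one anchor pair."""
    syntenic_a = {a for a, _ in anchor_pairs}  # distinct syntenic genes in genome A
    syntenic_b = {b for _, b in anchor_pairs}  # distinct syntenic genes in genome B
    return 100.0 * (len(syntenic_a) + len(syntenic_b)) / (n_genes_a + n_genes_b)

# Toy comparison: gene a2 anchors two blocks but is counted once.
pairs = [("a1", "b1"), ("a2", "b2"), ("a2", "b3")]
pct = syntenic_percentage(pairs, n_genes_a=4, n_genes_b=6)
print(pct)  # 50.0
```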
Fig. 4. Maximum likelihood (ML) tree for 123 fully sequenced flowering plant genomes based on the microsynteny approach.
The tree is rooted by Amborella, and four main clades, i.e. superrosids, superasterids, monocots, and magnoliids are shaded in light-red, light-purple, light-green, and light-yellow, respectively. Ultrafast bootstrapping values are denoted for all the nodes. Names for the different plant orders follow the APG IV classification.
Fig. 5. Comparison of gene clustering based on orthology and synteny of the ABC dataset (see text for details).
a Comparison of the sizes of orthogroups and synteny clusters. b (Upper) binary phylogenomic profiles for all orthogroups (size > 1) (clustered into three groups separated by dashed lines) and (bottom) corresponding synteny profiles (clustered by Jaccard distances within each group). We use the number of involved species to annotate profiles. Note that one orthogroup can correspond to multiple synteny clusters, and vice versa. c (Upper) binary phylogenomic profiles of all synteny clusters (clustered into four groups separated by dashed lines) and (bottom) corresponding orthogroup profiles (clustered by Jaccard distances within each group).
Fig. 6. Magnoliids-associated signals and a representative example of phylogenetically informative microsynteny.
a Hierarchical clustering (method: ward.D) of 15,424 magnoliid-associated cluster profiles based on Jaccard distance. On the far left, the synteny-based species tree is displayed (same as Fig. 4). Superrosids, superasterids, early diverging eudicots, monocots, and magnoliids are shaded in light-red, light-purple, light-grey, light-green, and light-yellow, respectively. A total of 1107 clusters support a grouping of magnoliids and monocots (Supplementary Data 5). b One example from all supporting signals. A fifteen-gene context in the genome of Cinnamomum kanehirae (a magnoliid) shows eight neighboring genes (highlighted in orange) only present in magnoliid and monocot genomes, while the flanking genes (colored blue) are generally conserved angiosperm-wide.
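The Jaccard distance between two binary cluster profiles, as used for the clustering in Figs. 5 and 6, can be sketched as follows. This is a minimal illustration assuming profiles are equal-length lists of 0/1 values indexed by species; treating two all-zero profiles as distance 0 is a convention chosen here, not something the captions state.

```python
def jaccard_distance(profile_a, profile_b):
    """1 - |intersection| / |union| over the species present in each profile."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same species")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    if either == 0:
        return 0.0  # assumed convention for two empty profiles
    return 1.0 - both / either

# Two clusters that share one of the three species present in either profile.
p = [1, 1, 0, 0]
q = [1, 0, 1, 0]
d = jaccard_distance(p, q)
print(round(d, 4))  # 0.6667
```

A pairwise matrix of such distances is the usual input to a hierarchical clustering routine.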
