Proc Natl Acad Sci U S A. 2019 Feb 5;116(6):2165-2174.
doi: 10.1073/pnas.1801757116. Epub 2019 Jan 23.

Network-based microsynteny analysis identifies major differences and genomic outliers in mammalian and angiosperm genomes

Tao Zhao et al. Proc Natl Acad Sci U S A. 2019.

Abstract

A comprehensive analysis of relative gene order, or microsynteny, can provide valuable information for understanding the evolutionary history of genes and genomes, and ultimately traits and species, across broad phylogenetic groups and divergence times. We have used our network-based phylogenomic synteny analysis pipeline to first analyze the overall patterns and major differences between 87 mammalian and 107 angiosperm genomes. These two important groups have both evolved and radiated over the last ∼170 MYR. Secondly, we identified the genomic outliers or "rebel genes" within each clade. We theorize that rebel genes potentially have influenced trait and lineage evolution. Microsynteny networks use genes as nodes and syntenic relationships between genes as edges. Networks were decomposed into clusters using the Infomap algorithm, followed by phylogenomic copy-number profiling of each cluster. The differences in syntenic properties of all annotated gene families, including BUSCO genes, between the two clades are striking: most genes are single copy and syntenic across mammalian genomes, whereas most genes are multicopy and/or have lineage-specific distributions for angiosperms. We propose microsynteny scores as an alternative and complementary metric to BUSCO for assessing genome assemblies. We further found that the rebel genes are different between the two groups: lineage-specific gene transpositions are unusual in mammals, whereas single-copy highly syntenic genes are rare for flowering plants. We illustrate several examples of mammalian transpositions, such as brain-development genes in primates, and syntenic conservation across angiosperms, such as single-copy genes related to photosynthesis. Future experimental work can test if these are indeed rebels with a cause.

Keywords: angiosperms; genome evolution; mammals; phylogenomic synteny profiling; synteny networks.
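
The network construction described in the abstract (genes as nodes, syntenic relationships as edges) can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene identifiers are hypothetical, and connected components are used here only as a simple stand-in for the Infomap clustering that the paper applies.

```python
from collections import defaultdict
from itertools import combinations


def build_network(syntenic_pairs):
    """Build an undirected adjacency map from (gene_a, gene_b) syntenic pairs."""
    adj = defaultdict(set)
    for a, b in syntenic_pairs:
        if a != b:  # ignore self-pairs
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def connected_components(adj):
    """Stand-in for Infomap (assumption): group genes by plain connectivity."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components


# Hypothetical syntenic anchor pairs; the prefix encodes the species.
pairs = [
    ("hs_G1", "mm_G1"),
    ("hs_G1", "bt_G1"),
    ("mm_G1", "bt_G1"),
    ("at_G7", "bo_G7"),
]
adj = build_network(pairs)
degrees = {gene: len(nbrs) for gene, nbrs in adj.items()}
clusters = connected_components(adj)
```

In this toy network, a gene that is syntenic with all other members of its cluster has a clustering coefficient of 1, the pattern the abstract reports as typical of mammalian genes.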

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Principles and applications of network-based microsynteny analysis. (A) For the genomes of n species, n² pairwise reciprocal all-vs.-all comparisons of all annotated genes are performed. Gene similarity relationships and relative gene positions are then used for collinearity/microsynteny block detection for each comparison (i.e., at least five syntenic anchor genes in a window of 20 genes). Syntenic anchor pairs are illustrated as colored boxes; black empty boxes represent nonsyntenic genes. All inter- and intraspecies blocks are extracted. Related blocks centered on a target locus (microsynteny block families) are traditionally organized into parallel coordinate plots. (B) Alternatively, we connect syntenic genes into clusters in which nodes are genes and an edge between two nodes means “syntenic”; cluster sizes depend on the number of related microsynteny blocks. (C) Network metrics and tools can then be utilized for a number of novel applications. For example, overall genome quality can be assessed in a way that is complementary to BUSCO. Principles of genome and gene family evolutionary dynamics across species can be inferred from network parameters such as clustering coefficients. Microsynteny networks of multigene families can be decomposed using clustering algorithms. The clusters can then be analyzed in their phylogenetic context (phylogenomic synteny profiling) to examine gene copy number and long-term synteny conservation and to detect lineage-specific changes in a syntenic context (i.e., gene transpositions).
Fig. 2.
Phylogenetic relationships of mammalian and angiosperm genomes analyzed. (A) Mammal genomes used (tree in red), highlighting the three main placental clades of Laurasiatheria (light-gray shading), Euarchontoglires (light-orange shading), and Afrotheria (light-blue shading). (B) Angiosperm genomes used (tree in blue), highlighting the three main clades of rosids (light-red shading), superasterids (light-purple shading), and monocots (light-green shading). The tree and clade shading are maintained in the later figures. Mammal images courtesy of Tracey Saxby, Diana Kleine, Kim Kraeer, Lucy Van Essen-Fishman, Kate Moore, and Dieter Tracey, Integration and Application Network, University of Maryland Center for Environmental Science (ian.umces.edu/imagelibrary/).
Fig. 3.
Pairwise collinearity/microsynteny comparisons of mammalian and angiosperm genomes. (A) Pairwise microsynteny comparisons across mammal genomes. (B) Pairwise microsynteny comparisons across angiosperm genomes. The color scale indicates the syntenic percentage. Species are arranged according to the consensus phylogeny (Fig. 2). Overall, average microsynteny is much higher across mammals than across plants. In mammals, the detected syntenic percentage does not show a strong phylogenetic signal: for example, it is not higher for intra-Chiroptera (bats) or intra-Bovidae (cattle) contrasts than for distant pairwise contrasts, although it is slightly higher for intraprimate contrasts. By contrast, plant genomes show a much stronger phylogenetic signal, with higher values for intra-Brassicaceae or intra-Poaceae (grasses) contrasts than for interfamilial contrasts. The method also allows easy detection of low-quality genomes. The diagonal of both plots represents intragenome comparisons, which can detect potential recent and ancient WGDs. Note that almost all plant genomes have higher intragenome microsyntenic pair scores than all mammal intragenome comparisons.
Fig. 4.
Network statistics for mammal (red) and angiosperm (blue) microsynteny networks. (A) Number of total nodes, edges, and clusters. Note that, compared with mammals, flowering plants have ∼1.5 times as many total nodes, slightly fewer (0.94 times) total edges, and ∼4.5 times as many clusters. Mammal mean node degree and clustering coefficient are significantly higher than those for flowering plants (***P < 2.2e-16). (B) Node degree distribution and corresponding cumulative percentage. Mammal node degrees peak at around 70–80. The scales of the axes are logarithmic. (C) Cluster size distribution by the Infomap algorithm. Microsynteny cluster sizes vary from two to several thousand. (D) Corresponding clustering coefficient (median) and number of species (median) for given cluster sizes.
Fig. 5.
Phylogenomic synteny profiling of mammal and angiosperm genomes. (A) Phylogenomic synteny profiling (copy-number profiling of microsynteny clusters across a phylogeny) of all mammalian clusters (size ≥ 3). Groups of lineage-specific clusters are boxed and labeled. (B) Phylogenomic synteny profiling of all angiosperm clusters (size ≥ 3). Groups of lineage-specific clusters are boxed and labeled. Black arrows mark nearly empty rows, which indicate poor genome quality. Overall, mammals have mostly syntenic (conserved) and single-copy genes, whereas angiosperms have many multicopy and/or lineage-specific microsynteny clusters.
Fig. 6.
Overall microsynteny conservation and examples of mammal and plant BUSCO genes. (A) Bar plot shows the overall percentage of mammal BUSCOs that belong to a given number of synteny clusters. Most mammal BUSCO genes belong to only one synteny cluster. Examples of mammal BUSCO families that have two clusters are highlighted, including the oncogenes BRCA2 and TRRAP (Chiroptera specific); MPHOSPH10 and CENPJ, which are associated with cell division and possibly brain development (primate specific); and the peptide hormone angiotensin AGT and MRPL19 (Bovidae specific). (B) Bar plot shows the overall percentage of plant BUSCOs that have a given number of synteny clusters. Examples are highlighted of BUSCO gene families that belong to one synteny cluster and are involved in hormone signaling (CCD7 and SNX1) and photosynthesis (VTE1, CHLG, ObgC, and PNSL4). Node colors indicate lineages, which are consistent with Fig. 3. Nodes for Vitis vinifera (basal rosids), Nelumbo (basal eudicots), and Amborella (basal angiosperm) are labeled red. Node labels are letter-coded species names, which can be found in Dataset S1.
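
The phylogenomic synteny profiling used in Figs. 5 and 6 (copy-number profiling of microsynteny clusters across a phylogeny) can be sketched as a cluster-by-species count matrix. This is an illustrative sketch under stated assumptions, not the authors' code: the gene identifiers, the convention that the text before the first underscore encodes the species, and the helper names are all hypothetical. The size ≥ 3 cutoff follows the Fig. 5 caption.

```python
from collections import Counter


def species_of(gene_id):
    """Assumed ID convention: text before the first underscore is the species code."""
    return gene_id.split("_", 1)[0]


def synteny_profile(clusters, species_order, min_size=3):
    """Return one row per cluster with gene copy numbers in phylogenetic order."""
    profile = []
    for cluster in clusters:
        if len(cluster) < min_size:  # clusters below the size cutoff are skipped
            continue
        counts = Counter(species_of(gene) for gene in cluster)
        profile.append([counts.get(sp, 0) for sp in species_order])
    return profile


def is_single_copy_conserved(row):
    """True when every species has exactly one gene in the cluster."""
    return all(count == 1 for count in row)


# Hypothetical clusters; species are listed in an assumed phylogenetic order.
clusters = [
    {"hs_G1", "mm_G1", "bt_G1"},    # single copy, present in all species
    {"hs_G2a", "hs_G2b", "mm_G2"},  # duplicated in hs, absent from bt
    {"at_G7", "bo_G7"},             # below the size cutoff
]
species_order = ["hs", "mm", "bt"]
profile = synteny_profile(clusters, species_order)
```

Rows of all ones correspond to the single-copy, syntenically conserved pattern reported for most mammalian genes, while zeros and values above one correspond to the lineage-specific and multicopy patterns reported for angiosperms.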
