Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Mar;8(3):519-535.
doi: 10.1038/s41559-023-02299-z. Epub 2024 Jan 12.

Hagfish genome elucidates vertebrate whole-genome duplication events and their evolutionary consequences

Affiliations

Hagfish genome elucidates vertebrate whole-genome duplication events and their evolutionary consequences

Daqi Yu et al. Nat Ecol Evol. 2024 Mar.

Abstract

Polyploidy or whole-genome duplication (WGD) is a major event that drastically reshapes genome architecture and is often assumed to be causally associated with organismal innovations and radiations. The 2R hypothesis suggests that two WGD events (1R and 2R) occurred during early vertebrate evolution. However, the timing of the 2R event relative to the divergence of gnathostomes (jawed vertebrates) and cyclostomes (jawless hagfishes and lampreys) is unresolved and whether these WGD events underlie vertebrate phenotypic diversification remains elusive. Here we present the genome of the inshore hagfish, Eptatretus burgeri. Through comparative analysis with lamprey and gnathostome genomes, we reconstruct the early events in cyclostome genome evolution, leveraging insights into the ancestral vertebrate genome. Genome-wide synteny and phylogenetic analyses support a scenario in which 1R occurred in the vertebrate stem-lineage during the early Cambrian, and 2R occurred in the gnathostome stem-lineage, maximally in the late Cambrian-earliest Ordovician, after its divergence from cyclostomes. We find that the genome of stem-cyclostomes experienced an additional independent genome triplication. Functional genomic and morphospace analyses demonstrate that WGD events generally contribute to developmental evolution with similar changes in the regulatory genome of both vertebrate groups. However, appreciable morphological diversification occurred only in the gnathostome but not in the cyclostome lineage, calling into question the general expectation that WGDs lead to leaps of bodyplan complexity.

PubMed Disclaimer

Conflict of interest statement

M.D.C. is a cofounder and shareholder of NovAb, Inc., which produces lamprey antibodies for biomedical purposes. J.P.R. is a consultant for NovAb. The other authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Genome of the inshore hagfish, E. burgeri.
a, Dorsal view of a young adult of the inshore hagfish E. burgeri, with the head to the top right. The teeth apparatus (and not a jaw) can be observed in a magnification of the head region of a fixed adult individual (a’). b, Fertilized egg of E. burgeri with a developing embryo at stage Dean 53 (ref. ). Blood vessels can be observed from the exterior. c, Two competing hypotheses of vertebrate phylogeny. WGD events corresponding to the 2R hypothesis (lilac), to an alternative vertebrate 2R hypothesis (orange) and to those recently proposed in the lamprey lineage (light blue) are marked. Whether the lamprey-specific events actually occurred in a stem cyclostome remains elusive. d, Hi-C contact heatmap of the corrected hagfish genome assembly ordered by cluster (chromosome) length. Dashed boxes indicate the cluster boundaries. e, Completeness assessment of the genome assembly of the inshore hagfish E. burgeri genome (red), three lamprey species (blue) and two jawed vertebrates (green). Number of conserved metazoan orthologues (metazoa_odb10 dataset, containing 954 BUSCOs) is indicated for each case. F. E., Far Eastern. f, Correspondence analysis (CoA) on RSCU values was performed using the nucleotide sequences of all predicted genes concatenated for individual species. The percentage of variance is indicated for each axis. g, CoA of amino acid composition, with the percentage of variance indicated for each axis. In f and g: red, hagfish; blue, lamprey; green, jawed vertebrates; black, invertebrates.
Fig. 2
Fig. 2. Calibrated and dated vertebrate evolution.
Time-calibrated rooted phylogeny of vertebrates and two non-vertebrate species with 95% credibility intervals for clade divergence times indicated by red bars on nodes. The phylogenetic tree was obtained with Bayesian inference (Extended Data Fig. 2a) and all nodes were recovered with a posterior probability of 1. Numbers of gene family gains (green, novel homology group (HG); blue, novel core HG) and losses (red) are indicated in selected nodes (see text). Dated WGD events, including 1R, 2R and cyclostome-specific event (CR) described in this study, are indicated with coloured rectangles. The hagfish position is highlighted with a thickened line and bold font. Geological periods are colour-coded at the bottom: Ediac., Ediacaran; Cambr., Cambrian; Ordo., Ordovician; Sil., Silurian; Devon., Devonian; Carbonif., Carboniferous; Perm., Permian; Palg., Paleogene; Ng., Neogene. Animal illustrations kindly provided by Tamara de Dios Fernández; human, zebrafish, lamprey and hagfish illustrations reproduced with permission from ref. .
Fig. 3
Fig. 3. Reconstruction of the Hox complement of an ancestral cyclostome.
Schematic representations of Hox clusters and syntenic genes of the inshore hagfish (E. burgeri, middle), the sea lamprey (P. marinus, bottom) and a reconstruction of the complement of the last common ancestor of hagfishes and lampreys (top). Genes are represented by colour-coded arrows whose direction marks the sense of transcription: Hox genes in red, non-Hox genes coloured by homology (legend at right). The block between Evx-ζ and Creb2/5/7-ζ is assembled downstream of the Hox-ζ cluster, separated by 7 genes. This might be a misassembly, and their ‘natural’ upstream position is marked by a dashed line. The sea lamprey genome scaffolds and hagfish Hi-C clusters in which Hox clusters are located are indicated to the right of each cluster, with cluster 3 separated into 3L (0–107.78 Mb) and 3R (107.78–194 Mb). Black asterisks mark genes placed at opposite sides of a cluster in the hagfish and the lamprey, but placed at one side of the ancestrally reconstructed cluster on the basis of comparisons with gnathostomes (Supplementary Fig. 9); hashes denote genes present in lampreys in the same chromosome but at a long distance (see Supplementary Fig. 8); white asterisk denotes Nfe2-ζ inferred due to its presence in the Arctic lamprey. Animal illustrations kindly provided by Tamara de Dios Fernández; lamprey and hagfish illustrations reproduced with permission from ref. .
Fig. 4
Fig. 4. Hagfish and lamprey share a whole-genome triplication.
a, Phylogenetic support of gnathostome and cyclostome genes for 1R and 2R. Elephant shark, hagfish, lamprey or both cyclostomes’ genes (both hagfish and lamprey genes included) were analysed as test genes in the context of spotted gar and chicken gene phylogenies by each AC (using amphioxus genes) and orthologous sea cucumber genes (outgroup). Left: possible positions where test genes can branch, supporting or not 1R or/and 2R (see legend). Middle and right: statistics of supporting (blue) or not supporting (orange) gene phylogenies from each species’ tested genes. All phylogenetic trees are available in Supplementary Files 5–8. b, Formula to calculate the OR between two chromosomes. Dark cyan denotes genes from the AC, retained in modern chromosomes; white indicates gene loss. c, OR values distribution between WGD-generated paralogous (ohnologous) chromosomes in chicken (top left) and spotted gar (top right), and the artificially split chromosomes in chicken (bottom left) and spotted gar (bottom right). Dashed lines mark OR = 0.15. d, OR values distribution between putative ohnologous chromosomes in hagfish (top left) and lamprey (top right), and the artificially split chromosomes in hagfish (bottom left) and lamprey (bottom right). e, OR values distribution between chicken and spotted gar (top) and between hagfish and lamprey (bottom) orthologous chromosomes. f, Numbers of mutually ohnologous chromosomes in cyclostome genomes that correspond to each one of the 16 reconstructed ACs. g, Retention profile clustering analysis of cyclostome chromosomes deriving from AC2. Retained genes are denoted by dark cyan lines. Five putative orthologous chromosome pairs are defined. Note that AC17 was excluded from the analyses depicted in cf because of the low number of genes we recovered (20 genes). Animal illustrations kindly provided by Tamara de Dios Fernández; chicken, spotted gar, lamprey and hagfish illustrations reproduced with permission from ref. .
Fig. 5
Fig. 5. Impact of WGD events on the regulatory genome.
a, Distributions of the ACR numbers within the cis-regulatory regions of each gene (see Methods). n = 28,497 (amphioxus), n = 23,183 (zebrafish), n = 22,184 (medaka), n = 15,213 (chicken), n = 23,256 (mouse) and n = 16,951 (hagfish) genes. ***P < 2.2 × 10−16, Bonferroni-adjusted, two-sided Wilcoxon rank-sum tests. b, Numbers and fractions of ACRs with respect to genomic annotations in each species. Promoters, between 1 kb upstream and 0.5 kb downstream of annotated transcription start sites (TSSs); proximal, within 5 kb upstream and 1 kb downstream of annotated TSSs, but not overlapping promoters; exonic, within exons of protein-coding genes but not overlapping proximal regions; distal, not in aforementioned locations. c, Cumulative proportion of the distance of ACRs from the closest TSSs in each species. For the result with scaling based on the average length of intergenic regions of each species genome, see Extended Data Fig. 8c. d, The distribution of ACR numbers across different classes of genes, according to PANTHER Gene Ontology database (devel., developmental ohnologues; non-dev., non-developmental ohnologues; non-ohnol., singletons). n = 143 (devel. ohnol.), n = 816 (non-devel. ohnol.) and n = 7,303 (non-ohnol.) genes. P values from Bonferroni-adjusted two-sided Wilcoxon rank-sum tests are indicated. e, Distribution of fates of ohnologous families after WGD. Red., potential redundancy; Subf., potential subfunctionalization; Spec., potential specialization. f, Number of ohnologues with strong specialization expressed in hagfish tissues. In a and d, boxes correspond to the median (centre line) and the first and third quartiles. Whiskers extend to the last point no further than 1.5× the interquartile range from the first and third quartiles. For ad, see Supplementary Tables 52–57 for detailed statistical information, including P value for each pairwise comparison. Animal illustrations kindly provided by Tamara de Dios Fernández; chicken and hagfish illustrations reproduced with permission from ref. .
Fig. 6
Fig. 6. Morphological evolution of vertebrates.
a,b, Morphological disparity across vertebrates. Non-metric ordinations are presented, highlighting the morphological variance among (a) taxonomic lineages of extant and extinct vertebrates and (b) the descendants of 3 whole-genome duplication events. Convex hulls have been fitted around groups. The underlying tree was derived from a consensus of relationships from the literature.
Extended Data Fig. 1
Extended Data Fig. 1. Features of the hagfish genome.
a, 17-mer distribution for inshore hagfish genome size estimation using all raw reads from short insert-size libraries. b, Counts for major classes of genes and transcripts from Ensembl annotation. c, Completeness assessment of the annotation of the inshore hagfish E. burgeri genome (red), three lamprey species (blue) and two jawed vertebrates (green). Numbers of conserved metazoan orthologs (metazoa_odb10 dataset, n = 954) are indicated for each case. F. E., Far Eastern d, GC-content distribution of the hagfish genome and other chordates calculated from 10-kb non-overlapping windows. e, All codon type frequency in given chordate genomes according to GC-content. GC-0/1/2/3 indicates the number of G or C bases in a codon. f, Distribution of GC content at each codon position or at all codon positions (Codon1 + 2 + 3). For each protein coding gene, we only kept the longest coding sequence. For each coding sequence, we calculated the GC content at separate codon positions. We also calculated the GC content for each coding sequence, which is equal to the GC content of all three codon positions. g, Violin plots of size distribution of intron (left), intergenic (middle) and gene body (right) lengths over a logarithmic scale of the hagfish (E. burgeri), two lamprey species (sea lamprey, P. marinus; Arctic lamprey, L. camtschaticum), six gnathostome vertebrates (human, H. sapiens; frog, X. tropicalis; coelacanth, L. chalumnae; chicken, G. gallus; spotted gar, L. oculatus; and the elephant shark, C. milii) and two invertebrate deuterostomes (sea cucumber, A. japonicus; amphioxus, B. belcheri). For each genomic feature and each species, the median (centre line) and IQR (interquartile range) length statistics are indicated with a white rectangle; whiskers extend to the last point no further than 1.5 times the interquartile range from the first and third quartiles Dashed vertical line indicates median size of E. burgeri features; *** P < 2.2 × 10−16, two-sided Wilcoxon rank sum test. Animal illustrations kindly provided by Tamara de Dios Fernández and reproduced with permission from REF. .
Extended Data Fig. 2
Extended Data Fig. 2. Phylogenomic analysis of chordates and gene duplication rates across metazoan evolution.
a, Bayesian Inference tree of 31 chordate species was built using a protein alignment (see Methods). All nodes were recovered with a posterior probability of 1. Cyclostome monophyly was unequivocally supported. Scale bar indicates 0.1 substitutions per site. b, c, The exact number of gene duplication events inferred using OrthoFinder2 with greater than 50% support (b) and the number of events per million years per branch (c) are shown. In each case, the colour of the branch represents the value according to the key on the upper-left of each pane. Hagfish is highlighted in red font in all panels.
Extended Data Fig. 3
Extended Data Fig. 3. Reconstruction of ancestral gene content.
a, Cladogram showing the phylogenetic relationships of 45 species representatives of all major eumetazoan taxa with species of gnathostomes and cyclostomes highlighted in blue and orange, respectively. Gene family gains and losses are indicated in selected nodes: green, novel homology groups (HG); blue, novel core HGs; red, lost HGs. b, Top 14 Protein Class GO hits for novel homology groups (HG) gained across different nodes of chordates, color coded by taxa (legend at the bottom right) and sorted by the Vertebrata node. The largest GO enriched terms are ‘transmembrante signal receptor’ and ‘intercellular signal molecule’ in vertebrates, and ‘defense/immunity protein’ in gnathostomes.
Extended Data Fig. 4
Extended Data Fig. 4. Phylogenetic and retention profile clustering analyses of Hox syntenic regions.
a-d, Bayesian inference phylogenetic trees of amino acid sequences of 4 non-Hox syntenic genes to Hox clusters, Gbx (a), Cbx (b), Hnrnpa (c) and Agap (d), of the inshore hagfish (in red), the sea lamprey (in light blue), the Arctic lamprey (in dark blue) and selected gnathostomes. Orthologs from the European amphioxus Branchiostoma lanceolatum were used as outgroup to root the trees. Posterior probability is indicated in each node. Scales indicate number of substitutions per site. Phylogenetic analyses of Hox genes generally fail to determine orthology due their high conservation and short alignments. The phylogenetic trees of these non-Hox linked genes clearly support the orthology of Hox-α (Gbx, Cbx and Hnrnpa), Hox-δ (Gbx, Hnrnpa and Agap), and Hox-ζ (Hnrnpa) clusters, while β and ε genes always group together, as previously observed for the lamprey. The alignments used to build the trees, together with the MrBayes parameters and number of generations used to build each tree are provided as Supplementary Files 16–19. e, clustering analysis of retention profiles (see main text) resolved the orthology relationships of Hox-β, Hox-ε, Hox-γ, as well as Hox-ζ clusters. Supported orthologies in each analysis are marked with color-coded rectangles. The location of each cluster is indicated in parenthesis in e (ssc, super scaffold; HiC cl, Hi-C contact cluster, or chromosome). For the clustering analysis of AC1-derived chromosomes in the lamprey and hagfish, we split Hi-C cluster 3 into two halves, each containing one Hox cluster: 3L (coordinates 0–107.78 Mb), 3R (107.78-194 Mb).
Extended Data Fig. 5
Extended Data Fig. 5. Emergence of gnathostome karyotypes via 2R.
a, Chromosomal level phylogenetic trees demonstrate the occurrence of 2R WGD. The two rounds of WGD are color-coded. Cyan and red denote genes encoded by corresponding chicken chromosomes and gar LOCs, respectively. Bootstrap support and posterior probability values of three methods are marked on branches: bootstrap value from RAxML-ng/bootstrap value from IQ-Tree/posterior probability from Astral-III. b, Evolution of gnathostome karyotype through 1R and 2R, with post-1R/pre-2R chromosomal fusions indicated with red lines, and ratios of gene retention asymmetries shown for directly ohnologous chromosomes after 1R and 2R. The size of each ancestral chromosome is proportional to the number of retained genes from ACs.
Extended Data Fig. 6
Extended Data Fig. 6. Contributions of vertebrate ACs to the genomes of hagfish and sea lamprey.
a, Sea lamprey super-scaffolds (pseudo-chromosomes) are generally homologous to one single AC except five scaffolds labeled with an arrow. Sea lamprey scaffolds scaf_00001, scaf_00002, scaf_00008, scaf_00010, scaf_00011 and scaf_00023, confounded by missassembly, are not presented here. b, The distribution of the descendant copies of AC genes for the 19 hagfish Hi-C clusters (putative chromosomes). Only significant homologous relationships between hagfish clusters and ACs are shown. Because hagfish cluster 19 is not homologous to any AC, it is presented as a blank block. Genes are color-coded according to its homologous AC.
Extended Data Fig. 7
Extended Data Fig. 7. Overlapping ratio and clustering analysis of ohnologous and orthologous chromosomes.
a, b, AC1 corresponds to five and six mutually paralogous chromosomes in hagfish (a) and sea lamprey (b) genomes, respectively. Numbers in colour-coded cells (bottom left triangle) indicate the OR between two chromosomes. Numbers in white cells (top right triangle) indicate the number of shared retained genes between two chromosomes. Numbers on the diagonal line from top left to bottom right (thick-lined cells) indicate the total number of retained genes of a chromosome. Overlapping rations corresponding to all hagfish chromosomes and lamprey scaffolds are provided in Supplementary File 9. c, Retention profile clustering analysis of gnathostome orthologous chromosomes deriving from AC1. Retained genes are denoted by dark cyan lines. Four orthologous chromosome pairs are defined. d, The clustering found in (c) is the same as that found in the phylogenetic analysis (Extended Data Fig. 5a), demonstrating the reliability of the OR approach.
Extended Data Fig. 8
Extended Data Fig. 8. Fate and cis-regulatory evolution of ohnologs after WGD.
a, b, Gene Ontology enrichment analysis of ohnologs in the hagfish (a) and the chicken (b). Top 20 terms are shown, majority of which are related with development. c, Cumulative distribution of distance of ACRs from the closest TSSs normalized by the average length of intergenic regions in each genome. d, Distribution of the distance from ACRs to closest TSS of developmental ohnologs (Devel.), non-developmental ohnologs (Non-dev.) and non-ohnologous (Non-ohnol.) genes. e, Distribution of the number of ACRs per gene, normalized by the GREAT region length, for developmental ohnologs (Devel.), non-developmental ohnologs (Non-dev.) and non-ohnologous (Non-ohnol.) genes. Sample size for groups in (d) and (e) are identical. n = 143 (Devel.), n = 816 (Non-dev.), n = 7303 (Non-ohnol.). f, Proportion of distal ACRs across different gene functional categories. Within a GREAT-defined region, proximal regulatory sequences were defined as those from 5 kb upstream to 1 kb downstream of a TSS, and the rest of the region was treated as distal. g, Distribution of the number of ACRs of hagfish ohnologs for each category (special., specialization). n = 344 (Redundancy), n = 178 (Subfunction.), n = 334 (Mild special.), n = 240 (Strong special.). h, Distribution of pairwise protein identity for hagfish ohnologous pairs for each category. n = 170 (Redundancy), n = 88 (Subfunction.), n = 165 (Mild special.), n = 120 (Strong special.). i, Distribution of pairwise protein identity for chicken ohnologous pairs for each category. n = 275 (Redundancy), n = 178 (Subfunction.), n = 318 (Mild special.), n = 167 (Strong special.). j, Number of ohnologues with strong specialization in chicken expressed in each tissue. Only the gene in a pair with narrower expression breadth is analyzed. In panels d, e, g-i P values correspond to two-sided Wilcoxon rank sum test between the indicated groups; boxes correspond to the median (centre line) and the first and third quartiles; whiskers extend to the last point no further than 1.5 times the interquartile range from the first and third quartiles. All statistical information for panels d, e, g-i is provided in Supplementary Tables 58–62. Animal illustrations kindly provided by Tamara de Dios Fernández and reproduced with permission from REF. .

References

    1. Van de Peer Y, Maere S, Meyer A. The evolutionary significance of ancient genome duplications. Nat. Rev. Genet. 2009;10:725–732. - PubMed
    1. Dehal P, Boore JL. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 2005;3:e314. - PMC - PubMed
    1. Ohno, S. Evolution by Gene Duplication (Springer, 1970).
    1. Donoghue P, Purnell M. Genome duplication, extinction and vertebrate evolution. Trends Ecol. Evol. 2005;20:312–319. - PubMed
    1. Holland LZ, Ocampo Daza D. A new look at an old question: when did the second whole genome duplication occur in vertebrate evolution? Genome Biol. 2018;19:2–5. - PMC - PubMed