Mol Biol Evol. 2024 Mar 1;41(3):msae036.
doi: 10.1093/molbev/msae036.

A High-Quality Blue Whale Genome, Segmental Duplications, and Historical Demography

Yury V Bukhman et al. Mol Biol Evol.

Abstract

The blue whale, Balaenoptera musculus, is the largest animal known to have ever existed, making it an important case study in longevity and resistance to cancer. To further this and other blue whale-related research, we report a reference-quality, long-read-based genome assembly of this fascinating species. We assembled the genome from PacBio long reads and utilized Illumina/10×, optical maps, and Hi-C data for scaffolding, polishing, and manual curation. We also provided long read RNA-seq data to facilitate the annotation of the assembly by NCBI and Ensembl. Additionally, we annotated both haplotypes using TOGA and measured the genome size by flow cytometry. We then compared the blue whale genome with other cetaceans and artiodactyls, including vaquita (Phocoena sinus), the world's smallest cetacean, to investigate blue whale's unique biological traits. We found a dramatic amplification of several genes in the blue whale genome resulting from a recent burst in segmental duplications, though the possible connection between this amplification and giant body size requires further study. We also discovered sites in the insulin-like growth factor-1 gene correlated with body size in cetaceans. Finally, using our assembly to examine the heterozygosity and historical demography of Pacific and Atlantic blue whale populations, we found that the genomes of both populations are highly heterozygous and that their genetic isolation dates to the last interglacial period. Taken together, these results indicate how a high-quality, annotated blue whale genome will serve as an important resource for biology, evolution, and conservation research.

Keywords: animal genomes; body size; cetaceans; conservation; developmental biology; evolution; genetic diversity; segmental duplications.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1.
Assembly quality metrics. Blue whale (Balaenoptera musculus) data are shown in red; the 2 other VGP assemblies, vaquita (Phocoena sinus) and bottlenose dolphin (Tursiops truncatus), are in blue. a) Assembly contig and scaffold N50 metrics. Contigs are segments of contiguous, i.e. gapless, sequence. Scaffolds are sets of contigs that have been ordered and oriented, with gaps between contigs, using long-range mapping data such as optical maps and Hi-C. N50 is a length-weighted measure of typical sequence length: 50% of all bases are contained in contigs of length N50 or longer. b) % of complete and fragmented universal single-copy BUSCO orthologs found in an annotated genome. Universal single-copy orthologs are genes that are present in a single copy in all or most genomes within a phylogenetic group. A high % complete score is an indication that a genome assembly is not missing a large amount of gene-coding sequence (Simão et al. 2015; Manni et al. 2021). c) TOGA status of 18,430 ancestral placental mammal genes. Note: For 2 species, different assemblies were used in panel c compared to panel a: GCA_004363415.1 instead of GCA_002189225.1 for Eschrichtius robustus and GCA_008795845.1 instead of GCA_023338255.1 for Balaenoptera physalus.
Fig. 2.
Blue whale genome size. a) Genome size estimation by flow cytometry. CRBC were used as the standard. b) K-mer spectra plot generated by the Merqury software (Rhie et al. 2020).
Fig. 3.
Gene duplications in blue whale, vaquita, dolphin, and cattle. Duplications that are resolved in the assembly are stratified by duplication identity. The identity of collapsed duplications may not be directly discovered from read alignments.
Fig. 4.
Examples of duplicated genes. a to c) Sequencing read coverage plots of the collapsed duplications containing KCNMB1, FZD5, and MT1X genes. Average coverage is shown in panels (a) to (c) in the dashed red line. MT1X duplication is partially resolved, as evidenced by the four resolved copies of the gene, shown as boxes. d) Genomic region containing XRCC1 in blue whale and vaquita. XRCC1 genes are highlighted in red and labeled by the gene name. The second XRCC1 locus in the blue whale is labeled by its locus number, LOC118885654. This locus also has an increased read coverage, suggesting an unresolved third copy; see supplementary fig. S5, Supplementary Material online.
Fig. 5.
IGF1 sites potentially associated with body size. a) Dog site rs22397284 in the context of a multiple alignment with cetaceans and the human genome. The sequences here are reverse complements of those shown in (Plassais et al. 2022), Fig. 3. rs22397284 is marked by an arrow. b) Phylogenetic tree of 11 cetaceans considered in this analysis, generated by Timetree. The 3 clades discussed in the text and the Orca are shown in different colors. c) An example type 1 site, blue whale chromosome 10 position 85,169,891. d) An example type 2 site, blue whale chromosome 10 position 85,160,822. See Table 3 for site types.
Fig. 6.
Historical demography of Pacific and Atlantic blue whales from PSMC analysis of genomes. The pseudodiploid plot represents coalescence between the 2 genomes, where the rapid increase starting approximately 125 kyr ago indicates cessation of gene flow (coalescence) between the populations. Generation time = 30.8 yr; Autosomal mutation rate (µA) = 1.58E-08 substitutions/bp/generation.
Fig. 7.
Distributions of heterozygosity across the genomes of the North Pacific and North Atlantic blue whales. (left) Barplot shows per-site heterozygosity in nonoverlapping 1-Mb windows across 22 scaffolds >10 Mb in length. Scaffolds are shown in alternating shades. (right) Histogram of the count of per-window heterozygosity levels.
Fig. 8.
Comparison of inbreeding factors (FROH) based on the genome coverage of ROH between (a) the long-read assembly and (b) the linked-short-read assembly. ROH were identified with DARWINDOW, using a sliding-window-based approach, and sorted into respective length-bins. In both assemblies, 108 ROH over 500 kb were found; however, they appear to be more continuous in the long-read assembly as indicated by the longest ROH located on Super-Scaffold 4 (chromosome 8) of the respective assembly. A visual representation of this ROH is given in (c) and depicts the heterozygosity distribution of 20 kb windows over the scaffold in blue while identified ROH were marked as gray bars.
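
Two of the summary statistics named in the captions above can be illustrated with a minimal Python sketch. It assumes the common definitions: N50 as the length at which the cumulative sum of sequence lengths, sorted from longest to shortest, first reaches half of the total, and FROH as the fraction of the genome covered by runs of homozygosity. The function names, the `min_length` parameter, and the example numbers are hypothetical and are not taken from the paper or from any of the tools it used.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumed definition: the largest length L such that sequences of
    length >= L together contain at least 50% of all bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional arithmetic.
        if 2 * running >= total:
            return length


def f_roh(roh_lengths, genome_length, min_length=0):
    """Return the fraction of the genome covered by runs of homozygosity.

    min_length is a hypothetical filter (in bp) for ignoring short ROH;
    the threshold used in any real analysis is an assumption here.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = sum(length for length in roh_lengths if length >= min_length)
    return covered / genome_length


if __name__ == "__main__":
    # Toy inputs, not data from the blue whale assembly.
    print(n50([2, 3, 4, 5, 6]))
    print(f_roh([500_000, 1_500_000], 10_000_000))
```

With the toy contig lengths 2, 3, 4, 5, and 6, the two longest contigs hold 11 of 20 bases, so the sketch reports an N50 of 5.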

References

    Alegretti AP, Bittar CM, Bittencourt R, Piccoli AK, Schneider L, Silla LM, Bó SD, Xavier RM. The expression of CD56 antigen is associated with poor prognosis in patients with acute myeloid leukemia. Rev Bras Hematol Hemoter. 2011:33(3):202–206. 10.5581/1516-8484.20110054.
    Archer FI, Brownell RL Jr, Hancock-Hanser BL, Morin PA, Robertson KM, Sherman KK, Calambokidis J, Urbán R J, Rosel PE, Mizroch SA, et al. Revision of fin whale Balaenoptera physalus (Linnaeus, 1758) subspecies using genetics. J Mammal. 2019:100(5):1653–1670. 10.1093/jmammal/gyz121.
    Arendt M, Fall T, Lindblad-Toh K, Axelsson E. Amylase activity is associated with AMY2B copy numbers in dog: implications for dog domestication, diet and diabetes. Anim Genet. 2014:45(5):716–722. 10.1111/age.12179.
    Árnason Ú, Lammers F, Kumar V, Nilsson MA, Janke A. Whole-genome sequencing of the blue whale and other rorquals finds signatures for introgressive gene flow. Sci Adv. 2018:4(4):eaap9873. 10.1126/sciadv.aap9873.
    Atz ME, Rollins B, Vawter MP. NCAM1 association study of bipolar disorder and schizophrenia: polymorphisms and alternatively spliced isoforms lead to similarities and differences. Psychiatr Genet. 2007:17(2):55–67. 10.1097/YPG.0b013e328012d850.