Long-read de novo genome assembly of Gulf toadfish (Opsanus beta)

Nicholas S Kron et al. BMC Genomics. 2024 Sep 18;25(1):871. doi: 10.1186/s12864-024-10747-8.

Abstract

Background: The family Batrachoididae is a group of ecologically important teleost fishes whose unique life histories, behavior, and physiology have made them popular model organisms. Batrachoididae remain understudied in genomics: only four reference genome assemblies are available for the family, and three of them are highly fragmented and fall short of current assembly standards. Among these is the Gulf toadfish, Opsanus beta, a model organism for serotonin physiology that has recently been bred in captivity.

Results: Here we present new de novo genome and transcriptome assemblies for the Gulf toadfish generated with PacBio long-read technology. The final genome assembly is 2.1 gigabases, placing it among the largest teleost genomes. It improves significantly on the currently available reference for Opsanus beta, with 62 scaffolds (23 of them chromosome-scale), an N50 of 98,402,768 bp, and a BUSCO completeness score of 97.3%. Annotation with ab initio and transcriptome-based methods generated 41,076 gene models. The genome is highly repetitive, with ~70% of it composed of simple repeats and transposable elements. Satellite DNA analysis identified potential telomeric and centromeric regions.
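N50, the contiguity statistic quoted above, is the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal sketch of that standard definition follows; it is an illustration, not the authors' pipeline, and the scaffold lengths in the usage line are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases until
    # at least half of the total assembly length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical scaffold lengths, not values from this assembly.
print(n50([100, 80, 50, 20, 10]))  # -> 80
```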

Conclusions: This improved assembly is a valuable resource for future research on this important model organism and for teleost genomics more broadly.

Keywords: Genome assembly; HiFi; Model organism; PacBio; Teleost; Toadfish.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Photo of the genetic neotype for the Gulf toadfish, Opsanus beta. The adult male, named Bic, was selected from the toadfish stock at the Toadfish Lab at the University of Miami Rosenstiel School for Marine, Atmospheric, and Earth Sciences as the DNA and RNA sample donor for the genome and transcriptome assemblies.
Fig. 2
Genome parameters estimated from the k-mer profile of HiFi reads with meryl and GenomeScope v2, using a calculated ideal k-mer size of 21. The bimodal distribution is typical of heterozygous genomes: the smaller and larger peaks represent heterozygous and homozygous k-mers, respectively, with the larger peak at the sequencing depth of coverage (48x). Profile analysis estimates a 2.09 gigabase genome, with 46.5% of the sequence being non-repetitive and a heterozygosity rate of 0.9%.
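The logic behind a k-mer-based genome size estimate can be sketched as follows. GenomeScope fits a mixture model to the histogram; this sketch swaps in the much simpler classic estimate, genome size ≈ (total k-mers above an error cutoff) / (homozygous peak depth), and assumes the tallest peak above the cutoff is the homozygous one, as in the bimodal profile described above. Counting is pure Python (not meryl) and is only practical for toy inputs; all function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads, k=21):
    """Count canonical k-mers across reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _VALID:
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts):
    """Map depth -> number of distinct k-mers observed at that depth."""
    return Counter(counts.values())


def estimate_genome_size(hist, error_cutoff=2):
    """Naive estimate: solid k-mer total divided by the depth of the tallest peak.

    Depths below error_cutoff are treated as sequencing errors and dropped.
    """
    solid = {d: n for d, n in hist.items() if d >= error_cutoff}
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth


# Hypothetical histogram: error k-mers at depth 1, a small peak at 10x, the main peak at 20x.
print(estimate_genome_size({1: 500, 10: 50, 20: 100}))  # -> 125.0
```

Because the heterozygous k-mers are included in the numerator, this estimate counts both haplotypes' unique k-mers against the homozygous depth, which is the same convention used when the larger peak sets the coverage.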
Fig. 3
Circos ideogram of the fOpsBet2.1 chromosome-scale genome assembly. A Lengths of the 23 chromosome-scale scaffolds (ob1-ob23), with contigs shown as alternating grey and white regions. B GC content. Y-axis ranges from 35% (0.35) to 55% (0.55). C Gene density calculated using the GFF3 from funannotate::update. Y-axis ranges from 0% (0) to 100% (1) of bases in the sliding window. D Transposable element (TE) density calculated using the GFF3 generated with RepeatModeler + repclassifier. Y-axis ranges from 50% (0.5) to 90% (0.9) of bases in the sliding window. E Satellite DNA density calculated using the GFF3 generated by TRASH. Y-axis ranges from 0% (0) to 75% (0.75) of bases in the sliding window. F Frequency of the TTAGGG telomeric satellite calculated with tidk. Y-axis shows counts per sliding window, with peaks identifying telomeric repeats. G Frequency of the GATA satellite calculated with tidk. Y-axis shows counts of GATA satellites per sliding window. All tracks were generated with a window size of 2 Mbp. The 39 unplaced scaffolds are not shown.
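The per-window tracks described above (GC fraction, feature density from a GFF3, and telomeric or GATA motif counts) reduce to a few windowed summaries. The sketch below is a hypothetical pure-Python illustration, not the Circos, funannotate, TRASH, or tidk code: it uses non-overlapping windows (2 Mbp in the figure; tiny windows in the demo), 0-based half-open intervals (a GFF3 record with 1-based inclusive start/end maps to `(start - 1, end)`), and it misses motif hits that straddle a window boundary.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_by_window(seq, window):
    """Fraction of G/C bases per non-overlapping window (last window may be shorter)."""
    seq = seq.upper()
    out = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        out.append((chunk.count("G") + chunk.count("C")) / len(chunk))
    return out


def motif_counts_by_window(seq, motif, window):
    """Count a motif and its reverse complement per window (boundary-spanning hits are missed)."""
    seq = seq.upper()
    fwd = motif.upper()
    rev = revcomp(fwd)
    counts = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        n = chunk.count(fwd)
        if rev != fwd:  # avoid double-counting palindromic motifs
            n += chunk.count(rev)
        counts.append(n)
    return counts


def feature_density(intervals, seq_len, window):
    """Fraction of bases per window covered by 0-based half-open (start, end) intervals."""
    # Merge overlapping intervals so shared bases are counted once.
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    density = []
    for w_start in range(0, seq_len, window):
        w_end = min(w_start + window, seq_len)
        covered = sum(max(0, min(e, w_end) - max(s, w_start)) for s, e in merged)
        density.append(covered / (w_end - w_start))
    return density


print(gc_by_window("GGGGCCCCAAAATTTT", 4))  # -> [1.0, 1.0, 0.0, 0.0]
print(motif_counts_by_window("TTAGGGTTAGGG" + "AAAAAA" + "CCCTAA", "TTAGGG", 12))  # -> [2, 1]
print(feature_density([(0, 5), (3, 8)], 10, 5))  # -> [1.0, 0.6]
```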
Fig. 4
Dot plot of the pairwise alignment of fThaAma1.1 and fOpsBet2.1 chromosome-scale scaffolds. Alignments are colored blue for forward, green for reverse, and orange for repetitive alignments. Only alignments 4 kilobases or longer are shown. The two assemblies show a high degree of collinearity. In addition, several O. beta scaffolds contain large inversions, including scaffolds 4, 9, 15, and 17. Visible gaps in the alignments (such as on scaffold 1) consist primarily of repetitive alignments shorter than 4 kilobases.
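The caption does not name the aligner or the dot-plot tool, so the following is only a sketch of the length filter and forward/reverse split it describes. It assumes alignments in PAF format (as written by, e.g., minimap2) and uses the alignment block length (column 11) as the length criterion; it does not reproduce the figure's "repetitive" class. The function name and the records in the usage example are hypothetical.

```python
def filter_paf(lines, min_len=4000):
    """Keep PAF records whose alignment block length (column 11) is >= min_len.

    Returns (query, q_start, q_end, target, t_start, t_end, orientation) tuples,
    where orientation is "forward" for "+" strand hits and "reverse" for "-".
    """
    kept = []
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        block_len = int(f[10])  # column 11: alignment block length
        if block_len < min_len:
            continue
        orientation = "forward" if f[4] == "+" else "reverse"
        kept.append((f[0], int(f[2]), int(f[3]), f[5], int(f[7]), int(f[8]), orientation))
    return kept


# Hypothetical PAF records (12 mandatory tab-separated columns).
records = [
    "ob1\t1000000\t0\t5000\t+\tchr1\t900000\t100\t5100\t4900\t5000\t60",
    "ob1\t1000000\t6000\t9000\t-\tchr1\t900000\t6000\t9000\t2900\t3000\t60",
]
for hit in filter_paf(records):
    print(hit)
```

Inversions like those on scaffolds 4, 9, 15, and 17 would appear as runs of "reverse" hits inside otherwise "forward" scaffold pairs.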
Fig. 5
Number of universal single-copy orthologs identified by BUSCO analysis in the de novo assembly of Opsanus beta from this study, the current reference assembly for O. beta, and the chromosome-scale reference assembly of the close relative Thalassophryne amazonica. The new O. beta assembly is the most complete Batrachoididae genome assembly currently available, at 97.3% complete (96.1% single copy, 1.3% duplicated). Light blue = complete and single copy; dark blue = complete and duplicated; yellow = fragmented; red = missing. Analysis run using the Actinopterygii_odb10 database.
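BUSCO completeness (C) is the sum of complete single-copy (S) and complete duplicated (D) orthologs, expressed as a percentage of the lineage's BUSCO groups. Because each percentage is rounded independently, the reported S and D can sum to a value 0.1 off from C, which is why 96.1% + 1.3% is consistent with 97.3%. The minimal sketch below uses hypothetical counts chosen to show this effect; they are not this assembly's counts.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """BUSCO-style percentages, each rounded to one decimal; C = S + D before rounding."""
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no BUSCO groups")

    def pct(x):
        return round(100 * x / n, 1)

    return {
        "C": pct(single + duplicated),
        "S": pct(single),
        "D": pct(duplicated),
        "F": pct(fragmented),
        "M": pct(missing),
    }


# Hypothetical counts out of 10,000 groups: S = 96.06%, D = 1.26%, C = 97.32% before rounding.
print(busco_summary(9606, 126, 100, 168))
```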
Fig. 6
Mitochondrial genome assembly of Opsanus beta. Tracks, from outermost to innermost, represent genomic features of the heavy and light strands, GC skew, GC content, and sequence length. Arrows represent genes and their orientation on each strand as identified by Mitos2. Labels and features are colored by gene type: tRNA (blue), coding sequence (orange), rRNA (green), and non-coding/regulatory features (red). Whereas the typical vertebrate mitochondrial genome has one of each, O. beta retains three threonine tRNAs (T_0, 1, and 2) and three D-loop-like control regions (OH_0, 1, and 2). Tracks visualize mitochondrial features for Opsanus beta, two other batrachoidids (B. trispinosus and P. myriaster), and two "typical" teleosts (L. oculatus as a basal teleost and D. rerio as a model teleost). CDS and rRNA are labeled by gene symbol within each box, while tRNAs are labeled by their amino acid, above the track for heavy-strand-encoded tRNAs and below it for light-strand-encoded tRNAs. Grey polygons represent conserved sequence regions determined by pairwise BLASTn alignments with an e-value threshold of 1e-6 and a word size of 7. All mitochondrial sequences were linearized to start at the first base of the phenylalanine tRNA. Gene order and pairwise alignments show the conserved "typical" vertebrate arrangement in the outgroup teleosts and a unique batrachoidid arrangement, with highly conserved gene order between P. myriaster and O. beta. An even more highly derived order was observed in B. trispinosus, which nonetheless shares some conserved gene blocks with the other two batrachoidids, as shown previously.
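Two of the operations described above, linearizing a circular mitogenome at a chosen start base and computing GC skew, (G - C) / (G + C), can be sketched as follows. The sketch assumes the 0-based start coordinate of the phenylalanine tRNA is already known from the annotation; the coordinate and sequences in the usage lines are hypothetical, and the skew windows do not wrap around the circular origin.

```python
def rotate_circular(seq, start):
    """Linearize a circular sequence so that it begins at 0-based index `start`."""
    start %= len(seq)
    return seq[start:] + seq[:start]


def gc_skew(seq, window, step):
    """(G - C) / (G + C) in sliding windows; 0.0 where a window has no G or C.

    Windows do not wrap past the end of the (linearized) sequence.
    """
    seq = seq.upper()
    skews = []
    for i in range(0, len(seq) - window + 1, step):
        chunk = seq[i:i + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


# Hypothetical: suppose tRNA-Phe starts at index 3 of a toy circular sequence.
print(rotate_circular("ACGTACGT", 3))  # -> TACGTACG
print(gc_skew("GGGGCCCC", 4, 4))  # -> [1.0, -1.0]
```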
Fig. 7
Repeat landscape of curated de novo repeats in FishTEDB for Thalassophryne amazonica (A) and curated repeats in Opsanus beta (B). Bar plot: proportion of the genome covered by each transposable element (TE) class against Kimura 2-parameter distance, binned in intervals of 1 from 0 to 50. Smaller/larger Kimura values represent lower/higher divergence from the reference repeat sequence, suggesting more recent/older repeat divergence, respectively. Inset pie chart: total proportion of the genome covered by each repeat class. Repeat classes covering more than 5% of the genome are labeled.
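The Kimura 2-parameter (K2P) distance behind the landscape separates transitions (P, purine-purine or pyrimidine-pyrimidine changes) from transversions (Q): d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The sketch below computes d for one aligned repeat copy against its reference sequence and bins copies into 1%-wide bins as in the bar plots. It is an illustration only: tools such as RepeatMasker may additionally correct for CpG sites, which this sketch does not, and the `(length_bp, distance)` input to the hypothetical `landscape` helper is an assumed format.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def kimura_2p(a, b):
    """Kimura two-parameter distance between two aligned sequences.

    Columns with gaps or non-ACGT characters are skipped. Raises ValueError
    when the distance is undefined (too divergent) or no sites are comparable.
    """
    transitions = transversions = sites = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def landscape(copies, genome_size, max_bin=50):
    """Percent of genome per 1%-wide K2P bin; copies are (length_bp, distance) pairs."""
    bins = [0.0] * max_bin
    for length, d in copies:
        b = int(d * 100)  # distance expressed as a percentage, floored to its bin
        if b < max_bin:
            bins[b] += 100 * length / genome_size
    return bins


print(round(kimura_2p("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # one transition in 10 sites -> 0.1116
```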
Fig. 8
Putative centromeric region of chromosome-scale super-scaffold Scaffold_1. A 5 Mbp region of Scaffold 1, with major clusters of monomers shown as colored blocks; each color represents a consensus monomer size identified by TRASH. Consensus sequences from TRASH for the most common 45-mer and 128-mer of the putative centromeric and pericentromeric region. Identity heatmap of the 82-83 Mbp region of Scaffold 1, with occurrences of canonical higher-order repeats (HORs) of the most common 45-mer (45mer 1) shown as colored strips; rendered with StainedGlass. Repeat number of canonical HORs of 45mer 1 in the putative centromeric region; analysis generated with HiCAT.
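To illustrate what a "consensus monomer size" such as the 45-mer means, the sketch below finds the dominant period of a tandem array by self-comparison at fixed offsets (a simple autocorrelation). This is a swapped-in technique for illustration only, not the algorithm used by TRASH or HiCAT; it ignores indels, and the threshold of 0.8 and the function names are assumptions.

```python
def offset_identity(seq, k):
    """Fraction of positions i where seq[i] == seq[i + k]."""
    n = len(seq) - k
    if n <= 0:
        return 0.0
    matches = sum(1 for i in range(n) if seq[i] == seq[i + k])
    return matches / n


def dominant_period(seq, min_period=2, max_period=500, threshold=0.8):
    """Smallest offset whose self-identity reaches `threshold`, or None.

    For a tandem array of a k-bp monomer, offsets that are multiples of k
    match almost perfectly, so the smallest such offset estimates the monomer size.
    """
    seq = seq.upper()
    for k in range(min_period, min(max_period, len(seq) - 1) + 1):
        if offset_identity(seq, k) >= threshold:
            return k
    return None


print(dominant_period("ACGACGACGACG"))  # -> 3
print(dominant_period("AACGT" * 6))  # -> 5
```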
