Environ Toxicol Chem. 2022 Feb;41(2):448-461.
doi: 10.1002/etc.5266. Epub 2022 Jan 18.

De Novo Assembly of the Nearly Complete Fathead Minnow Reference Genome Reveals a Repetitive but Compact Genome

John W Martinson et al. Environ Toxicol Chem. 2022 Feb.

Abstract

The fathead minnow is a widely used model organism in environmental toxicology. The lack of a high-quality fathead minnow reference genome, however, has severely hampered its use in toxicogenomics. We present the de novo assembly and annotation of the fathead minnow genome using long PacBio reads, Bionano and Hi-C scaffolding data, and large RNA-sequencing data sets from different tissues and life stages. The new annotated fathead minnow reference genome has a scaffold N50 of 12.0 Mbp and a complete Benchmarking Universal Single-Copy Orthologs (BUSCO) score of 95.1%. The completeness of annotation for the new reference genome is comparable to that of the zebrafish GRCz11 reference genome. The fathead minnow genome, which is highly repetitive and shares extensive syntenic regions with the zebrafish genome, has a much more compact gene structure than the zebrafish genome. In particular, comparative genomic analysis with zebrafish, mouse, and human showed that fathead minnow homologous genes are relatively conserved in exon regions but have strikingly shorter intron regions. The new fathead minnow reference genome and annotation data, publicly available from the National Center for Biotechnology Information and the University of California Santa Cruz genome browser, provide an essential resource for aquatic toxicogenomic studies in ecotoxicology and public health. Environ Toxicol Chem 2022;41:448-461. Published 2021. This article is a U.S. Government work and is in the public domain in the USA.

Keywords: Comparative genomics; Fathead minnow; Gene structure; Genome assembly; Toxicogenomics; Zebrafish.

Figures

FIGURE 1:
The distribution and density of genes (in red) and repeat elements (in gray) on the 15 longest scaffolds of the fathead minnow genome, along with sequence gaps (in green). On each scaffold, a pseudo-centromere (black dot) is placed at the center of the longest gap, indicating the potential location of the centromere on the chromosome. The rooted phylogenetic tree (bottom right) shows the evolutionary relationship of the fathead minnow to other model organisms, including closely related fish species (in red).
FIGURE 2:
Assembly of the fathead minnow (FHM) reference genome. (A) The assembly process. The order of the eight assembly steps was optimized to achieve the best assembly. (B) Dotplot of the FHM genome assemblies before and after purging haplotigs. The x-axis is the original FHM diploid genome assembly (1.45 Gbp) produced by CANU, and the y-axis is the assembly (0.93 Gbp) after purging haplotigs. The removal of haplotigs reduced the size of the FHM genome assembly by almost 520 Mbp. (C) Number of scaffolds and N50 of FHM assemblies at different assembly steps. The x-axis shows the assembly steps, from the initial assembly of the diploid draft to the final scaffolding with Hi-C data. The primary y-axis (blue solid line with round markers) shows the number of scaffolds, and the secondary y-axis (red dotted line with triangle markers) shows N50. Both y-axes are on a log2 scale. (D) Scaffold length distribution of the FHM genome assembly. Blue dots are the lengths of individual scaffolds, and orange dots are the cumulative percentage of the total genome length. The plot shows only the first 125 scaffolds.
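For readers unfamiliar with the N50 statistic tracked in panel C and the cumulative curve in panel D, the following is a minimal sketch of how both are commonly computed from a list of scaffold lengths. The function names and the toy lengths are illustrative assumptions, not values or code from the study.

```python
def n50(lengths):
    """Return the N50: the largest length L such that scaffolds of
    length >= L together cover at least half of the total assembly."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def cumulative_percent(lengths):
    """Cumulative percentage of total length, longest scaffold first
    (the kind of curve described for panel D)."""
    total = sum(lengths)
    running = 0
    percents = []
    for length in sorted(lengths, reverse=True):
        running += length
        percents.append(100.0 * running / total)
    return percents


# Hypothetical toy scaffold lengths (arbitrary units), not real FHM data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))                 # 70
print(cumulative_percent(toy_lengths))  # ends at 100.0
```

In the toy case the total is 300, and the two longest scaffolds (80 + 70) reach exactly half of it, so the N50 is 70.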
FIGURE 3:
Gene function annotation and visualization. (A) Distribution of 27 landmark species with respect to the number of top BLAST hits in the homolog search. (B) Gene ontology (GO) term distribution among the biological process, molecular function, and cellular component categories. (C) GO term distribution among all predicted transcripts in the fathead minnow genome. BP = biological process; MF = molecular function; CC = cellular component.
FIGURE 4:
Cross-species comparison between fathead minnow (FHM) and other related species. (A) Comparison of homologous gene structure. For each homologous gene in zebrafish (ZF), mouse, and human, the ratio of total intron/exon length against the corresponding FHM homologous gene was computed. The first group in the figure is the comparison with ZF (*_Z), the second group with mouse (*_M), and the last group with human (*_H). The box-and-whisker plots of length ratios show that ZF, mouse, and human all have much longer intron regions overall than FHM in their homologous genes, with a median ratio of 1.89 for ZF, 2.71 for mouse, and 3.41 for human. In contrast, the length differences in exon regions of the same homologous genes are relatively small, though they remain highly significant statistically (Wilcoxon signed-rank test, p < 2.2e-16). Interestingly, relative to FHM, ZF (0.88) has slightly shorter exon regions overall, mouse (1.07) has very similar exon lengths, and human (1.17) has slightly longer exon regions. (B) The gene structure comparison of the elovl6 (elongation of very long chain fatty acids 6) homologous genes from FHM, ZF, mouse, and human. The plot shows a high-score pairs alignment map of the four homologous genes, generated by GEvo with Lagan as the alignment tool. At the top is the shortest gene, the FHM gene of 3,956 bp; second from the top is the ZF gene of 57,314 bp; third from the top is the mouse gene of 106,101 bp; and at the bottom is the longest gene, the human gene of 152,761 bp. The large difference in gene length among the four homologs is largely due to changes in the intron regions, which in the ZF, mouse, and human genes are approximately 18, 32, and 46 times the length of those in the FHM gene, respectively. The corresponding exon regions of the ZF, mouse, and human genes are only 1.3, 7.6, and 8.1 times the length of those in the FHM gene, far smaller changes than in the intron regions.
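As a quick arithmetic check on panel B, the whole-gene length ratios can be derived from the four elovl6 gene lengths quoted in the caption. This is a small sketch only: the dictionary and function names are illustrative, and only the four gene lengths come from the caption. The intron-only and exon-only ratios quoted above cannot be recomputed here because the caption does not give the separate intron and exon lengths.

```python
# elovl6 gene lengths in bp, as quoted in the Figure 4B caption.
gene_length_bp = {
    "FHM": 3956,
    "ZF": 57314,
    "mouse": 106101,
    "human": 152761,
}


def ratio_to_reference(lengths, reference="FHM"):
    """Return each species' gene length divided by the reference length."""
    ref_length = lengths[reference]
    return {
        species: length / ref_length
        for species, length in lengths.items()
        if species != reference
    }


ratios = ratio_to_reference(gene_length_bp)
for species, ratio in ratios.items():
    print(f"{species}: {ratio:.1f}x the FHM gene length")
```

The whole-gene ratios come out at roughly 14.5 (ZF), 26.8 (mouse), and 38.6 (human). Each falls between the exon ratio and the intron ratio reported for the same species, as expected when a gene's total length is the sum of its exon and intron lengths.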
FIGURE 5:
Circos plots of genomic synteny between the fathead minnow (FHM2) and zebrafish (ZF) genome references. (Left) The ZF syntenic regions mapped to the FHM genome. (Right) The FHM syntenic regions mapped to the ZF genome. For the synteny comparison, the ZF genome reference was used as the reference guide to assemble the FHM scaffolds into 25 pseudo-chromosomes. Satsuma (Ver 3.1.0; Grabherr et al., 2010) was used to generate the whole-genome synteny, and mySyntenyPortal (Lee et al., 2018) was used to create the plots. The plots show the longest 25,000 syntenic genomic regions between FHM and ZF, with the shortest syntenic region being 587 bp.
FIGURE 6:
Homolog comparison between fathead minnow (FHM2) and zebrafish (ZF). (A) Syntenic map of the genomic region surrounding FMt022379 in FHM2 and ENSDART00000132491 in ZF. The genomic region containing the syntenic region in ZF is approximately 200 kbp, more than twice the size of the corresponding region (∼90 kbp) in the FHM2 genome. In this syntenic region, FHM has much shorter introns and intergenic regions. The yellow regions in the FHM genome are repeats or gaps. (B) Genomic map of vitellogenin (vtg) genes in ZF and FHM. The University of California Santa Cruz gene map plot at the top shows the cluster of six vtg genes on chromosome 22 of the ZF genome, while the one at the bottom shows a similar cluster of six vtg genes on scaffold 140 of the FHM2 genome. (C) Syntenic map of the vtg1 genes between ZF and FHM. (D) Syntenic plot of the estrogen receptor 1 gene between ZF and FHM. UCSC = University of California Santa Cruz; US EPA = US Environmental Protection Agency; NCBI = National Center for Biotechnology Information.
