BMC Biol. 2022 Nov 8;20(1):245.
doi: 10.1186/s12915-022-01427-8.

A haplotype-resolved genome assembly of the Nile rat facilitates exploration of the genetic basis of diabetes

Huishi Toh et al. BMC Biol.

Abstract

Background: The Nile rat (Arvicanthis niloticus) is an important animal model because of its robust diurnal rhythm, cone-rich retina, and propensity to develop diet-induced diabetes without chemical or genetic modifications. A closer similarity to humans in these aspects, compared to the widely used Mus musculus and Rattus norvegicus models, holds the promise of better translation of research findings to the clinic.

Results: We report a 2.5 Gb, chromosome-level reference genome assembly with fully resolved parental haplotypes, generated with the Vertebrate Genomes Project (VGP). The assembly is highly contiguous, with contig N50 of 11.1 Mb, scaffold N50 of 83 Mb, and 95.2% of the sequence assigned to chromosomes. We used a novel workflow to identify 3613 segmental duplications and quantify duplicated genes. Comparative analyses revealed unique genomic features of the Nile rat, including some that affect genes associated with type 2 diabetes and metabolic dysfunctions. We discuss 14 genes that are heterozygous in the Nile rat or highly diverged from the house mouse.
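For readers unfamiliar with the contiguity statistic quoted above, the following is a minimal sketch of how an N50 value is conventionally computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are hypothetical illustrations, not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not Nile rat data.
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Applied to real contig or scaffold lengths, the same calculation would yield figures of the kind reported here (contig N50 and scaffold N50), although assembly tools may differ in details such as how gaps are counted.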

Conclusions: Our findings reflect the exceptional level of genomic resolution present in this assembly, which will greatly expand the potential of the Nile rat as a model organism.

Keywords: Arvicanthis niloticus; Diabetes; Diurnal; Genome; Germline mutation rate; Heterozygosity; Long-read genome assembly; Orthology; Positive selection; Retrogenes; Segmental duplications.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Nile rat genome assembly. a The Nile rat (Arvicanthis niloticus). b Scaffolded chromosomes in the maternal and paternal assemblies. Ribbons show similarities between sequences. In order to assess their heterozygosity spectrum, the assemblies have been modified from their GenBank versions as described in Materials and Methods. These modifications are documented in [3]. c The contig N50 values of Nile rat (Arvicanthis niloticus, red), house mouse (Mus musculus, blue), Norway rat (Rattus norvegicus, blue), and 106 other rodent genomes deposited in GenBank. d Assembly completeness evaluated using BUSCO scores, demonstrating high completeness and average percent duplicated genes that are anticipated to be single-copy genes in rodent genomes
Fig. 2
Segmental duplication content in Nile rat and related species (We used the following genome assembly versions: house mouse mm10 = GRCm38 [59], house mouse C57BL = ASM377452v2 (a PacBio long reads-based assembly [56, 60]), Norway rat mRatBN7.2 [61], and white-footed mouse UCI_PerLeu_2.1 [62]). a The total bases annotated as segmental duplication by SEDEF. The total includes all pairwise alignments after filtering for common repeats. b The total number of multi-exon genes duplicated in each of the assemblies for resolved (res.) and collapsed (col.) genes. c Organization of duplicated genes in the Nile rat. Tandemly duplicated genes are in blue; interspersed are in green. Genes in collapsed duplications are indicated as dots in the perimeter. The chromosomes are ordered according to genomic scaffold accessions
Fig. 3
Examples of duplicated genes in Nile rat. a An expansion of the Acnat2 gene in Nile rat relative to house mouse. Lines are drawn using miropeats [66], with spurious matches outside of gene bodies removed. Colors are used to emphasize gene paralogs. b A dot-plot of the Acnat2 locus in Nile rat, with gene copies indicated by the blue rectangles. c Read-depth over Slfn3 in the Nile rat paternal assembly. The average read depth is shown in green, indicating up to four missing copies. The gene is mapped using RefSeq GRCm38 annotations, with support from PacBio Iso-Seq reads. The gene is associated with immune response, a category of genes that often have large copy-number diversity
Fig. 4
Amylase gene cluster. a The sequence homology in the amylase locus for mouse (top) and Nile rat rendered using miropeats. The five TOGA annotations of amylase are each rendered using separate colors. The blue and purple copies show amylase-2 homologies, orange is amylase-1, and red/green lines are annotated pseudogenes. b Pairwise similarity of amylase genes in human, Nile rat, mouse, and Norway rat, ordered according to their genomic coordinates. c A phylogenetic tree using COBALT multiple sequence alignment of amylase proteins from each of the four genomes in b and white-footed mouse
Fig. 5
Ybx3-like elements in the Nile rat genome. a Ybx3-like elements are interspersed throughout the genome. Many have been annotated as protein coding genes or pseudogenes by NCBI. Most are recognized as segmental duplications by SEDEF. b The architecture of a typical Ybx3-like gene, LOC117723436, visualized in the NCBI Genome Data Viewer. This gene has one large and one small exon. The large exon is flanked by MERVK26-int and RMER13B endogenous retroviral elements. It contains a CSD domain and is partially supported by short read RNA-seq data. c Expression of LOC117701283 in testis visualized in the UCSC genome browser. The three Iso-Seq transcripts have identical CDSs, represented by thick boxes. MERVK26-int is located in the 5′ UTR region, rather than outside the large exon as in most other Ybx3-like genes. d Multiple alignment of predicted Ybx3-like proteins and the canonical Ybx3, visualized by NCBI COBALT. This visualization uses the Rasmol color scheme, where amino acids with similar properties are shown in matching colors. The canonical protein is in the first row

References

    1. Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–562. doi: 10.1038/nature01262.
    2. Rat Genome Sequencing Project Consortium. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004;428:493–521. doi: 10.1038/nature02426.
    3. Yang C, Zhang G, Toh H, et al. Heterozygosity spectrum: all.makeup.agp. Nile rat genome paper supplementary materials on OSF. 2021. doi: 10.17605/OSF.IO/K6EY9.
    4. Yan L, Smale L, Nunez AA. Circadian and photic modulation of daily rhythms in diurnal mammals. Eur J Neurosci. 2020;51:551–566. doi: 10.1111/ejn.14172.
    5. Kalsbeek A, Verhagen LAW, Schalij I, Foppen E, Saboureau M, Bothorel B, Buijs RM, Pévet P. Opposite actions of hypothalamic vasopressin on circadian corticosterone rhythm in nocturnal versus diurnal species. Eur J Neurosci. 2008;27:818–827. doi: 10.1111/j.1460-9568.2008.06057.x.