BMC Evol Biol. 2020 Oct 19;20(1):134.
doi: 10.1186/s12862-020-01696-3.

Divergent genes in gerbils: prevalence, relation to GC-biased substitution, and phenotypic relevance

Yichen Dai et al. BMC Evol Biol.

Abstract

Background: Two gerbil species, the sand rat (Psammomys obesus) and the Mongolian jird (Meriones unguiculatus), can become obese and show signs of metabolic dysregulation when maintained on standard laboratory diets. The genetic basis of this phenotype is unknown. Recently, genome sequencing has uncovered very unusual regions of high guanine and cytosine (GC) content scattered across the sand rat genome, most likely generated by extreme and localized biased gene conversion. A key pancreatic transcription factor, PDX1, is encoded by a gene in the most extreme GC-rich region; it is remarkably divergent and exhibits altered biochemical properties. Here, we ask if gerbils have proteins in addition to PDX1 that are aberrantly divergent in amino acid sequence, whether they have also become divergent due to GC-biased nucleotide changes, and whether these proteins could plausibly be connected to metabolic dysfunction exhibited by gerbils.

Results: We analyzed ~10,000 proteins with 1-to-1 orthologues in human and rodents and identified 50 proteins that accumulated unusually high levels of amino acid change in the sand rat and 41 in the Mongolian jird. We show that more than half of the aberrantly divergent proteins are associated with GC-biased nucleotide change and many are in previously defined high-GC regions. We highlight four aberrantly divergent gerbil proteins, PDX1, INSR, MEDAG and SPP1, that may plausibly be associated with dietary metabolism.

Conclusions: We show that through the course of gerbil evolution, many aberrantly divergent proteins have accumulated in the gerbil lineage, and GC-biased nucleotide substitution rather than positive selection is the likely cause of extreme divergence in more than half of these. Some proteins carry putatively deleterious changes that could be associated with metabolic and physiological phenotypes observed in some gerbil species. We propose that these animals provide a useful model to study the 'tug-of-war' between natural selection and the excessive accumulation of deleterious substitutions through biased gene conversion.

Keywords: GC bias; Genome evolution; Insulin receptor; Medag; Metabolism; Osteopontin; Pancreatic duodenal homeobox 1; Protein evolution; gBGC.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Phylogenetic relationships between species analyzed and patterns of protein sequence divergence. a Phylogenetic tree showing evolutionary relationships between the species analyzed, with approximate divergence time estimated by a previous study [32]. Outgroup species used for most proteins are marked with an asterisk, murid species in pink, gerbil species in red. b-d Adjusted Sneath values for orthologous proteins compared between (b) two gerbil species, (c) two murid species, (d) the average of two murid species and the sand rat. Each point represents one protein; proteins encoded by genes in the extreme GC-rich region of sand rat are shown in pink or (for PDX1 and INSR) in red. Only one protein encoded within the extreme GC-rich region is present in (b) due to missing genes in the Mongolian jird genome assembly. Photographs from J.F. Mulley and Pixabay with permission
Fig. 2
Dissimilarity difference ranking for 9771 sand rat proteins against the difference in adjusted Sneath value compared to the murid homologue. The top 61 ranked proteins are enlarged in the bottom plot with proteins PDX1 (rank 1) and INSR (rank 6) marked with arrows. Amongst these proteins, those encoded by genes in the extreme GC-rich region are shown in pink
Fig. 3
Aberrantly divergent sand rat proteins are frequently encoded by genes in GC-rich islands. Each panel shows one mouse chromosome (scale in Mb) to which the locations of sand rat orthologues are mapped. All analyzed sand rat genes are displayed as dots plotted according to the midpoint position of their corresponding mouse orthologue. The position of each dot on the y-axis shows the difference in adjusted Sneath value between the sand rat and mouse orthologues. Pink lines indicate locations of GC-rich regions identified previously [3, 5]. ‘Clusters’ of aberrantly divergent proteins with more than two proteins mapped to regions less than 1 Mb apart are marked with open red boxes
Fig. 4
Mutational patterns at synonymous sites for orthologous genes from four rodent species. Genes encoding aberrantly divergent proteins are highlighted with red dots. a Graph comparing weak-to-strong (dSws) and strong-to-weak (dSsw) synonymous mutation rates for sand rat genes. The ‘chimney’ shape indicates that many sand rat genes have undergone GC-biased nucleotide changes; many of the aberrantly divergent proteins are in this category. Eighteen sand rat genes (including 12 aberrantly divergent genes) with dSws and/or dSsw values above 2 have been capped at dSws = 2 and/or dSsw = 2 so that all genes fit on the plot. b Graph comparing weak-to-strong (dSws) and strong-to-weak (dSsw) synonymous mutation rates for Mongolian jird genes. The ‘chimney’ shape indicates GC bias; most aberrantly divergent proteins are in this category. Ten Mongolian jird genes (including two aberrantly divergent genes) with dSws and/or dSsw values above 2 have been capped at dSws = 2 and/or dSsw = 2. c Graph comparing dSws and dSsw for mouse genes. d Graph comparing dSws and dSsw for rat genes. Photographs from J.F. Mulley and Pixabay with permission
Fig. 5
Alignment of key functional domains in PDX1, INSR, MEDAG, and SPP1 proteins. a Alignment of the conserved PDX1 hexapeptide and homeodomain sequence from representative vertebrates. Gerbil species shown in red; sites where amino acid substitutions are associated with T2D in humans are marked with a star. b Alignment of regions for four domains in the INSR protein. Due to sequence divergence across vertebrates, only sequences from mammals are shown. Gerbil species are shown in red; sites where amino acid substitutions are associated with T2D in humans are marked with a star. c Alignment of a representative region of the MEDAG protein. Gerbil species shown in red. d Alignment of a representative region of the SPP1 protein. Gerbil species are shown in red
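
Two display rules in the captions above can be illustrated with a short Python sketch: the capping of dSws and dSsw values above 2 (Fig. 4) and the marking of 'clusters' of more than two aberrantly divergent proteins mapped less than 1 Mb apart (Fig. 3). This is not the authors' code. The function names, the (name, midpoint) input layout, and the reading of "less than 1 Mb apart" as the gap between neighbouring gene midpoints on one chromosome are all assumptions made for illustration.

```python
from typing import List, Tuple

DISPLAY_CAP = 2.0  # ceiling stated in the Fig. 4 caption


def cap_for_display(ds_ws: float, ds_sw: float,
                    cap: float = DISPLAY_CAP) -> Tuple[float, float]:
    """Clamp a (dSws, dSsw) pair so off-scale genes sit on the plot edge."""
    return min(ds_ws, cap), min(ds_sw, cap)


def find_clusters(genes: List[Tuple[str, float]],
                  max_gap_mb: float = 1.0,
                  min_size: int = 3) -> List[List[str]]:
    """Group genes on ONE chromosome into clusters (hypothetical helper).

    genes: (name, midpoint in Mb) pairs for aberrantly divergent proteins.
    Assumption: neighbouring midpoints closer than max_gap_mb chain into
    one cluster; "more than two proteins" is read as min_size = 3.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    clusters: List[List[str]] = []
    current: List[Tuple[str, float]] = []
    for name, pos in ordered:
        # A gap of max_gap_mb or more closes the running group.
        if current and pos - current[-1][1] >= max_gap_mb:
            if len(current) >= min_size:
                clusters.append([n for n, _ in current])
            current = []
        current.append((name, pos))
    if len(current) >= min_size:
        clusters.append([n for n, _ in current])
    return clusters


if __name__ == "__main__":
    # Made-up positions, not data from the paper.
    demo = [("a", 10.0), ("b", 10.4), ("c", 11.1), ("d", 50.0), ("e", 50.5)]
    print(cap_for_display(3.5, 0.4))
    print(find_clusters(demo))
```

Chaining neighbours is only one possible reading; requiring the whole cluster to span less than 1 Mb would be stricter and could split some groups.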

References

    1. Haines H, Hackel DB, Schmidt-Nielsen K. Experimental diabetes mellitus induced by diet in the sand rat. Am J Physiol Content. 1965;208(2):297–300. doi: 10.1152/ajplegacy.1965.208.2.297.
    2. Leibowitz G, Ferber S, Apelqvist A, Edlund H, Gross DJ, Cerasi E, et al. IPF1/PDX1 deficiency and beta-cell dysfunction in Psammomys obesus, an animal with type 2 diabetes. Diabetes. 2001;50(8):1799–806.
    3. Hargreaves AD, Zhou L, Christensen J, Marlétaz F, Liu S, Li F, et al. Genome sequence of a diabetes-prone rodent reveals a mutation hotspot around the ParaHox gene cluster. Proc Natl Acad Sci. 2017;114(29):7677–7682. doi: 10.1073/pnas.1702930114.
    4. Dai Y, Holland PWH. The interaction of natural selection and GC skew may drive the fast evolution of a sand rat Homeobox gene. Mol Biol Evol. 2019;36(7):1473–1480. doi: 10.1093/molbev/msz080.
    5. Pracana R, Hargreaves AD, Mulley JF, Holland PWH. Runaway GC evolution in gerbil genomes. Mol Biol Evol. 2020;37(8):2197–2210. doi: 10.1093/molbev/msaa072.