Review
Animals (Basel). 2023 Jan 29;13(3):471.
doi: 10.3390/ani13030471.

Genome Evolution and the Future of Phylogenomics of Non-Avian Reptiles

Daren C Card et al. Animals (Basel).

Abstract

Non-avian reptiles comprise a large proportion of amniote vertebrate diversity, with squamate reptiles (lizards and snakes) recently overtaking birds as the most species-rich tetrapod radiation. Despite displaying an extraordinary diversity of phenotypic and genomic traits, genomic resources in non-avian reptiles have accumulated more slowly than they have in mammals and birds, the remaining amniotes. Here we review the remarkable natural history of non-avian reptiles, with a focus on the physical traits, genomic characteristics, and sequence compositional patterns that comprise key axes of variation across amniotes. We argue that the high evolutionary diversity of non-avian reptiles can fuel a new generation of whole-genome phylogenomic analyses. A survey of phylogenetic investigations in non-avian reptiles shows that sequence capture-based approaches are the most commonly used, with studies of markers known as ultraconserved elements (UCEs) especially well represented. However, many other types of markers exist and are increasingly being mined from genome assemblies in silico, including some with greater information potential than UCEs for certain investigations. We discuss the importance of high-quality genomic resources and methods for bioinformatically extracting a range of marker sets from genome assemblies. Finally, we encourage herpetologists working in genomics, genetics, evolutionary biology, and other fields to work collectively towards building genomic resources for non-avian reptiles, especially squamates, that rival those already in place for mammals and birds. Overall, the development of this cross-amniote phylogenomic tree of life will help illuminate interesting dimensions of biodiversity across non-avian reptiles and amniotes more broadly.

Keywords: GC content; anonymous loci; genome size; isochores; karyotype; natural history; reduced representation; repetitive elements; sex determination and chromosomes; target capture; ultraconserved elements.

Conflict of interest statement

The authors declare no conflict of interest with the content of this article.

Figures

Figure 1
Overview of the natural history of amniotes, including non-avian reptiles, in a phylogenetic context. The width of clades on the phylogeny is proportional to species diversity, which is noted for each clade. For sex determination, genotypic sex determination (GSD) is denoted by male and female symbols for male and female heterogamety, respectively, and temperature-dependent sex determination (TSD) is denoted by a thermometer symbol [8,9]. Reproductive mode is indicated with an egg (oviparity), a lizard (viviparity), and a budding yeast symbol (parthenogenesis) [8,10,11,12]. Note the small egg for mammals, which reflects the oviparous Monotremata (5 extant species). For genome size (C-value), data from the Animal Genome Size Database [13] were averaged per species and the clade-wise average was calculated as the mean of these species estimates. Karyotype is reported as the mean haploid chromosome number per clade based on the ACC database (https://cromanpa94.github.io/ACC/ (accessed on 1 December 2022)), and lineages with microchromosomes present are indicated with a symbol near the mean chromosome count. Sex chromosome data were gathered from the Tree of Sex database [14]: the proportions of homomorphic, XY, XO, and ZW sex chromosome systems for each clade are indicated with the total species sample size per clade. The small number of squamates with homomorphic sex chromosomes (N = 6) and mammals with XO sex chromosomes (N = 3) are noted, and for counting purposes, complex XY and ZW systems were set to XY and ZW systems, respectively. For repeat content (reported as percentage of the total genome), data from the literature (see [15,16,17,18,19,20,21] and references therein) were averaged per clade. For GC content (reported as percentage of the total genome), data retrieved from the NCBI Genome Assembly database [22] were averaged per species and the clade-wise average was calculated as the mean of these species estimates. Clades with isochore structure are indicated with symbols below the GC estimate [23,24,25,26,27,28,29,30,31], and the isochore symbol for Squamata has a broken border and faded color to indicate the partial loss of isochores in some proportion of species in that lineage. Bars behind the data points show standard deviations. Data gathered from databases were retrieved on 1 December 2022. This figure was inspired by Janes et al. [32].
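The two-level averaging described in this caption (records averaged per species, then species means averaged per clade) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `clade_means`, the record layout, and the use of the sample standard deviation for the bars are all assumptions, and the example values are invented.

```python
from collections import defaultdict
from statistics import mean, stdev


def clade_means(records):
    """Average values per species, then average the species means per clade.

    records: iterable of (clade, species, value) tuples (hypothetical layout).
    Returns {clade: (mean_of_species_means, sd_of_species_means)}.
    """
    per_species = defaultdict(list)
    for clade, species, value in records:
        per_species[(clade, species)].append(value)

    per_clade = defaultdict(list)
    for (clade, _species), values in per_species.items():
        per_clade[clade].append(mean(values))

    # Assumption: sample standard deviation; 0.0 when only one species is present.
    return {
        clade: (mean(v), stdev(v) if len(v) > 1 else 0.0)
        for clade, v in per_clade.items()
    }


# Invented example values, for illustration only.
example = [
    ("CladeA", "species1", 2.0),
    ("CladeA", "species1", 4.0),
    ("CladeA", "species2", 1.0),
]
result = clade_means(example)
```

Averaging within species first keeps a heavily sampled species from dominating its clade's estimate.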
Figure 2
Temporal accumulation of genomes available on NCBI for major amniote clades (data retrieved 1 December 2022). Inset: Details of the growth in the number of available genomes for non-avian reptiles. Note: The counts from this dataset represent a subset of the full non-avian reptile genomes dataset presented in Figure 3, as many genomes are available from sources other than NCBI. This figure was inspired by Bravo et al. [148].
Figure 3
Phylogenetic summary of available reference genomes for non-avian reptiles. The topology and divergence times were gathered from the TimeTree database (accessed 1 December 2022) [4,5]. For taxa that were not already included in TimeTree, we used existing studies of Gehyra [149,150,151,152], Heloderma [153], Physignathus [154], Gopherus [155,156], Actinemys [157], Cuora [158,159], and Myanophis [160,161] to place taxa and determine the approximate divergence time. Horizontal bars delineate the major clades: Squamata, Rhynchocephalia (“R”), Testudines, and Crocodylia (“Croc”). The colored bars to the right of each panel indicate each clade and aid in visualization. Publicly available and announced genomes were collated from NCBI, the Genome10K/VGP/EBGP GenomeArk website (https://genomeark.github.io/genomeark-all/ (accessed on 1 December 2022)), the DNAZoo website (https://www.dnazoo.org/assemblies (accessed on 1 December 2022)), the Australian Amphibian and Reptile Genomics (AusARG) initiative website (https://ausargenomics.com/ (accessed on 1 December 2022)), the California Conservation Genomics Project (CCGP) website (https://www.ccgproject.org/reptiles (accessed on 1 December 2022)), and other locations noted in the literature. For each assembly, we gathered the release date, total assembly length, and number of ambiguous (N) bases, and calculated scaffold N50 and contig N50, the latter after breaking scaffolds at runs of >25 Ns. We also ran BUSCO v. 5.4.2 [162] in ‘genome’ mode with the tetrapoda_odb10 dataset to assess the completeness of genomes based on 5310 generally conserved, single-copy tetrapod genes and used bedtools v. 2.29.0 [163] and seqtk v. 1.3-r106 (https://github.com/lh3/seqtk (accessed on 1 December 2022)) to calculate GC content in 500 kb genomic windows (where a minimum of 250 kb of non-N bases were present). Some genomes were not contiguous enough for GC content distributions to be estimated. Where multiple assemblies were available for a species, we plotted the release date and source of each assembly but quantified genomic characteristics and quality metrics only for the primary assembly, defined as the highest-quality assembly based on contiguity and BUSCO results; most of these were also designated as the primary assembly on NCBI. Secondary assemblies are the additional assemblies available for a given species, and future assemblies are publicly announced genomes for which data are not yet available.
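The contiguity and GC statistics named in this caption can be illustrated with a short pure-Python sketch. It is not the authors' workflow, which used bedtools and seqtk; the function names (`n50`, `split_contigs`, `windowed_gc`) are hypothetical, and the sketch assumes that "non-N bases" means A, C, G, or T and that windows are non-overlapping.

```python
import re


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def split_contigs(scaffold, max_gap=25):
    """Break a scaffold into contigs at runs of more than max_gap Ns."""
    gap = re.compile(r"[Nn]{%d,}" % (max_gap + 1))
    return [piece for piece in gap.split(scaffold) if piece]


def windowed_gc(seq, window=500_000, min_called=250_000):
    """GC fraction per non-overlapping window, skipping windows with too few called bases."""
    values = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        called = sum(chunk.count(base) for base in "ACGT")
        if called >= min_called:
            values.append((chunk.count("G") + chunk.count("C")) / called)
    return values


# Toy scaffold: the 30-N run splits it, the 10-N run does not.
toy = "ACGT" * 5 + "N" * 30 + "GGCC" * 2 + "N" * 10 + "AT"
contig_n50 = n50(len(c) for c in split_contigs(toy))
scaffold_n50 = n50([len(toy)])
```

On real assemblies the defaults match the values given in the caption (500 kb windows, at least 250 kb of called bases); the toy example is small so the behavior is easy to check by hand.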
Figure 4
Graphical overview of various reduced representation approaches used in phylogenomic investigations. Alternative depictions are presented for different methods of enriching for particular loci in the genome: two kinds of target capture (targeting UCEs and AHEs or exons), RAD-seq (also known as GBS), and transcriptomics. In each case, the color indicates the location of phylogenetically informative signal in the locus, which typically comprises the whole extent of the target locus, except in the case of UCEs, where this signal is found in the regions flanking the locus. These classes of loci, or markers, are depicted along a diploid genome for a single sample, with heterozygous variation in the form of two alleles at each locus indicated with alternative shading. Although only a single sample is indicated, these approaches would be applied to all samples of interest in parallel, ultimately resulting in sequencing for all samples (e.g., N = 3 samples depicted below). For target capture, the genome is fragmented, and oligonucleotide probes are used to enrich for the target loci. For RAD-seq and transcriptomics, regions of interest are isolated and enriched simultaneously by restriction enzymes and cellular RNA polymerase transcription activity followed by in vitro reverse transcription, respectively. Importantly, of the three general methods, only target capture requires a priori sequencing data and knowledge to construct oligonucleotide probes. After this isolation and enrichment step, all methods proceed in generally the same way, with standard library preparation and sequencing steps. The resulting sequencing data are also generally analyzed similarly by bioinformatically parsing data to recover sample-specific sequences (three samples are indicated) and clustering sequences by similarity to enable consensus calling (not shown), although a reference genome can aid in this process. Variation across loci is ideally phased to recover the original heterozygous state; two phased alleles per sample are depicted. Phased sequence data for each sample and locus can then be aligned and used for phylogenetic inference.
Figure 5
The ALFIE software pipeline for in silico extraction of anonymous loci sequences from complete genome sequences and assembly of ready-to-analyze data sets. The user first inputs genome sequences in FASTA format, one of which must be a reference genome with a GFF (general feature format) file of genomic annotations, namely protein-coding genes and regulatory regions. The program then maps the presumably neutral intergenic or “anonymous” regions by applying a user-specified physical distance threshold (in base pairs [bp]). This filter discards all chromosomal regions that contain known functional elements and their flanking sequences (up to the threshold distance), thereby helping to ensure that retained anonymous regions are unaffected by natural selection (e.g., background selection). The anonymous regions are then split into user-specified locus lengths (in bp), which are referred to as “candidate anonymous loci.” In the final steps (not shown), the program uses candidate anonymous loci as query sequences to conduct BLAST searches against all input genomes, keeping only loci that are single-copy in all genomes, before saving them to a FASTA file. Next, the program conducts multiple sequence alignments for all loci before using a second user-defined distance threshold (in bp) to retain loci that are spaced far enough from other sampled loci that they likely meet the independent gene tree assumption. Lastly, the program outputs the dataset in NEXUS, PHYLIP, and FASTA formats, and can use other included modules to automatically find the best DNA substitution model and gene tree for each locus (figure modified after Figure 1 in Costa et al. [193]). See also Jennings [189] for further explanation and extensions of physical distance threshold theory. Reprinted with permission from Costa et al. [193].
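The interval logic behind the two distance thresholds in this caption can be sketched on a single chromosome. This is a simplified illustration under stated assumptions, not ALFIE's implementation: the function names are hypothetical, coordinates are assumed to be 0-based half-open intervals, and the BLAST single-copy filter and alignment steps are omitted entirely.

```python
def anonymous_regions(chrom_length, features, flank):
    """Return intervals lying at least `flank` bp away from every annotated feature.

    features: list of (start, end) half-open intervals (hypothetical input layout).
    """
    padded = sorted(
        (max(0, start - flank), min(chrom_length, end + flank))
        for start, end in features
    )
    regions = []
    cursor = 0
    for start, end in padded:
        if start > cursor:
            regions.append((cursor, start))
        cursor = max(cursor, end)  # handles overlapping padded features
    if cursor < chrom_length:
        regions.append((cursor, chrom_length))
    return regions


def candidate_loci(regions, locus_length):
    """Cut each anonymous region into consecutive loci of a fixed length."""
    loci = []
    for start, end in regions:
        for pos in range(start, end - locus_length + 1, locus_length):
            loci.append((pos, pos + locus_length))
    return loci


def thin_loci(loci, min_spacing):
    """Keep loci separated from the previously kept locus by at least min_spacing bp."""
    kept = []
    for start, end in sorted(loci):
        if not kept or start - kept[-1][1] >= min_spacing:
            kept.append((start, end))
    return kept


# Invented toy chromosome: 100 bp with two annotated features.
regions = anonymous_regions(100, [(20, 30), (60, 65)], flank=5)
loci = candidate_loci(regions, locus_length=10)
spaced = thin_loci(loci, min_spacing=20)
```

In this toy case the padded features remove positions 15 to 35 and 55 to 70, leaving three anonymous regions; the greedy thinning then keeps only loci spaced at least 20 bp apart. A greedy left-to-right pass is one simple choice and may not match how ALFIE selects among loci.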

References

    1. Carrano M.T., Gaudin T.J., Blob R.W., Wible J.R., editors. Amniote Paleobiology: Perspectives on the Evolution of Mammals, Birds, and Reptiles. University of Chicago Press; Chicago, IL, USA: 2006.
    2. Shedlock A.M., Edwards S.V. Amniotes (Amniota). In: Hedges S.B., Kumar S., editors. The Timetree of Life. Oxford University Press; New York, NY, USA: 2009. pp. 375–379.
    3. Sues H.-D. The Rise of Reptiles: 320 Million Years of Evolution. Illustrated ed. Johns Hopkins University Press; Baltimore, MD, USA: 2019.
    4. Hedges S.B., Dudley J., Kumar S. TimeTree: A Public Knowledge-Base of Divergence Times among Organisms. Bioinformatics. 2006;22:2971–2972. doi: 10.1093/bioinformatics/btl505.
    5. Kumar S., Stecher G., Suleski M., Hedges S.B. TimeTree: A Resource for Timelines, Timetrees, and Divergence Times. Mol. Biol. Evol. 2017;34:1812–1819. doi: 10.1093/molbev/msx116.
