Abstract

The Zoonomia Project is investigating the genomics of shared and specialized traits in eutherian mammals. Here we provide genome assemblies for 131 species, of which all but 9 are previously uncharacterized, and describe a whole-genome alignment of 240 species of considerable phylogenetic diversity, comprising representatives from more than 80% of mammalian families. We find that regions of reduced genetic diversity are more abundant in species at a high risk of extinction, discern signals of evolutionary selection at high resolution and provide insights from individual reference genomes. By prioritizing phylogenetic diversity and making data available quickly and without restriction, the Zoonomia Project aims to support biological discovery, medical research and the conservation of biodiversity.

Conflict of interest statement

L.G. is a co-founder of, equity owner in and chief technical officer at Fauna Bio Incorporated.

Figures

Fig. 1. The Zoonomia Project brings the fraction of eutherian families that are represented by at least one assembly to 83%.
Phylogenetic tree of the mammalian families in the Zoonomia Project alignment, including both our new assemblies and all other high-quality mammalian genomes publicly available in GenBank when we started the alignment (March 2018) (Supplementary Table 2). Tree topology is based on data from TimeTree (www.timetree.org). Existing taxonomic classifications recognize a total of 127 extant families of eutherian mammal, including 43 families that were not previously represented in GenBank (red boxes) and 41 families with additional representative genome assemblies (pink boxes). Of the remaining families, 21 had GenBank genome assemblies but no Zoonomia Project assembly (grey boxes) and 22 had no representative genome assembly (white boxes). Parenthetical numbers indicate the number of species with genome assemblies in a given family. Image credits: fossa, Bertal/Wikimedia (CC BY-SA); Arctic fox, Michael Haferkamp/Wikimedia (CC BY-SA); hirola, JRProbert/Wikimedia (CC BY-SA); bumblebee bat, Sébastien J. Puechmaille (CC BY-SA); snowshoe hare, Denali National Park and Preserve/Wikimedia (public domain); aye-aye, Tom Junek/Wikimedia (CC BY-SA); Geoffroy’s spider monkey, Patrick Gijsbers/Wikimedia (CC BY-SA); southern three-banded armadillo, Hedwig Storch/Wikimedia (CC BY-SA); giant anteater, Graham Hughes/Wikimedia (CC BY-SA); brown-throated sloth, Dick Culbert from Gibsons, B.C., Canada/Wikimedia (CC BY).
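The family counts in this caption can be checked against the 83% figure in its title. The snippet below is a minimal arithmetic sketch using only the numbers stated in the caption; the dictionary keys are illustrative names, not terms from the paper.

```python
# Family counts as stated in the Fig. 1 caption (key names are illustrative).
family_counts = {
    "new_to_genbank": 43,         # red boxes: first assembly comes from Zoonomia
    "additional_assemblies": 41,  # pink boxes: GenBank plus Zoonomia assemblies
    "genbank_only": 21,           # grey boxes: GenBank assembly, no Zoonomia one
    "no_assembly": 22,            # white boxes: still unrepresented
}

total_families = sum(family_counts.values())
represented = total_families - family_counts["no_assembly"]
previously_represented = (
    family_counts["additional_assemblies"] + family_counts["genbank_only"]
)

fraction_represented = represented / total_families
fraction_previously = previously_represented / total_families

print(total_families)                 # 127
print(f"{fraction_represented:.1%}")  # 82.7%
print(f"{fraction_previously:.1%}")   # 48.8%
```

The four categories sum to the 127 extant families, and 105 of 127 rounds to the 83% in the title. Before the project, 62 of 127 families (about 49%) had an assembly.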
Fig. 2. Genetic diversity varies across IUCN conservation categories.
a, b, Heterozygosity declines (a) and SoH value increases (b) with the level of concern for species conservation, as assessed by IUCN conservation categories. Horizontal grey lines indicate median. c, d, Comparing individuals sampled from wild and captive populations, we saw no statistically significant difference (independent samples t-test) in overall heterozygosity (c) or per cent SoH (d), with similar means (horizontal grey lines) between types of birth population. In a–d, there was a total of 105 species, with n for each tested category indicated on the x axis. Statistical tests were two-sided. LC, least concern. e, Overall heterozygosity and SoH values for all genomes analysed (including those with high allelic balance ratio; n = 124 species), with median SoH (17.1%, horizontal dashed line) and median overall heterozygosity (0.0026, vertical dashed line) for species categorized as least concern. Values for individuals from the seven critically endangered species are shown in red.
Fig. 3. The Zoonomia alignment doubles the fraction of the human genome predicted to be under purifying selection at single-base-pair resolution.
a, Cactus alignments are reference-genome-free, enabling the detection of sequence that is absent from human (red) or other clades (purple), lineage-specific innovations (orange and green) and eutherian-mammal-specific sequence (blue). b, We compared phyloP predictions of conserved positions for a widely used 100-vertebrate alignment (n = 100 vertebrate species) (grey) to the Zoonomia alignment (n = 240 eutherian species) (red). The cumulative portion of the genome expected to be covered by true- versus false-positive calls is shown, starting from the highest confidence calls (solid line) and proceeding to calls with lower confidence (dashed lines); the horizontal line indicates the point at which the confidence level drops below an expected FDR of 0.05 (two-sided). c, A higher proportion of functionally annotated bases are detected as highly conserved (FDR < 0.05) in the Zoonomia alignment (red) than the 100-vertebrate alignment (grey), most notably in non-coding regions. lncRNA, long non-coding RNA; UTR, untranslated region. d, At a putative androgen-receptor binding site, phyloP scores predict that seven bases are under purifying selection (FDR < 0.05, two-sided) in the Zoonomia alignment (dark red), peaking in positions with the most information content in the androgen receptor JASPAR motif, compared to one (dark grey) in the 100-vertebrate alignment, with scores at FDR > 0.05 shown in light red (top) and light grey (bottom).
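To illustrate what calling bases at an expected FDR of 0.05 can look like, the sketch below applies the Benjamini-Hochberg procedure to per-base P values. This is an assumption made for illustration: the caption states only the FDR threshold, not the correction method. The conversion of a phyloP score to a P value as 10^(-|score|) is likewise a simplifying assumption, the function names are hypothetical, and the scores are made-up example values rather than data from the paper.

```python
def phylop_to_p(score):
    """Convert a phyloP score to a P value, assuming |score| = -log10(P)."""
    return 10 ** (-abs(score))


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per P value: True if significant at FDR <= alpha."""
    m = len(p_values)
    # Indices of the P values, sorted from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Every P value at or below that rank is called significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Made-up phyloP scores for six adjacent bases (illustrative only).
scores = [6.0, 4.5, 3.0, 1.2, 0.3, 0.1]
p_values = [phylop_to_p(s) for s in scores]
calls = benjamini_hochberg(p_values, alpha=0.05)
print(calls)  # [True, True, True, False, False, False]
```

In this toy example, the three highest-scoring bases pass the threshold. Panel d makes the analogous comparison: seven bases pass in the Zoonomia alignment versus one in the 100-vertebrate alignment.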
Extended Data Fig. 1. Notable traits in non-human mammals.
Sequences from species with notable phenotypes can inform human medicine, basic biology and biodiversity conservation, but sample collection can be challenging. a, The Jamaican fruit bat (Artibeus jamaicensis) maintains constant blood glucose across intervals of fruit-eating and fasting, achieving homeostasis to a degree that is unknown in the treatment of human diabetes. b, The North American beaver (Castor canadensis) avoids tooth decay by incorporating iron rather than magnesium into tooth enamel, which yields an orange hue. c, The thirteen-lined ground squirrel (Ictidomys tridecemlineatus) prepares for hibernation by rapidly increasing the thermogenic activity of brown fat, a process that, in humans, is connected to improved glucose homeostasis and insulin sensitivity. d, The tiny bumblebee bat (Craseonycteris thonglongyai) is among the smallest of mammals, making it a sparse source of DNA. e, The remote habitat of the very rare Amazon River dolphin (Inia geoffrensis) precludes collection of high-molecular-weight DNA. Image sources: Merlin D. Tuttle/Science Source (a); Stephen J. Krasemann/Science Source (b); Allyson Hindle (c); Sébastien J. Puechmaille (CC BY-SA) (d); M. Watson/Science Source (e).
Extended Data Fig. 2. Sample collection can be challenging, and sequencing methods must be selected to handle the sample quality.
To enable the inclusion of species from across the eutherian tree (including from the 50% of mammalian families not represented in existing genome databases), the Zoonomia Project needed sequencing and assembly methods that produce reliable data from DNA collected in remote locations, sometimes in only modest quantities and often without the benefit of cold chains for transport. a, For marine species such as the narwhal (Monodon monoceros), simply accessing an individual in the wild can prove challenging. For example, to sample DNA from the near-threatened narwhal, M.N. and Inuit guide D. Angnatsiak camped on the edge of an ice floe between Pond Inlet and Bylot Island, at the northeastern tip of Baffin Island. After a narwhal was collected by Inuit hunters as part of an annual hunt, hours of flensing were necessary for the collection of tissue samples. From left to right, F. McCann, H. C. Schmidt, F. Eichmiller, M.N., J. Orr (facing backward) and J. Orr (standing). b, For endangered species such as the Hispaniolan solenodon (S. paradoxus), sample collection must be designed to minimize stress to the individual, which limits the amount of DNA that can be collected. To collect DNA from the endangered solenodon without imposing stress on individuals in the wild, N.R.C. turned to the world’s only captive solenodons, which are housed off-exhibit at ZOODOM in the Dominican Republic. With help from veterinarians at the zoo, N.R.C. collected a small amount of blood from the rugged tail of the solenodon. Narwhal photograph by G. Freund, courtesy of M.N. Solenodon photograph courtesy of L. Emery.
