Science. 2023 Apr 28;380(6643):eabn3943.
doi: 10.1126/science.abn3943. Epub 2023 Apr 28.

Evolutionary constraint and innovation across hundreds of placental mammals

Matthew J Christmas, Irene M Kaplow, Diane P Genereux, Michael X Dong, Graham M Hughes, Xue Li, Patrick F Sullivan, Allyson G Hindle, Gregory Andrews, Joel C Armstrong, Matteo Bianchi, Ana M Breit, Mark Diekhans, Cornelia Fanter, Nicole M Foley, Daniel B Goodman, Linda Goodman, Kathleen C Keough, Bogdan Kirilenko, Amanda Kowalczyk, Colleen Lawless, Abigail L Lind, Jennifer R S Meadows, Lucas R Moreira, Ruby W Redlich, Louise Ryan, Ross Swofford, Alejandro Valenzuela, Franziska Wagner, Ola Wallerman, Ashley R Brown, Joana Damas, Kaili Fan, John Gatesy, Jenna Grimshaw, Jeremy Johnson, Sergey V Kozyrev, Alyssa J Lawler, Voichita D Marinescu, Kathleen M Morrill, Austin Osmanski, Nicole S Paulat, BaDoi N Phan, Steven K Reilly, Daniel E Schäffer, Cynthia Steiner, Megan A Supple, Aryn P Wilder, Morgan E Wirthlin, James R Xue, Zoonomia Consortium, Bruce W Birren, Steven Gazal, Robert M Hubley, Klaus-Peter Koepfli, Tomas Marques-Bonet, Wynn K Meyer, Martin Nweeia, Pardis C Sabeti, Beth Shapiro, Arian F A Smit, Mark S Springer, Emma C Teeling, Zhiping Weng, Michael Hiller, Danielle L Levesque, Harris A Lewin, William J Murphy, Arcadi Navarro, Benedict Paten, Katherine S Pollard, David A Ray, Irina Ruf, Oliver A Ryder, Andreas R Pfenning, Kerstin Lindblad-Toh, Elinor K Karlsson

Matthew J Christmas et al. Science. 2023.

Abstract

Zoonomia is the largest comparative genomics resource for mammals produced to date. By aligning genomes for 240 species, we identify bases that, when mutated, are likely to affect fitness and alter disease risk. At least 332 million bases (~10.7%) in the human genome are unusually conserved across species (evolutionarily constrained) relative to neutrally evolving repeats, and 4552 ultraconserved elements are nearly perfectly conserved. Of 101 million significantly constrained single bases, 80% are outside protein-coding exons and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Changes in genes and regulatory elements are associated with exceptional mammalian traits, such as hibernation, that could inform therapeutic development. Earth's vast and imperiled biodiversity offers distinctive power for identifying genetic variants that affect genome function and organismal phenotypes.

Figures

Fig. 1. New placental mammal phylogeny supports the long-fuse model of diversification.
(A) Most interordinal diversification occurred in the Cretaceous, coincident with continental fragmentation and sea level changes. A pulse of intraordinal diversification occurred after the mass extinction event at the Cretaceous-Paleogene (K-Pg) boundary. Green, orange, and yellow shading bounded by gray lines demarcates different time periods. (B) A phylogeny based on divergence times estimated using ~470 kb of near-neutrally evolving sequence for 240 species resolves recalcitrant relationships in the placental mammal phylogeny (black numbers in white circles), including (1) Euarchonta (primates, colugos, and treeshrews), (2) Scrotifera [Perissodactyla (odd-toed ungulates), Cetartiodactyla (terrestrial even-toed ungulates and cetaceans), carnivorans, and bats], (3) Fereuungulata (perissodactyls, cetartiodactyls, carnivorans, pangolins), and (4) Zoomata [perissodactyls and Ferae (carnivorans and pangolins)]. [Species silhouettes are from PhyloPic]
Fig. 2. Comparing 240 species resolves mammalian constraint to single bases and identifies elements under selection.
(A and B) We estimated a lower bound on the total amount of the genome under constraint (A) and the number of single bases constrained at different FDR thresholds (B). The red lines in (B) indicate the 5% FDR threshold, with the amount of sequence below this threshold given. (C and D) Comparing the number of species with poor alignments (x axis) with those with good alignments (y axis) at 924,641 human candidate cis-regulatory elements (14) (C) reveals three clusters that are nonrandomly distributed across element types (all chi-square test p < 2.2 × 10^−308) (D). (E) Functional elements are enriched for constraint, with candidate cis-regulatory elements in blue and other element types in black. The dashed line indicates no enrichment. DHS, DNase hypersensitivity site; 3′UTR, 3′ untranslated region; 5′UTR, 5′ untranslated region. (F) Constraint is negatively correlated with degeneracy across 59,504,353 protein-coding positions. (G) Methionine codons functioning as start sites in protein-coding sequence are more constrained at each of the three codon positions. (H) Cysteines in disulfide bridges are more constrained than other cysteines. In (F) to (H), the box boundaries represent the 25th and 75th percentiles, with a horizontal line at the median and the vertical line demarcating an additional 1.5 times interquartile range (IQR) above and below the box boundaries. ***p_Wilcoxon < 1 × 10^−16. (I) Most zooUCEs are new and do not overlap ultraconserved elements in the original set (73). (J) All zooUCEs are shorter than the original ultraconserved elements. Box and whisker parameters are the same as in (F), with outlier zooUCEs (>1.5 times IQR below or above the box boundaries) plotted as open circles. (K) Human variants in zooUCEs (light orange) have lower minor allele frequencies than they do in exons or genome-wide (gray). The vertical lines are at the means. The filled area is the distribution of allele frequencies. (L) Constraint measured in 100-kb bins genome-wide. The most constrained 100-kb bins include the HOX clusters (red). HOXD (red star) overlaps the longest synteny block shared across mammals (174). Rearrangements in this locus can lead to limb malformations and other damaging outcomes. One bin containing MUC16 (purple diamond) significantly lacks constraint. MUC16 provides a mucosal barrier that protects epithelial cells from pathogens. The red dashed line indicates q = 0.05. Labeled bins have q < 0.006.
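Panel (B) and panel (L) of Fig. 2 both rely on FDR-style thresholds (5% FDR, q = 0.05). The caption does not say how the q-values were computed, so the following is only a minimal sketch that assumes a Benjamini-Hochberg adjustment; the function name and the toy p-values are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input.

    Assumption: BH is used here for illustration only; the caption does
    not state which FDR procedure produced the 5% threshold.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical per-base p-values (toy data, not from the paper).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
toy_q = benjamini_hochberg(toy_p)

# Bases that would pass a 5% FDR threshold in this toy example.
significant = [p for p, q in zip(toy_p, toy_q) if q < 0.05]
print(significant)
```

In this toy input, two p-values below 0.05 (0.039 and 0.041) no longer pass once adjusted, which is the practical difference between a raw p-value cutoff and an FDR threshold.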
Fig. 3. Conserved function of constrained transcription factor binding sites.
(A) A two-component Gaussian mixture model fit over average phyloP scores across binding sites for CTCF distinguishes the distribution for evolutionarily constrained sites (red) from others (gray). (B) At CTCF binding sites, aggregate phyloP scores are high for constrained binding sites (red, 61,832 sites) but not for unconstrained binding sites (gray, 424,177 sites). The same pattern is observed for other transcription factors (fig. S10). (C) Across all transcription factors, aggregate phyloP scores are more strongly correlated (Pearson’s correlation) with binding site information content for constrained sites than for unconstrained sites. Boxes and whiskers represent 25% quartile, 75% quartile, minimum, and maximum, with a horizontal line at the median. The shading indicates the density of the data. (D) CTCF logos of constrained and unconstrained sets for four species made by lifting over human transcription factor binding sites. (E) Fraction of constrained (red) and unconstrained (gray) CTCF binding sites that are shared between pairs of species. (F) CTCF transcription factor chromatin immunoprecipitation sequencing (ChIP-seq) signal over binding sites in mammalian livers sorted by average phyloP scores. Each row is a binding site; in nonhuman species, only aligned sites are shown. The horizontal lines indicate significant constraint. Ranges give the minimum and maximum ChIP-seq fold change over input for each species. (G) Percentage of primate-specific and non–primate-specific transcription factor binding sites that are derived from individual transposable element classes. LINE, long interspersed nuclear element; LTR, long terminal repeat; MIR, mammalian-wide interspersed repeat; SINE, short interspersed nuclear element. [Species silhouettes are from PhyloPic]
Fig. 4. Constraint highlights unannotated regions that are likely functional.
(A) Example UNICORNs on human chromosome 16. The largest is 418 bp and located 3.5 kb upstream of the transcription start site of the gene PMFBP1; the second largest is 174 bp. Gray dots represent single bases. Red dashed lines represent the FDR < 5% threshold for phyloP and the threshold for phastCons that captures equivalent genome proportion (phastCons base score ≥ 0.961). UNICORNs lack coding or regulatory annotations in ENCODE (top track), and most have low diversity in human populations (second track). (B) UNICORNs contain fewer variants, and those present have lower allele frequencies than those in the random set (Wilcoxon rank sum test, p < 2.2 × 10^−16). The fraction of bases with single-nucleotide polymorphisms (SNPs) versus mean minor allele frequency for human SNPs within UNICORNs (left) or within a random set of unannotated sequences (right) is shown. Allele frequencies were log10 transformed. Human variants and allele frequencies were obtained from TOPMed data freeze 8 (69).
Fig. 5. Associating coding and regulatory change with species phenotypes.
(A) Olfactory receptor gene count (x axis) is associated with the number of olfactory turbinals (y axis) in 64 species. Labels and silhouettes mark outliers and species of interest. (B) Testing the coding sequence of 16,209 genes identified 341 genes that are evolving faster or slower in hibernators (p_FDR < 0.05; gray open circles), and 22 are significant after phylogeny-aware permutation testing (permutation p_FDR < 0.05; labeled), including 11 evolving faster (red filled circles) and 11 evolving slower (blue filled circles). (C) TACIT first trains a predictive classifier on sequences that underlie open chromatin regions from tissues or cell types in a few species and then predicts open chromatin in many others and tests for phenotype associations. (D) TACIT associated a motor cortex open chromatin region with brain size (a continuous value trait), driven by associations within Laurasiatheria (59 species) and Euarchonta (36 species) but not within Glires (33 species). Results are for a rhesus macaque open chromatin region (chr10:48660711-48661679) near MACROD2. The phylolm line of best fit is shown for all species [solid line; phylolm coefficient (slope) = 0.45, permutation p_FDR = 0.11] and, as a visual aid, for each clade (dashed line). Triangles represent cetaceans (highest variation in brain size residual), squares represent great apes (highest variation in brain size residual within Euarchonta), and circles represent other species. (E) TACIT associated a motor cortex open chromatin region with vocal learning (a binary trait) in the GALC locus (phylolm coefficient = 6.51, permutation p_FDR = 0.045) (137). Results are for an Egyptian fruit bat open chromatin region (PVIL01002568.1:139004-139596). [Species silhouettes are from PhyloPic]
Fig. 6. Genomic metrics distinguish at-risk primate species.
Primates that are categorized at increasing levels of extinction risk and with smaller effective population sizes have fewer substitutions at extremely constrained sites, measured as kurtosis (which describes the tail of the distribution) of phyloP scores (phylolm p = 7.9 × 10^−4 and p = 0.024, respectively). Four at-risk species with the smallest effective population size (labeled with silhouettes) have low kurtosis (i.e., fewer phyloP outliers), and a species categorized as “least concern” with the largest effective population size has high kurtosis (gray mouse lemur; labeled). [Species silhouettes are from PhyloPic]
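To make the kurtosis metric in Fig. 6 concrete, the sketch below computes excess kurtosis from the second and fourth central moments. It is illustrative only: the caption does not say whether raw or excess kurtosis was used, and the function name and both toy score lists are hypothetical rather than real phyloP data.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3.

    Assumption: the excess (Fisher) definition is chosen for
    illustration; the caption does not specify the definition used.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / (m2 ** 2) - 3.0


# Toy score lists (hypothetical). Both are centered on zero, but the
# second one has two extreme values, i.e. a heavier tail.
few_outliers = [-1, -1, 0, 0, 0, 0, 1, 1]
more_outliers = [-6, 0, 0, 0, 0, 0, 0, 6]

k_few = excess_kurtosis(few_outliers)
k_more = excess_kurtosis(more_outliers)
print(k_few, k_more)
```

The list with extreme values yields the higher kurtosis, matching the caption's reading that low kurtosis corresponds to fewer phyloP outliers.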

References

    1. Burgin CJ, Colella JP, Kahn PL, Upham NS, How many species of mammals are there? J. Mammal. 99, 1–14 (2018). doi: 10.1093/jmammal/gyx147
    2. Jones KE, Safi K, Ecology and evolution of mammalian biodiversity. Philos. Trans. R. Soc. London Ser. B 366, 2451–2461 (2011). doi: 10.1098/rstb.2011.0090
    3. Jones KE et al., PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals: Ecological Archives E090-184. Ecology 90, 2648–2648 (2009). doi: 10.1890/08-1494.1
    4. Zoonomia Consortium, A comparative genomics multitool for scientific discovery and conservation. Nature 587, 240–245 (2020). doi: 10.1038/s41586-020-2876-6
    5. University of California Santa Cruz Genomics Institute, Conservation track settings; http://genome.ucsc.edu/cgibin/hgTrackUi?g=cons100way.