Nat Commun. 2015 Jan 13;6:5903.
doi: 10.1038/ncomms6903.

Enhanced transcriptome maps from multiple mouse tissues reveal evolutionary constraint in gene expression


Dmitri D Pervouchine et al. Nat Commun. 2015.

Abstract

Mice have been a long-standing model for human biology and disease. Here we characterize, by RNA sequencing, the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles in human cell lines reveals substantial conservation of transcriptional programmes, and uncovers a distinct class of genes with levels of expression that have been constrained early in vertebrate evolution. This core set of genes captures a substantial fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory programme, in which sub-cellular localization and alternative splicing play comparatively large roles.


Figures

Figure 1. Examples of extensions to the mouse annotation (left: human; right: mouse).
(a) The novel intergenic transcript model CUFF.8454.1 has been inferred from mouse RNA-seq data in a region of the mouse genome without gene annotations. Mapping of this transcript model to the human genome reveals that it is the homologue of the lncRNA RP11-739N20.1 annotated in the human genome. (b) Two mouse transcript models, CUFF.721.1 and CUFF.2195.1, have been predicted antisense to two neighbouring protein-coding loci (Npas2 and RPL31). RNA-seq shows that their expression is restricted to testis. CUFF.721.1 is the likely homologue of the annotated human AC016738.4, which is antisense to and overlaps the human orthologues of Npas2 and RPL31. In human, AC016738.4 also appears to be expressed specifically in testis, although this is not conclusive, as we lack stranded RNA-seq data. (c) The mapping of the human AC073046.25 lncRNA to the mouse genome identified a potential mouse transcript in the syntenic region between the DGUOK and the TET3 genes (piper_mm9_AC073046.25). This transcript was not included in our predicted models, but it has strong support from RNA-seq data, specifically in cerebellum, kidney and liver. Tissue RNA-seq data in human are from Illumina Body Map (HBM). Plots are UCSC browser screenshots where novel mouse models are indicated in black, human GENCODE annotation in blue, green and red, and mouse CSHL and human HBM RNA-seq signal in different colours at the bottom. Annotated genes are represented by the longest transcript.
Figure 2. Genome-wide conservation of expression profiles.
(a) The joint distribution of log10 average (AVG) read density in orthologous 100-nt bins in human (x-axis) and in mouse (y-axis); cc=0.67. (b) The distribution in (a) restricted to intergenic regions; cc=0.37. (c) The distribution of log10 average read density in intergenic regions (average between human and mouse) as a function of the distance from the nearest gene. (d) Correlation of log10 average read density in human and mouse as a function of distance from the nearest gene. (e) The distribution of log10 average read density in 100-nt bins as a function of phastCons score (conservation score across 45 vertebrate species). (f) The distribution of phastCons score of a 100-nt bin as a function of the average read density.
Figure 3. Genome-wide conservation of antisense expression and splicing profiles.
(a) The joint distribution of the average antisense-to-total expression ratio (the number of reads mapped to the opposite strand as a fraction of the number of reads mapped to both strands) in pairs of orthologous protein-coding genes; cc=0.68. (b,c) Contour plots of the joint probability distribution of the average usage (ψ, per cent-spliced-in) of splice junctions (SJ; b), and of the standard deviation of SJ usage (c) in orthologous SJ pairs. Logistic transformation (logit) was used in a and b. SJ with constant complete inclusion or exclusion are not shown. ‘Alternative’ denotes SJ that are annotated alternative in both species.
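The two quantities named in panel (a), the antisense-to-total expression ratio and the logit transformation, can be sketched in a few lines. This is only an illustration of the definitions given in the caption: the function names and the read counts are hypothetical, and how the authors treated genes with no reads or with ratios of exactly 0 or 1 is not stated here, so the sketch simply rejects those cases.

```python
import math

def antisense_ratio(sense_reads, antisense_reads):
    """Reads on the opposite strand as a fraction of reads on both strands."""
    total = sense_reads + antisense_reads
    if total == 0:
        # Assumption: a gene with no mapped reads has no defined ratio.
        raise ValueError("no reads mapped to either strand")
    return antisense_reads / total

def logit(p):
    """Logistic transformation log(p / (1 - p)), defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is undefined at 0 and 1")
    return math.log(p / (1.0 - p))

# Hypothetical counts for one gene: 900 sense reads, 100 antisense reads.
ratio = antisense_ratio(sense_reads=900, antisense_reads=100)
print(ratio, logit(ratio))
```

The logit stretches values near 0 and 1, which is presumably why it is applied before plotting ratios and inclusion levels that cluster at the extremes.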
Figure 4. Genes with constrained expression.
(a) The distribution of the dynamic range (DNR, log10 of the ratio of the largest to the smallest non-zero observation) of gene expression level in orthologous genes across human and mouse samples. (b) Venn diagram of the relationship between orthologous and constrained genes. (c) Proportion of nucleotides in expressed genes, as assessed by PolyA+ RNA-seq, that originates from constrained genes in human cell lines and mouse tissues. The labelled outliers correspond to mouse embryonic samples. (d,e) The distribution of DNR in human/mouse constrained and unconstrained genes in Merkin et al. (d) and Barbosa-Morais et al. (e). (f) The joint distribution of log10 average gene reads per kilobase per million mapped reads (RPKM) in pairs of orthologous protein-coding genes; constrained genes are shown in red. (g) The distribution of promoter, transcript and protein pairwise sequence identity between human and mouse in constrained and unconstrained genes.
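A minimal sketch of the two measures named in this caption, the dynamic range (DNR) and RPKM, is given below. The DNR function follows the definition in panel (a); the RPKM function uses the standard definition of reads per kilobase per million mapped reads. The function names and the example numbers are hypothetical and are not taken from the paper's pipeline.

```python
import math

def dynamic_range(values):
    """log10 of the ratio of the largest to the smallest non-zero observation."""
    nonzero = [v for v in values if v > 0]
    if not nonzero:
        raise ValueError("no non-zero observations")
    return math.log10(max(nonzero) / min(nonzero))

def rpkm(read_count, gene_length_nt, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_nt * total_mapped_reads)

# Hypothetical expression levels of one gene across four samples.
print(dynamic_range([0, 1, 10, 100]))

# Hypothetical gene: 1,000 reads, 2,000 nt long, 10 million mapped reads.
print(rpkm(1000, 2000, 10_000_000))
```

Under this definition a gene expressed at nearly the same level in every sample has a DNR close to 0, which matches the caption's use of DNR to separate constrained from unconstrained genes.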
Figure 5. Conservation of epigenetic marks in constrained genes.
(a) Normalized histone modification profiles 1 kb around the transcription start site (TSS) for H3K4me3 and H3K27ac, and around the transcription termination site (TTS) for H3K36me3 in constrained and unconstrained genes. (b) The difference in normalized average histone modification signals 1 kb around the TSS (|Δlog Signal|) in constrained and unconstrained genes. (c) The cumulative distribution of the number of transcription factor (TF) ChIP-seq peaks in promoter regions of human constrained and unconstrained genes. (d) Principal component analysis of ChIP-seq measured binding strength of TFs in constrained and unconstrained genes.
Figure 6. Other properties of constrained genes.
(a) Contour plots of the joint probability distribution of the average versus standard deviation (s.d.) of log10 nuclear-to-cytosolic ratio in constrained and unconstrained genes. (b) Distribution of the proportion of the variance in transcript abundance across human and mouse samples that can be explained by the overall variance in gene expression. Values close to 1 indicate that changes in abundance of transcript isoforms originate mostly from changes in gene expression; values close to 0 suggest that most of these changes are due to splicing changes in the relative proportion of transcript isoforms. (c) Mean versus variance of splice junction inclusion (ψ) in constrained and unconstrained genes. To identify junctions with constrained inclusion at intermediate levels, we set a threshold (inner parabola) of 20% of the maximum variance for the given mean in a Bernoulli distribution (outer parabola) in the interval of mean inclusion (0.15, 0.85). (d) The percentage of all genes, and of the 1,000 most constrained and unconstrained genes (right), that are lethal in mice according to the Jax mice embryonic lethality database, that have hits in OMIM, the NHGRI GWAS catalogue and COSMIC, and that have eQTLs. On the left, the number of significant traits/diseases associated with constrained and unconstrained genes in OMIM, the NHGRI GWAS catalogue and COSMIC.
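The threshold in panel (c) can be illustrated with a short sketch. For a quantity bounded between 0 and 1 with mean m, the variance cannot exceed that of a Bernoulli variable, m(1 - m), which is the outer parabola; the inner parabola is 20% of that value. The code below assumes that a junction counts as constrained at an intermediate level when its mean lies in (0.15, 0.85) and its variance falls below the inner parabola. The function name, the use of the population variance and the example ψ values are assumptions for illustration only.

```python
from statistics import mean, pvariance

def is_constrained_intermediate(psi_values, fraction=0.20, lo=0.15, hi=0.85):
    """Check whether a junction's inclusion is constrained at an intermediate level.

    psi_values: inclusion levels (psi, between 0 and 1) across samples.
    """
    m = mean(psi_values)
    if not lo < m < hi:
        # Mean inclusion outside the interval considered in the caption.
        return False
    max_var = m * (1.0 - m)  # Bernoulli variance: the outer parabola
    # Assumption: population variance, compared with the inner parabola.
    return pvariance(psi_values) < fraction * max_var

# Hypothetical junctions observed in four samples each.
print(is_constrained_intermediate([0.50, 0.50, 0.52, 0.48]))  # stable near 0.5
print(is_constrained_intermediate([0.0, 1.0, 0.0, 1.0]))      # switches on and off
```

The second junction has the same mean as the first but sits on the outer parabola, so it is not called constrained.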

