Genome Biol Evol. 2018 Feb 1;10(2):538-552.
doi: 10.1093/gbe/evy016.

Pervasive Correlated Evolution in Gene Expression Shapes Cell and Tissue Type Transcriptomes


Cong Liang et al. Genome Biol Evol.

Abstract

The evolution and diversification of cell types is a key means by which animal complexity evolves. Recently, hierarchical clustering and phylogenetic methods have been applied to RNA-seq data to infer cell type evolutionary history and homology. A major challenge in interpreting these data is that cell type transcriptomes may not evolve independently, owing to correlated changes in gene expression. This nonindependence can arise for several reasons, such as common regulatory sequences for genes expressed in multiple tissues, that is, pleiotropic effects of mutations. We develop a model to estimate the level of correlated transcriptome evolution (LCE) and apply it to several data sets. The results reveal pervasive correlated transcriptome evolution among different cell and tissue types. In general, tissues related by morphology or developmental lineage exhibit higher LCE than more distantly related tissues. Analysis of new data collected from bird skin appendages suggests that LCE decreases with the phylogenetic age of the tissues compared, with recently evolved tissues exhibiting the highest LCE. Furthermore, we show that correlated evolution can alter patterns of hierarchical clustering, causing different tissue types from the same species to cluster together. To identify the genes that contribute most strongly to the correlated evolution signal, we performed a gene-wise estimation of LCE on a data set with ten species. Removing genes with high LCE allows accurate reconstruction of evolutionary relationships among tissue types. Our study provides a statistical method to measure and account for correlated gene expression evolution when interpreting comparative transcriptome data.

Keywords: cell type evolution; comparative transcriptomics; correlated evolution; gene expression evolution.
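To make the idea of an LCE estimate concrete, here is a minimal sketch for one tissue pair (A, B) sampled in two species. It is a simplification, not the authors' estimator: it assumes both tissues evolve at equal rates and that, for each gene, the species-specific changes in A and B are bivariate normal with correlation γ. Under those assumptions the across-gene Pearson correlation of the between-species contrasts equals γ. The function name `estimate_lce` and all simulation settings are hypothetical.

```python
import numpy as np

def estimate_lce(expr_a1, expr_a2, expr_b1, expr_b2):
    """Toy LCE estimate for tissues A and B in species 1 and 2 (hypothetical helper).

    Each argument is a 1-D array of expression values (e.g. square-root TPM)
    for the same one-to-one orthologs. Returns the across-gene Pearson
    correlation of the between-species contrasts of tissue A and tissue B.
    """
    contrast_a = np.asarray(expr_a1, dtype=float) - np.asarray(expr_a2, dtype=float)
    contrast_b = np.asarray(expr_b1, dtype=float) - np.asarray(expr_b2, dtype=float)
    return float(np.corrcoef(contrast_a, contrast_b)[0, 1])

# Synthetic data (assumption): shared ancestral states, then independent
# lineage-specific changes whose A and B components are correlated by gamma.
rng = np.random.default_rng(0)
n_genes, gamma = 20000, 0.6
ancestral_a = rng.normal(5.0, 1.0, n_genes)
ancestral_b = rng.normal(5.0, 1.0, n_genes)
cov = [[1.0, gamma], [gamma, 1.0]]
d1 = rng.multivariate_normal([0.0, 0.0], cov, size=n_genes)  # species 1 changes (A, B)
d2 = rng.multivariate_normal([0.0, 0.0], cov, size=n_genes)  # species 2 changes (A, B)
a1, b1 = ancestral_a + d1[:, 0], ancestral_b + d1[:, 1]
a2, b2 = ancestral_a + d2[:, 0], ancestral_b + d2[:, 1]
print(f"estimated LCE = {estimate_lce(a1, a2, b1, b2):.3f} (true gamma = {gamma})")
```

The ancestral states cancel in each contrast, so the estimate reflects only correlated change after the species split, not how similar tissues A and B were to begin with.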


Figures

Fig. 1.—
Correlated evolution of cell transcriptomes. (A) Cell type history embedded within species history. Cell types A and B originate from the ancestral cell type O prior to the split of species 1 and 2. Cell type lineages show homology relationships for the two cell types in the descendant species (subscripts indicate species). (B) Cell type history without the species phylogeny, illustrating cell type homology. Gray squiggly lines indicate nonindependence of transcriptome change. Correlated transcriptome evolution increases transcriptome similarity between cell types of the same species relative to their homologous counterparts in other species. (C) Correlated evolution causes species-specific gene expression similarities to accumulate between tissues of the same organism, potentially making different tissues within a species more similar to each other than to their homologous counterparts in another species.
Fig. 2.—
Estimation of LCE in simulated transcriptome data. We simulated 1,000 transcriptome evolutionary trajectories with varied distributions of the stabilizing force λ and the LCE parameter γ. The distributions of λ are: λ = 0 (Brownian model), a gamma distribution (Ornstein–Uhlenbeck model), and a mixture of zeros and a gamma distribution (mixture of Brownian and OU models). The distributions of γ are a beta distribution and a mixture of a beta distribution and zeros. In all conditions, we observed a high correlation between the estimated LCE (γ̂) and the mean of the true γ among genes (R² > 0.95). Blue and orange indicate simulations with highly (lung–spleen) and lowly (brain–testes) correlated gene expression optima, respectively. We find that the estimation of LCE is not influenced by the correlation between gene expression optima or by the initial state of the stochastic process. We used square-root-transformed TPMs from the Merkin data as gene expression optima and initial states in the simulation. A total of 600 randomly sampled data points are plotted in this figure.
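The simulation design described in the Fig. 2 caption can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' simulation code: an Euler–Maruyama discretization of an OU process per gene and tissue, gene-specific λ drawn from a gamma distribution with 30% of genes set to λ = 0 (Brownian), per-gene γ drawn from a beta distribution, and synthetic optima in place of the Merkin TPMs. The helper `simulate_species`, the step count, and every distribution parameter are hypothetical. LCE is estimated here as the across-gene correlation of between-species contrasts.

```python
import numpy as np

def simulate_species(x0, theta, lam, gamma, rng, t_total=1.0, n_steps=200, sigma=1.0):
    """Euler-Maruyama OU walk for tissues A and B of every gene in one lineage.

    x0, theta: (n_genes, 2) ancestral states and optima; lam, gamma: (n_genes,)
    stabilizing strength and per-gene correlation of the A/B noise.
    A gene with lam == 0 follows Brownian motion.
    """
    dt = t_total / n_steps
    x = x0.copy()
    for _ in range(n_steps):
        z1 = rng.standard_normal(len(lam))
        z2 = rng.standard_normal(len(lam))
        # unit-variance noise for A and B with corr(noise_A, noise_B) = gamma per gene
        noise = np.column_stack([z1, gamma * z1 + np.sqrt(1.0 - gamma**2) * z2])
        x += lam[:, None] * (theta - x) * dt + sigma * np.sqrt(dt) * noise
    return x

rng = np.random.default_rng(1)
n_genes = 5000
gamma = rng.beta(3.0, 2.0, size=n_genes)         # per-gene LCE, mean 0.6 (assumed)
lam = rng.gamma(2.0, 1.0, size=n_genes)          # OU stabilizing strength (assumed)
lam[rng.random(n_genes) < 0.3] = 0.0             # mixture: ~30% Brownian genes
theta = rng.normal(5.0, 1.0, size=(n_genes, 2))  # synthetic optima for tissues A, B
x0 = theta.copy()                                # ancestral state placed at the optima

sp1 = simulate_species(x0, theta, lam, gamma, rng)
sp2 = simulate_species(x0, theta, lam, gamma, rng)
contrast = sp1 - sp2                             # shared start and optima cancel
gamma_hat = np.corrcoef(contrast[:, 0], contrast[:, 1])[0, 1]
print(f"mean true gamma = {gamma.mean():.3f}, estimated LCE = {gamma_hat:.3f}")
```

Because both species start from the same state and share the same optima and λ, those terms drop out of the contrasts. This mirrors the caption's observation that the estimate is insensitive to the optima and the initial state.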
Fig. 3.—
Effect of normalization method, species divergence time, and data set on estimates of LCE. (A) Time tree for the species analyzed in the Merkin data set. (B) Time tree for the species analyzed in the Brawand data set. All tissues considered here are much older than the species compared in these two data sets. (C) Scatterplot of LCE estimated with two normalization methods for the Brawand data set: RPKM normalized according to the most consistently expressed genes, as in Brawand et al. (2011), and TPM normalized according to one-to-one orthologs, as in Musser et al. (2015). The normalization method does not affect the estimation of LCE (R² = 0.986). (D) Estimates of LCE (γ̂) from brain and heart, sampled in both the Merkin and Brawand data sets, using independent contrasts. The horizontal axis is the species divergence time (million years) at the internal nodes shown in (A) and (B); the vertical axis is the estimated LCE, γ̂. γ̂ is not influenced by species divergence time (ANOVA P > 0.1). Dashed lines indicate the mean γ̂ for divergence times > 50 Ma (blue, Merkin data; orange, Brawand data). (E) Boxplot of estimated LCE for two tissue comparisons shared between the Brawand and Merkin data sets (all shared tissue comparisons are shown in supplementary fig. S1, Supplementary Material online). Estimates from the two data sets do not differ statistically (Welch's t-test; "-": P > 0.05; "*": P < 0.05; "**": P < 0.01). We conclude that our estimates of LCE are consistent across data sets produced by different laboratories.
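Panel (D) estimates LCE at each internal node with independent contrasts. The sketch below shows one way to do this: Felsenstein's phylogenetic independent contrasts, vectorized over genes and computed separately for tissues A and B, then correlated across genes at each node. It assumes correlated Brownian motion on a hypothetical three-species ultrametric tree. The tree, branch lengths, and helper names are illustrative only and are not taken from panels (A) or (B).

```python
import numpy as np

def independent_contrasts(node, traits):
    """Felsenstein's independent contrasts, vectorized over genes.

    node: a leaf (species_name, branch_length) or an internal node
          ((left_child, right_child), branch_length).
    traits: dict species_name -> 1-D array of expression values (genes).
    Returns (node_value, effective_branch_length, contrasts in post-order).
    """
    label, length = node
    if isinstance(label, str):
        return np.asarray(traits[label], dtype=float), length, []
    left, right = label
    x_l, v_l, c_l = independent_contrasts(left, traits)
    x_r, v_r, c_r = independent_contrasts(right, traits)
    contrast = (x_l - x_r) / np.sqrt(v_l + v_r)              # standardized contrast
    x_node = (x_l / v_l + x_r / v_r) / (1.0 / v_l + 1.0 / v_r)  # weighted node estimate
    extra = v_l * v_r / (v_l + v_r)                          # branch-length correction
    return x_node, length + extra, c_l + c_r + [contrast]

def simulate_bm(node, state, gamma, rng, out):
    """Correlated Brownian motion of tissues A and B (columns) down the tree."""
    label, length = node
    n = state.shape[0]
    z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
    step = np.column_stack([z1, gamma * z1 + np.sqrt(1.0 - gamma**2) * z2])
    state = state + np.sqrt(length) * step  # variance grows with branch length
    if isinstance(label, str):
        out[label] = state
    else:
        for child in label:
            simulate_bm(child, state, gamma, rng, out)
    return out

# Hypothetical ultrametric tree, branch lengths in Ma
tree = ((((("human", 6.0), ("chimp", 6.0)), 84.0), ("mouse", 90.0)), 0.0)

rng = np.random.default_rng(7)
n_genes, gamma = 20000, 0.5
leaves = simulate_bm(tree, np.zeros((n_genes, 2)), gamma, rng, {})
_, _, contrasts_a = independent_contrasts(tree, {sp: x[:, 0] for sp, x in leaves.items()})
_, _, contrasts_b = independent_contrasts(tree, {sp: x[:, 1] for sp, x in leaves.items()})
# One LCE estimate per internal node (post-order: human-chimp node, then root)
gamma_hats = [np.corrcoef(ca, cb)[0, 1] for ca, cb in zip(contrasts_a, contrasts_b)]
print([round(float(g), 3) for g in gamma_hats])
```

Under this model every node's contrasts share the same A/B correlation, so the per-node estimates should be similar regardless of node depth. That is the pattern the caption reports for divergence time.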
Fig. 4.—
Estimates of LCE in various tissue pairs. LCE estimates for the Tschopp, Merkin, and Brawand data sets. Tschopp estimates (γ; points in red) are for early forelimb, hindlimb, genital, and tail buds from mouse and Anolis. Merkin and Brawand estimates (γ̄) are from 10 different mature tissues in 12 species of mammals and chicken. Estimates of LCE range from 0 to 1, with higher values indicating stronger correlated evolution. The dotted line marks the cutoff above which estimates of γ are significantly greater than expected by chance under a model without correlated evolution (see Materials and Methods). Gray points are LCE estimates involving testes. Numbered blue points mark the other tissue pairs with the highest and lowest LCE: 1) cerebellum and forebrain, 2) spleen and lung, 3) spleen and colon, 4) colon and lung, 5) heart and skeletal muscle, 6) brain and liver (Brawand data set), 7) heart and liver, 8) cerebellum and liver, 9) brain and liver (Merkin data set), and 10) liver and skeletal muscle. Black points are the remaining tissue comparisons from the Merkin and Brawand data sets (supplementary table S4, Supplementary Material online). Of note, the brain–liver comparison was tested in both the Merkin and Brawand data sets, and the LCE estimates from the two data sets do not differ significantly (t-test P > 0.5).
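The caption refers to a significance cutoff under a model without correlated evolution, but the actual test is described in Materials and Methods, which is not reproduced here. As an assumption, the sketch below builds such a null by permutation. Shuffling one tissue's contrasts across genes breaks any gene-wise correlation, and the upper 95% quantile of the resulting correlations serves as the cutoff. The helper name, permutation count, and example data are hypothetical.

```python
import numpy as np

def permutation_cutoff(contrast_a, contrast_b, n_perm=1000, alpha=0.05, seed=0):
    """Null cutoff for an across-gene correlation (hypothetical helper).

    Randomly re-pairs genes between the tissue A and tissue B contrasts,
    which removes correlated evolution. Returns the (1 - alpha) quantile of
    the permuted correlations together with the full null sample.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(contrast_a, dtype=float)
    b = np.asarray(contrast_b, dtype=float)
    null = np.array([np.corrcoef(a, rng.permutation(b))[0, 1] for _ in range(n_perm)])
    return float(np.quantile(null, 1.0 - alpha)), null

# Hypothetical contrasts for 2,000 genes with a weak true correlation of 0.2
rng = np.random.default_rng(42)
n_genes, gamma = 2000, 0.2
z1, z2 = rng.standard_normal(n_genes), rng.standard_normal(n_genes)
contrast_a = z1
contrast_b = gamma * z1 + np.sqrt(1.0 - gamma**2) * z2

observed = np.corrcoef(contrast_a, contrast_b)[0, 1]
cutoff, null = permutation_cutoff(contrast_a, contrast_b)
p_value = (1 + np.sum(null >= observed)) / (1 + null.size)  # one-sided permutation p
print(f"gamma_hat = {observed:.3f}, 95% null cutoff = {cutoff:.3f}, p = {p_value:.4f}")
```

With a few thousand genes the null correlations are tightly centered on zero, so even modest LCE values can exceed the cutoff.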
Fig. 5.—
Theoretical hierarchical clustering patterns vary with model parameters. (A) Brownian model. (B) Ornstein–Uhlenbeck model. The hierarchical clustering pattern of four cell types, A1, A2, B1, and B2, is determined by the correlation matrix of their transcriptome profiles (supplementary eq. S2, Supplementary Material online). The homology signal (horizontal axis; corr(A0, B0) or corr(μA, μB)) and the correlated evolution signal (vertical axis; γ) jointly shape cell type transcriptome similarities. The phase transition boundary between clustering by homology and clustering by species is a straight line (supplementary eqs. S4 and S5, Supplementary Material online). In both models, clustering by species never occurs without correlated evolution (γ = 0). αB and αOU are parameters related to the random walk variance, σ², in the Brownian and OU models, respectively (see supplementary methods, Supplementary Material online). The dashed arrow indicates how the phase transition boundary shifts with αB and αOU: as the random walk variance σ² decreases, clustering by homology becomes more likely.
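Why the boundary is a straight line can be seen in a deliberately simplified toy model. This is not the paper's supplementary equations S2 to S5. Assume unit-variance profiles A_i = sqrt(1 - s)·A0 + sqrt(s)·E_Ai and likewise for B_i, where s is the fraction of variance from post-split evolution (a hypothetical stand-in for αB), ρ = corr(A0, B0), and corr(E_Ai, E_Bi) = γ within a species. Then corr(A1, A2) = 1 - s, corr(A1, B1) = (1 - s)ρ + sγ, and corr(A1, B2) = (1 - s)ρ is never the largest. The first merge, and with it the topology, therefore depends only on which of the first two is larger.

```python
def clustering_pattern(rho, gamma, s):
    """Toy rule for the clustering of A1, A2, B1, B2 (hypothetical model).

    rho: homology signal corr(A0, B0); gamma: LCE;
    s: fraction of transcriptome variance from evolution after the species split.
    """
    within_type = 1.0 - s                        # corr(A1, A2) = corr(B1, B2)
    within_species = (1.0 - s) * rho + s * gamma  # corr(A1, B1) = corr(A2, B2)
    return "species" if within_species > within_type else "homology"

def gamma_boundary(rho, s):
    """Phase-transition line: clustering flips to 'species' above this gamma."""
    return (1.0 - s) * (1.0 - rho) / s

rho, s = 0.5, 0.5
print("boundary gamma:", gamma_boundary(rho, s))  # 0.5
for gamma in (0.0, 0.3, 0.6, 0.9):
    print(gamma, clustering_pattern(rho, gamma, s))
# 0.0 homology, 0.3 homology, 0.6 species, 0.9 species
```

In this toy, γ = 0 can never produce clustering by species (for ρ < 1). A larger s, that is, more post-split evolution, lowers the boundary. Both points agree qualitatively with the caption.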
Fig. 6.—
Clustering by homology can be recovered by excluding genes with high LCE. (A) Hierarchical clustering of mouse and chicken lung and spleen, illustrating tissues clustering by species. Numbers at nodes are confidence values (%) calculated by the clustering package using two different methods. (B) LCE versus clustering pattern for tetrads with samples from two species and two homologous tissue types. Higher LCE, like an older lineage split, is associated with a greater chance of clustering by species: with high LCE and long evolutionary time, correlated evolution leads to the accumulation of species-specific similarities among tissues and thus to clustering by species. (C) Upper: bar plot of the number of correlated (high-LCE) genes identified in each tissue comparison in the Brawand data set. Highly correlated genes are defined by q < 0.05 (Benjamini–Hochberg correction) and γ̂ > 0.5. The x-axis lists the tissue comparisons. Lower: histogram of how many times each one-to-one ortholog is identified as a correlated gene across all tissue pairs of the Brawand data set. Only a minority of genes have high LCE in a majority of tissue pairs. (D) Heatmap and hierarchical clustering of forebrain and cerebellum samples from the Brawand data using all one-to-one orthologs. Outside the primates, samples cluster by species. (E) Samples cluster by tissue type when genes with high LCE are excluded from the hierarchical clustering analysis.
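The gene filter in panel (C) combines a Benjamini–Hochberg q value with a γ̂ threshold. The sketch below applies that rule. It assumes gene-wise γ̂ and p values have already been computed, since the gene-wise test itself is not shown here. The helper names and the four example genes are hypothetical. The retained genes would then be the input to the clustering shown in panel (E).

```python
import numpy as np

def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p values (q values), returned in input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity: running minimum from the largest p value downward
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q

def low_lce_genes(genes, gamma_hat, pvals, q_cut=0.05, gamma_cut=0.5):
    """Drop genes flagged as highly correlated (q < q_cut and gamma_hat > gamma_cut)."""
    q = bh_qvalues(pvals)
    high = (q < q_cut) & (np.asarray(gamma_hat, dtype=float) > gamma_cut)
    return [g for g, is_high in zip(genes, high) if not is_high]

# Hypothetical gene-wise results for one tissue pair
genes = ["g1", "g2", "g3", "g4"]
gamma_hat = [0.8, 0.9, 0.2, 0.7]
pvals = [0.01, 0.04, 0.03, 0.2]
print(bh_qvalues(pvals))                         # approx. [0.04, 0.0533, 0.0533, 0.2]
print(low_lce_genes(genes, gamma_hat, pvals))    # ['g2', 'g3', 'g4']
```

Note that g2 has a high γ̂ and a nominal p value below 0.05, but it is retained because its BH-adjusted q value (about 0.053) misses the cutoff.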


