Genome Res. 2025 Mar 18;35(3):475-488.
doi: 10.1101/gr.278931.124.

The role of transposon activity in shaping cis-regulatory element evolution after whole-genome duplication

Øystein Monsen et al. Genome Res. 2025.

Abstract

Whole-genome duplications (WGDs) and transposable element (TE) activity can act synergistically in genome evolution. WGDs can increase TE activity directly through cellular stress or indirectly by relaxing selection against TE insertions in functionally redundant, duplicated regions. Because TEs can function as, or evolve into, TE-derived cis-regulatory elements (TE-CREs), bursts of TE activity following WGD are therefore likely to impact evolution of gene regulation. Yet, the role of TEs in genome regulatory evolution after WGDs is not well understood. Here we used Atlantic salmon as a model system to explore how TE activity after the salmonid WGD ∼100 MYA shaped CRE evolution. We identified 55,080 putative TE-CREs using chromatin accessibility data from the liver and brain. Retroelements were both the dominant source of TE-CREs and had higher regulatory activity in MPRA experiments compared with DNA elements. A minority of TE subfamilies (16%) accounted for 46% of TE-CREs, but these "CRE superspreaders" were mostly active prior to the WGD. Analysis of individual TE insertions, however, revealed enrichment of TE-CREs originating from WGD-associated TE activity, particularly for the DTT (Tc1-Mariner) DNA elements. Furthermore, coexpression analyses supported the presence of TE-driven gene regulatory network evolution, including DTT elements active at the time of WGD. In conclusion, our study supports a scenario in which TE activity has been important in genome regulatory evolution, either through relaxed selective constraints or through strong selection to recalibrate optimal gene expression phenotypes, during a transient period following genome doubling.

Figures

Figure 1.
Overview of the genomic TE landscape. (A) Superfamily level overview of TE annotations in the Atlantic salmon genome. Number of TE subfamilies per superfamily in square brackets. (B) TE insertions per superfamily. (C) Annotated base pairs at the TE superfamily level. (D) TE annotations (base pair proportions) overlapping different genomic contexts. Genomic baseline is the proportion of the entire genomic sequence that is assigned to the four genomic contexts.
Figure 2.
TE-CRE landscape. (A) The proportion of base pairs overlapping TEs, either out of all genome-wide base pairs or out of those within an ATAC-seq peak. (B–D) Pipeline to define putative TE-CREs. (B) Venn diagram of tissue-specific and shared ATAC-seq peaks from the liver and brain. (C) Cartoon showing how TE-CREs are defined as ATAC-seq peak summits that overlap a TE. (D) Venn diagram of tissue-specific and shared TE-CREs from the liver and brain. (E) Proportion of shared and tissue-specific TE-CREs in promoter versus intergenic regions. (F) Gene expression levels of the nearest genes to tissue-specific and shared TE-CREs in the brain and liver. P-values from a Wilcoxon test are indicated above tissues. (G) Correlation between TF tissue specificity and the proportion of genome-wide TF motif matches located in TEs. Each point represents a TF motif. Tissue specificity is based on the differential TF binding score from TOBIAS (Bentsen et al. 2020), which essentially summarizes the relative ATAC-seq footprint signal across all potential binding sites.
Figure 3.
TEs enriched in open chromatin. (A) The number of insertions per superfamily plotted against the number of CREs in each superfamily. The shaded area is a 95% confidence-level interval. Superfamilies falling outside the 95% confidence interval are annotated with the three-letter superfamily code. (B) TE families (more than 500 genomic insertions) plotted according to fold enrichment within ATAC-seq peaks in the brain and liver. TE subfamilies are assigned to categories based on enrichment in liver, brain, or both. (C) Proportion of TE subfamilies enriched in open chromatin per superfamily. A manual curation step of the TE subfamilies enriched in open chromatin resulted in a slightly different superfamily list than the initial machine-predicted annotations presented in Figure 1. Note also that only TE subfamily sequences with more than 500 insertions have been included. The percentage of enriched TE subfamilies per superfamily is indicated above the bars. (D–G) Proportion of base pairs overlapping TEs from each enrichment category around peak summits in intergenic or promoter regions (summit within 500 bases of a TSS). Peaks in promoter regions are oriented according to the corresponding TSS, with gene bodies to the right in the figures.
Figure 4.
Temporal dynamics of TE-CRE insertion activity by TE taxonomy. (A) Distribution of sequence divergence of TE-CREs from their TE-subfamily consensus sequence. Colors indicate whether TE-CREs are from TE subfamilies with superspreader ability (liver, brain, or both) or not (gray). (B) Number of TE subfamilies with superspreader ability subdivided into DNA elements and retroelements. Colors represent the TE-subfamily age proxy calculated as mean divergence between genomic insertions and their consensus TE sequence. (Post-WGD) <7 Kimura distance, (WGD) 7–10 Kimura distance, (pre-WGD) >10 Kimura distance. (C) Number of TE-CREs from TEs with a taxonomic classification subdivided into DNA elements and retroelements. Colors represent the TE-subfamily age proxy. (D) Heatmap of the divergence distributions of all insertions per TE subfamily (with more than 500 insertions) to their consensus sequence. TE families are ordered based on mean divergence from consensus. (E) Cumulative distribution of CRE-superspreader TE families ordered by mean Kimura distance between genomic copies and TE-subfamily consensus sequence. Colors represent age proxy as defined by mean divergence to TE-subfamily consensus sequence. (F–I) The number of TE-CREs (F–H) and TE insertions (I) per “age”-bin of Kimura distances for all TE-CREs, TE-CREs from superspreader families, and TE-CREs from the DTT superfamily.
Figure 5.
TE-CREs driving coexpression. (A–C) Results from liver coexpression. (D–F) Results from tissue atlas coexpression. (A,D) Significance (FDR-adjusted P-values) plotted against effect size (standard deviations) for each TE subfamily, indicating the strength of coexpression of their associated genes in the liver (A) and tissue atlas (D) coexpression networks, respectively. Points with FDR-adjusted P-value < 0.05 are colored by Kimura distance to TE-subfamily consensus. (B,E) Distribution of significant TE subfamilies grouped by superfamilies in liver (B) and tissue atlas (E) data sets. (C,F) Cumulative distribution of TE subfamilies with significant effect on gene coexpression in liver (C) and tissue atlas (F) data sets. Temporal classification was based on the mean divergence of all TE insertions to their TE subfamily consensus sequence, for which post-WGD was defined as Kimura distance < 7, WGD as 7–10, and pre-WGD as >10.
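The temporal classification stated in the captions of Figures 4 and 5 can be written as a small binning function. This is only a sketch of the thresholds as given in the captions (post-WGD < 7, WGD 7–10, pre-WGD > 10). The function name, the subfamily names, and the divergence values are hypothetical, and treating exactly 7 and exactly 10 as WGD is an assumption about how the boundaries are handled.

```python
def classify_te_age(mean_kimura: float) -> str:
    """Assign a TE subfamily to an age class from its mean Kimura distance.

    Thresholds follow the figure captions; inclusive handling of the
    boundaries 7 and 10 is an assumption made for this sketch.
    """
    if mean_kimura < 7:
        return "post-WGD"
    if mean_kimura <= 10:
        return "WGD"
    return "pre-WGD"


# Hypothetical subfamilies with made-up mean divergences, for illustration only.
example_subfamilies = {"subfamily_a": 3.2, "subfamily_b": 8.5, "subfamily_c": 14.1}
age_classes = {name: classify_te_age(d) for name, d in example_subfamilies.items()}
```

Here `age_classes` maps each hypothetical subfamily to one of the three labels used in the figures.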
Figure 6.
Massively parallel reporter assay screening of regulatory activity. (A) Schematic overview of the ATAC-STARR-seq MPRA experiment. (B) Barplot of the origin of sequence fragments included in the analyses. (C) Regulatory activity (inducer or repressor) of MPRA sequence fragments from TE and non-TE sequences. (D) Fisher's exact test results for enrichment of transcriptional-inducing MPRA fragments within a TE superfamily compared with all other TEs. Unknown taxonomy and DNA/retrotransposons of unknown origin (DTX/RLX) are considered separate groups. A similar test is also done on the subfamily level, and the number of significant TE subfamilies and total number of subfamilies tested are given in square brackets next to the superfamily codes. The number of regulatory active fragments is given for each category (n). (E–G) TF motif enrichment in transcriptionally inducing MPRA fragments from TE superfamilies enriched in regulatory active fragments. TF names are from the JASPAR database, and the nomenclature reflects whether the motif came from human or mouse. (H) Fisher's exact test results for enrichment of transcriptional-repressing MPRA fragments within a TE superfamily compared with all other TEs. A similar test is also done on the subfamily level, and the ratio of the number of significant TE subfamilies to the total number of subfamilies tested is given in square brackets next to the superfamily codes. Unknown taxonomy and DNA/retrotransposons of unknown origin (DTX/RLX) are considered separate groups. The number of regulatory active fragments is given for each category (n).
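The superfamily-versus-all-other-TEs comparison in panels D and H can be illustrated with a 2×2 Fisher's exact test. The sketch below is not the authors' code: it assumes a one-sided (enrichment) alternative, which the caption does not specify, and it uses a hypothetical function name and made-up counts. Rows are taken to be "this superfamily" versus "all other TEs", and columns "inducing fragment" versus "other fragment".

```python
from math import comb


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the probability of
    seeing at least `a` inducing fragments in the focal superfamily by chance.
    The one-sided alternative is an assumption of this sketch.
    """
    row1 = a + b            # fragments from the focal superfamily
    col1 = a + c            # inducing fragments across all TEs
    total = a + b + c + d   # all tested TE fragments
    denom = comb(total, col1)
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


# Made-up counts, for illustration only.
p_value = fisher_enrichment_p(3, 0, 0, 3)
```

Testing many superfamilies or subfamilies this way would normally be followed by a multiple-testing correction; the caption does not say which, if any, was applied to these panels.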
