Genome Res. 2020 Dec;30(12):1815-1834.
doi: 10.1101/gr.260844.120. Epub 2020 Jul 30.

Cross-species analysis of enhancer logic using deep learning

Liesbeth Minnoye et al. Genome Res. 2020 Dec.

Abstract

Deciphering the genomic regulatory code of enhancers is a key challenge in biology because this code underlies cellular identity. A better understanding of how enhancers work will improve the interpretation of noncoding genome variation and empower the generation of cell type-specific drivers for gene therapy. Here, we explore the combination of deep learning and cross-species chromatin accessibility profiling to build explainable enhancer models. We apply this strategy to decipher the enhancer code in melanoma, a relevant case study owing to the presence of distinct melanoma cell states. We trained and validated a deep learning model, called DeepMEL, using chromatin accessibility data of 26 melanoma samples across six different species. We show the accuracy of DeepMEL predictions on the CAGI5 challenge, where it significantly outperforms existing models on the melanoma enhancer of IRF4. Next, we exploit DeepMEL to analyze enhancer architectures and identify accurate transcription factor binding sites for the core regulatory complexes in the two different melanoma states, with distinct roles for each transcription factor, in terms of nucleosome displacement or enhancer activation. Finally, DeepMEL identifies orthologous enhancers across distantly related species, where sequence alignment fails, and the model highlights specific nucleotide substitutions that underlie enhancer turnover. DeepMEL can be used from the Kipoi database to predict and optimize candidate enhancers and to prioritize enhancer mutations. In addition, our computational strategy can be applied to other cancer or normal cell types.
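
The prioritization of enhancer mutations mentioned in the abstract can be illustrated with in silico saturation mutagenesis: each position of a sequence is changed to every alternative nucleotide, and the resulting change in a model score is recorded. The following is a minimal sketch of that idea, not DeepMEL itself. The function `toy_score` is a hypothetical stand-in that counts occurrences of an arbitrary placeholder motif; a real analysis would call the trained model instead.

```python
from typing import Callable, Dict, Tuple

NUCLEOTIDES = "ACGT"


def toy_score(seq: str, motif: str = "ACAAAG") -> float:
    """Hypothetical stand-in for a trained model (assumption, not DeepMEL):
    counts occurrences of a placeholder motif in the sequence."""
    return float(sum(seq.startswith(motif, i) for i in range(len(seq))))


def saturation_mutagenesis(
    seq: str, score: Callable[[str], float] = toy_score
) -> Dict[Tuple[int, str], float]:
    """Return score(mutant) - score(wild type) for every single-nucleotide change."""
    reference = score(seq)
    effects: Dict[Tuple[int, str], float] = {}
    for pos, wild_type in enumerate(seq):
        for alt in NUCLEOTIDES:
            if alt == wild_type:
                continue  # skip the unchanged nucleotide
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = score(mutant) - reference
    return effects


# Toy sequence with the placeholder motif at positions 2-7 (0-based).
effects = saturation_mutagenesis("TTACAAAGTT")
damaging = sorted(key for key, delta in effects.items() if delta < 0)
```

Here `damaging` lists the (position, nucleotide) substitutions that lower the toy score; with a real model, the same table would be plotted per position, as in the mutational-effect panels of the figures.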

Figures

Figure 1.
Comparative epigenomics reveals conservation of two main melanoma states. (A) Evolutionary relationship between the six studied species, represented by a phylogenetic tree (NCBI taxonomy tree). ATAC-seq profiles of the 26 melanoma cell lines are shown for three regulatory regions. (B) ATAC-seq profiles of the human melanoma lines for the SOX10 locus. Lines are colored by the melanocytic (MEL, in blue) or mesenchymal-like (MES, in orange) melanoma state. (C) Total number of ATAC-seq regions observed across all samples of a species are colored based on whether they are not alignable, alignable, or conserved accessible in human. (D) PCA clustering based on the accessibility of the 29,619 alignable regions across all six species. (E) ATAC-seq profiles of MEL and MES lines of different species for an intronic MLANA enhancer and the upstream region of MMP3.
Figure 2.
Conservation of binding motifs of master regulators of MEL and MES melanoma states. (A,B) Heatmap of differential ATAC-seq regions when comparing human MEL versus human MES lines (A) and the MEL dog line “Dog-OralMel-18249” versus the MES dog line “Dog-IrisMel-14205” (two biological replicates each) (B), colored by normalized ATAC-seq signal. Enriched TF binding motifs in the differential peaks were identified via HOMER (Heinz et al. 2010), and the first logo of enriched TF families is shown. The ratio of the percentage of target and background sequences with the motif is indicated between brackets, as well as the rank of the TF class within the HOMER output (#). (C) Schematic overview of cross-species motif analysis using the branch length score (BLS) as a measure for the evolutionary conservation of a motif hit across conserved accessible regions. The BLS was summed across a set of conserved accessible regions. (D,E) Histogram of the normalized summed BLS score for 20,003 motifs on 9732 conserved accessible regions across the mammalian MEL lines (D) and on 113 conserved accessible regions across MEL lines of all six species (E). The first hit of the top recurrent TF binding motifs within the top 4% conserved motifs is indicated as a cross and is accompanied by the logo of the motif.
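
The branch length score in panel C can be sketched as follows, assuming one common definition: the total branch length of the subtree connecting the species in which a motif hit is found, divided by the total branch length of the tree. The tree topology, species names, and branch lengths below are hypothetical placeholders, not the values used in the paper.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical toy tree: child -> (parent, branch length). Values are made up.
TREE: Dict[str, Tuple[str, float]] = {
    "human": ("mammals", 0.1),
    "dog": ("mammals", 0.2),
    "mammals": ("root", 0.3),
    "zebrafish": ("root", 0.9),
}


def branch_length_score(
    species_with_hit: Iterable[str],
    tree: Dict[str, Tuple[str, float]] = TREE,
) -> float:
    """Fraction of total branch length spanned by the species carrying the hit."""
    selected = set(species_with_hit)
    # For every branch, count how many selected species lie below it.
    below: Dict[str, int] = {}
    for leaf in selected:
        node = leaf
        while node in tree:
            below[node] = below.get(node, 0) + 1
            node = tree[node][0]
    # A branch belongs to the connecting subtree if it separates selected
    # species, i.e. some but not all of them lie below it.
    spanned = sum(tree[n][1] for n, count in below.items() if count < len(selected))
    total = sum(length for _, length in tree.values())
    return spanned / total


# Summing the score across a set of regions, one set of species per region.
regions = [{"human", "dog"}, {"human", "dog", "zebrafish"}, {"human"}]
summed_bls = sum(branch_length_score(hits) for hits in regions)
```

A hit found only in the reference species scores 0, and a hit found in all species scores 1, so the summed score rewards motifs that are conserved across many regions and over long evolutionary distances.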
Figure 3.
DeepMEL classifies melanoma enhancers and predicts important TF binding motifs. (A) Cell-topic heatmap of cisTopic applied to 339,099 ATAC-seq regions across the 16 human melanoma lines, colored by normalized topic scores. (029*) MM029_R2. (B) Example regions of a MEL-specific (topic 4) region near MIA and MES-specific (topic 7) regions upstream of SERPINE1. (C) Schematic overview of DeepMEL. Twenty-four topics or sets of coaccessible regions were used as input for training of a multiclass multilabel neural network. (D,E) Receiver operating characteristic curve (D) and precision recall curve (E) for DeepMEL on training, test, and shuffled data of topic 4 and topic 7 regions. (F) Top enriched filters learned by DeepMEL to classify regions as MEL (topic 4) or MES (topic 7). Normalized filter importance is shown per filter. (G) Example of a MEL-predicted enhancer near IRF4. (First and second rows) DeepExplainer view of the forward and reverse strand, with the height of the nucleotides indicating the importance for prediction of the MEL enhancer. (Third row) In vitro effect of point mutations on enhancer activity as measured by MPRA (Kircher et al. 2019). Colors represent the nucleotide to which the wild-type nucleotide is mutated. (Fourth row) In silico effect of point mutations as predicted by DeepMEL. (H) Correlation between the in vitro mutational effects on the IRF4 enhancer and the in silico mutagenesis predictions. (I) Performance of variant effect prediction of DeepMEL using topics (black bar, model used in this paper) or using ATAC-seq signal (white bar), and several previously tested models on the IRF4 enhancer case (Kircher et al. 2019).
Figure 4.
Human-trained deep learning model applied to cross-species ATAC-seq data. (A) Performance of DeepMEL and Cluster-Buster (cbust) in classifying MEL and MES differential peaks in human and dog. (B) Percentage of MEL- and MES-predicted ATAC-seq regions across all samples in our cohort and in human melanocytes. Samples are ordered according to the ratio of the number of MES/MEL-predicted regions. (C) Pearson's correlation of deep layer scores between MEL-predicted regions near orthologous MEL genes between human and another species (Human-Species) or between MEL-predicted regions near different MEL genes within one species (Species-Species). P-values of unpaired two-sample Wilcoxon tests are reported. (D) (I) Evolutionary distance between human and other species in branch length units. (II) ATAC-seq profiles of the ERBB3 locus in the six species. MEL-specific enhancers that were predicted by DeepMEL and that were also found (gray) or not found (green) via liftOver of the human MEL enhancer are highlighted. (III) DeepExplainer plots for the multiple-aligned MEL-predicted ERBB3 enhancers. Red and blue dots represent point and indel mutations, respectively.
Figure 5.
Core Regulatory Complex of MEL melanoma enhancers. (A) Schematic overview of motif scoring method in which extended convolutional filter hits from DeepMEL are multiplied by DeepExplainer profiles to yield significant motif hits. (B,C) Heatmap (B) and binarized heatmap (C) of the number of significant SOX, TFAP2A, MITF, and RUNX-like motif hits on the 3885 MEL-predicted regions in the human cell line MM001. (D) Aggregation plot of normalized ChIP-seq signal of SOX10, MITF, and TFAP2A on the human enhancer clusters. (E,F) Venn diagrams of region clusters on the 3885 MEL-predicted regions in human (in MM001) (E) and the 4194 MEL-predicted regions in dog (in Dog-OralMel-18249) (F). Example MEL-predicted enhancers in human and dog are shown for two of the region clusters. The ATAC-seq signal of the regions is shown in gray.
Figure 6.
Positional specificity of SOX10 and TFAP2A in MEL melanoma enhancers. (A,B, top) Example human (A) and dog (B) MEL-predicted enhancer containing significant SOX10 and TFAP2A motifs. The ATAC-seq signal is shown in gray. (A, middle; B, bottom) Imputed nucleosome start and middle point profiles. (A, bottom) For the human example region, ATAC-seq profiles of MM001 in control condition, after 72 h of SOX10 knockdown or TFAP2A knockdown are shown. (C) Schematic overview of the nucleosome structure explaining the colors used in D and E. (D,E) Nucleosome start point (D) and nucleosome middle point predictions (E) on MEL-predicted regions containing one SOX10 (left) or one TFAP2A motif (right) next to possible other motifs, where the regions are either centered on the ATAC-seq summit (gray) or on the SOX10 or TFAP2A motif (blue).
Figure 7.
Predicting causal mutations of evolutionary changes in MEL enhancers. (A,B) Example region upstream of APPL2 that is accessible (A) and active (B) in the MEL dog line Dog-OralMel-18249 but not in human MEL lines. (C) DeepMEL prediction score of each of the 24 topics for the dog and human APPL2 enhancer. (D) Effect on topic 4 DeepMEL score on the dog sequence when in silico simulating each of the single detected point mutations between the dog and human APPL2 enhancer. (E) DeepExplainer plots of the middle 120 bp of the dog and human APPL2 enhancer. In the middle, the effect of each possible point mutation between the dog and human sequence on the MEL DeepMEL score was calculated in silico and is represented by colored dots depending on the nucleotide to which the original dog nucleotide was mutated in silico. Truly existing point mutations between the dog and human sequence are highlighted by color-coded vertical dashed lines. Four mutations that decrease the motif score of the SOX10, MITF, and TFAP2A motifs are highlighted by a gray box and are encircled. (F) Bar plot showing the mean effect on the log2 delta ATAC-seq signal of a non-human region compared to the human homolog depending on the number of SOX10 motif hits lost or gained. Only regions having no change in the number of significant TFAP2A, MITF, and RUNX motif hits were used. The y-axis is normalized to the category with no changes in the number of significant SOX10 motif hits. The number of regions in each of the categories is indicated (#). (G) Luciferase assay on six human or dog enhancers. Significant motif hits per enhancer are shown with colored crosses. For the luciferase assays: luciferase activity in MM001 is shown relative to Renilla signal and is log10 transformed. P-values were determined using Student's t-test, and the error bars represent the standard deviation over three biological replicates.
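
The quantity plotted in panel F can be sketched as below. This is an illustrative reading of the caption under stated assumptions, not the authors' code: the log2 delta is taken as log2 of the non-human signal over the human signal with a pseudocount of 1, the motif change is the non-human hit count minus the human hit count, and "normalized" is interpreted as subtracting the mean of the no-change category. The input values are made-up toy numbers.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One record per region pair: (human signal, non-human signal, change in SOX10 hits).
Record = Tuple[float, float, int]


def normalized_mean_log2_delta(
    records: Iterable[Record], pseudocount: float = 1.0
) -> Dict[int, float]:
    """Mean log2(non-human / human) signal per motif-change category,
    shifted so that the no-change category (0) has a mean of zero."""
    by_change: Dict[int, List[float]] = defaultdict(list)
    for human_signal, other_signal, sox10_change in records:
        delta = math.log2((other_signal + pseudocount) / (human_signal + pseudocount))
        by_change[sox10_change].append(delta)
    means = {change: sum(vals) / len(vals) for change, vals in by_change.items()}
    baseline = means[0]  # requires at least one region without a change
    return {change: mean - baseline for change, mean in sorted(means.items())}


# Toy, made-up values for illustration only.
toy_records = [(3.0, 3.0, 0), (7.0, 7.0, 0), (7.0, 1.0, -1), (1.0, 7.0, 1)]
result = normalized_mean_log2_delta(toy_records)
```

In this toy input, the region that lost a motif hit has lower accessibility than its human homolog and the region that gained one has higher accessibility, so the negative and positive categories fall below and above the zero baseline.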
