Genome Res. 2020 Dec;30(12):1815-1834.

doi: 10.1101/gr.260844.120. Epub 2020 Jul 30.

Cross-species analysis of enhancer logic using deep learning

Liesbeth Minnoye^#^{1,2}, Ibrahim Ihsan Taskiran^#^{1,2}, David Mauduit^{1,2}, Maurizio Fazio^{3,4}, Linde Van Aerschot^{1,2,5}, Gert Hulselmans^{1,2}, Valerie Christiaens^{1,2}, Samira Makhzami^{1,2}, Monika Seltenhammer^{6,7}, Panagiotis Karras^{8,9}, Aline Primot^{10}, Edouard Cadieu^{10}, Ellen van Rooijen^{3,4}, Jean-Christophe Marine^{8,9}, Giorgia Egidy^{11}, Ghanem-Elias Ghanem^{12}, Leonard Zon^{3,4}, Jasper Wouters^{1,2}, Stein Aerts^{1,2}

Affiliations

¹ VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium.
² KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium.
³ Howard Hughes Medical Institute, Stem Cell Program and the Division of Pediatric Hematology/Oncology, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts 02115, USA.
⁴ Department of Stem Cell and Regenerative Biology, Harvard Stem Cell Institute, Cambridge, Massachusetts 02138, USA.
⁵ Laboratory for Disease Mechanisms in Cancer, KU Leuven, 3000 Leuven, Belgium.
⁶ Center for Forensic Medicine, Medical University of Vienna, 1090 Vienna, Austria.
⁷ Division of Livestock Sciences (NUWI) - BOKU University of Natural Resources and Life Sciences, 1180 Vienna, Austria.
⁸ VIB-KU Leuven Center for Cancer Biology, 3000 Leuven, Belgium.
⁹ KU Leuven, Department of Oncology KU Leuven, 3000 Leuven, Belgium.
¹⁰ CNRS-University of Rennes 1, UMR6290, Institute of Genetics and Development of Rennes, Faculty of Medicine, 35000 Rennes, France.
¹¹ Université Paris-Saclay, INRA, AgroParisTech, GABI, 78350 Jouy-en-Josas, France.
¹² Institut Jules Bordet, Université Libre de Bruxelles, 1000 Brussels, Belgium.

^# Contributed equally.

PMID: 32732264
PMCID: PMC7706731
DOI: 10.1101/gr.260844.120

Liesbeth Minnoye et al. Genome Res. 2020 Dec.

Abstract

Deciphering the genomic regulatory code of enhancers is a key challenge in biology because this code underlies cellular identity. A better understanding of how enhancers work will improve the interpretation of noncoding genome variation and empower the generation of cell type-specific drivers for gene therapy. Here, we explore the combination of deep learning and cross-species chromatin accessibility profiling to build explainable enhancer models. We apply this strategy to decipher the enhancer code in melanoma, a relevant case study owing to the presence of distinct melanoma cell states. We trained and validated a deep learning model, called DeepMEL, using chromatin accessibility data of 26 melanoma samples across six different species. We show the accuracy of DeepMEL predictions on the CAGI5 challenge, where it significantly outperforms existing models on the melanoma enhancer of *IRF4*. Next, we exploit DeepMEL to analyze enhancer architectures and identify accurate transcription factor binding sites for the core regulatory complexes in the two different melanoma states, with distinct roles for each transcription factor, in terms of nucleosome displacement or enhancer activation. Finally, DeepMEL identifies orthologous enhancers across distantly related species, where sequence alignment fails, and the model highlights specific nucleotide substitutions that underlie enhancer turnover. DeepMEL can be used from the Kipoi database to predict and optimize candidate enhancers and to prioritize enhancer mutations. In addition, our computational strategy can be applied to other cancer or normal cell types.
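
The abstract mentions using the model to prioritize enhancer mutations. A minimal sketch of in silico saturation mutagenesis, the general technique behind such prioritization, is given here. It assumes a hypothetical `toy_score` function that simply counts a placeholder motif; it is not DeepMEL and does not use the Kipoi API. Any function mapping a sequence to a score could be substituted.

```python
from typing import Callable, Dict, Tuple

BASES = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a model score: counts a placeholder motif."""
    motif = "ACAAAG"  # illustrative only, not taken from the paper
    last_start = len(seq) - len(motif)
    return float(sum(seq[i:i + len(motif)] == motif for i in range(last_start + 1)))


def saturation_mutagenesis(
    seq: str, score: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Return score(mutant) - score(wild type) for every single-base substitution."""
    wild_type = score(seq)
    effects: Dict[Tuple[int, str], float] = {}
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue  # skip the wild-type base itself
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = score(mutant) - wild_type
    return effects


effects = saturation_mutagenesis("TTACAAAGTT", toy_score)
# Substitutions with the most negative effect are the candidates to prioritize.
most_damaging = sorted(effects.items(), key=lambda item: item[1])[:3]
```

With a real model, `effects` would hold one predicted change per position and alternative base, which is the kind of profile that can be compared against measured mutational effects.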

Figures

**Figure 1.**
Comparative epigenomics reveals conservation of two main melanoma states. (A) Evolutionary relationship between the six studied species, represented by a phylogenetic tree (NCBI taxonomy tree). ATAC-seq profiles of the 26 melanoma cell lines are shown for three regulatory regions. (B) ATAC-seq profiles of the human melanoma lines for the *SOX10* locus. Lines are colored by the melanocytic (MEL, in blue) or mesenchymal-like (MES, in orange) melanoma state. (C) Total number of ATAC-seq regions observed across all samples of a species are colored based on whether they are not alignable, alignable, or conserved accessible in human. (D) PCA clustering based on the accessibility of the 29,619 alignable regions across all six species. (E) ATAC-seq profiles of MEL and MES lines of different species for an intronic *MLANA* enhancer and the upstream region of *MMP3*.

**Figure 2.**
Conservation of binding motifs of master regulators of MEL and MES melanoma states. (A,B) Heatmap of differential ATAC-seq regions when comparing human MEL versus human MES lines (A) and the MEL dog line “Dog-OralMel-18249” versus the MES dog line “Dog-IrisMel-14205” (two biological replicates each) (B), colored by normalized ATAC-seq signal. Enriched TF binding motifs in the differential peaks were identified via HOMER (Heinz et al. 2010), and the first logo of enriched TF families is shown. The ratio of the percentage of target and background sequences with the motif is indicated between brackets, as well as the rank of the TF class within the HOMER output (#). (C) Schematic overview of cross-species motif analysis using the branch length score (BLS) as a measure for the evolutionary conservation of a motif hit across conserved accessible regions. The BLS was summed across a set of conserved accessible regions. (D,E) Histogram of the normalized summed BLS score for 20,003 motifs on 9732 conserved accessible regions across the mammalian MEL lines (D) and on 113 conserved accessible regions across MEL lines of all six species (E). The first hit of the top recurrent TF binding motifs within the top 4% conserved motifs is indicated as a cross and is accompanied by the logo of the motif.
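
The branch length score in panels C to E can be illustrated with a small sketch. It assumes the commonly used definition of BLS: the total branch length of the subtree connecting the species in which a motif hit is conserved, divided by the total branch length of the tree. The tree, its branch lengths, and the per-region hit lists below are invented for illustration and are not the values used in the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rooted tree: child -> (parent, branch length). Values are made up.
TREE: Dict[str, Tuple[str, float]] = {
    "human": ("mammals", 0.1),
    "dog": ("mammals", 0.2),
    "mammals": ("root", 0.3),
    "zebrafish": ("root", 0.9),
}


def path_to_root(node: str) -> Set[str]:
    """Collect the edges from a node up to the root; an edge is named by its child."""
    edges: Set[str] = set()
    while node in TREE:
        edges.add(node)
        node = TREE[node][0]
    return edges


def branch_length_score(species: List[str]) -> float:
    """Fraction of total tree length spanned by the subtree joining `species`."""
    paths = [path_to_root(s) for s in species]
    if len(paths) < 2:
        return 0.0  # a hit seen in a single species spans no branches
    union = set().union(*paths)
    shared = set.intersection(*paths)  # edges above the common ancestor
    total = sum(length for _, length in TREE.values())
    return sum(TREE[edge][1] for edge in union - shared) / total


# Summed BLS for one motif: one list of species with a conserved hit per region.
hits_per_region = [["human", "dog"], ["human", "dog", "zebrafish"], ["human"]]
summed_bls = sum(branch_length_score(species) for species in hits_per_region)
```

A motif whose hits are retained in distant species accumulates a higher summed score than one conserved only among close relatives.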

**Figure 3.**
DeepMEL classifies melanoma enhancers and predicts important TF binding motifs. (A) Cell-topic heatmap of cisTopic applied to 339,099 ATAC-seq regions across the 16 human melanoma lines, colored by normalized topic scores. (029*) MM029_R2. (B) Example regions of a MEL-specific (topic 4) region near *MIA* and MES-specific (topic 7) regions upstream of *SERPINE1*. (C) Schematic overview of DeepMEL. Twenty-four topics or sets of coaccessible regions were used as input for training of a multiclass multilabel neural network. (D,E) Receiver operating characteristic curve (D) and precision recall curve (E) for DeepMEL on training, test, and shuffled data of topic 4 and topic 7 regions. (F) Top enriched filters learned by DeepMEL to classify regions as MEL (topic 4) or MES (topic 7). Normalized filter importance is shown per filter. (G) Example of a MEL-predicted enhancer near *IRF4*. (First and second rows) DeepExplainer view of the forward and reverse strand, with the height of the nucleotides indicating the importance for prediction of the MEL enhancer. (Third row) In vitro effect of point mutations on enhancer activity as measured by MPRA (Kircher et al. 2019). Colors represent the nucleotide to which the wild-type nucleotide is mutated. (Fourth row) In silico effect of point mutations as predicted by DeepMEL. (H) Correlation between the in vitro mutational effects on the *IRF4* enhancer and the in silico mutagenesis predictions. (I) Performance of variant effect prediction of DeepMEL using topics (black bar, model used in this paper) or using ATAC-seq signal (white bar), and several previously tested models on the *IRF4* enhancer case (Kircher et al. 2019).
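
Panel C describes a multiclass multilabel network, meaning one region can belong to several topics at once. The sketch below shows the two ingredients this usually implies: one-hot encoding of the input sequence and independent sigmoid outputs with a per-topic threshold. The logits, the 0.5 threshold, and the zero-based topic indices are assumptions for illustration; the actual DeepMEL architecture and cutoffs are not described in this text.

```python
import math
from typing import List, Tuple

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as rows of length 4; unknown bases become all zeros."""
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in BASE_INDEX:
            row[BASE_INDEX[base]] = 1
        rows.append(row)
    return rows


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_predict(
    logits: List[float], threshold: float = 0.5
) -> Tuple[List[float], List[int]]:
    """Score each topic independently and return every topic above the threshold."""
    probs = [sigmoid(z) for z in logits]
    labels = [i for i, p in enumerate(probs) if p >= threshold]
    return probs, labels


encoded = one_hot("ACGTN")
# Hypothetical logits for three topics; a real model would emit one per topic.
probs, labels = multilabel_predict([2.0, -1.0, 0.5])
```

Because each output uses its own sigmoid rather than a shared softmax, the probabilities need not sum to one, which is what allows a region to be assigned to more than one topic.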

**Figure 4.**
Human-trained deep learning model applied to cross-species ATAC-seq data. (A) Performance of DeepMEL and Cluster-Buster (cbust) in classifying MEL and MES differential peaks in human and dog. (B) Percentage of MEL- and MES-predicted ATAC-seq regions across all samples in our cohort and in human melanocytes. Samples are ordered according to the ratio of the number of MES/MEL-predicted regions. (C) Pearson's correlation of deep layer scores between MEL-predicted regions near orthologous MEL genes between human and another species (Human-Species) or between MEL-predicted regions near different MEL genes within one species (Species-Species). P-values of unpaired two-sample Wilcoxon tests are reported. (D) (I) Evolutionary distance between human and other species in branch length units. (II) ATAC-seq profiles of the *ERBB3* locus in the six species. MEL-specific enhancers that were predicted by DeepMEL and that were also found (gray) or not found (green) via liftOver of the human MEL enhancer are highlighted. (III) DeepExplainer plots for the multiple-aligned MEL-predicted *ERBB3* enhancers. Red and blue dots represent point and indel mutations, respectively.

**Figure 5.**
Core Regulatory Complex of MEL melanoma enhancers. (A) Schematic overview of motif scoring method in which extended convolutional filter hits from DeepMEL are multiplied by DeepExplainer profiles to yield significant motif hits. (B,C) Heatmap (B) and binarized heatmap (C) of the number of significant SOX, TFAP2A, MITF, and RUNX-like motif hits on the 3885 MEL-predicted regions in the human cell line MM001. (D) Aggregation plot of normalized ChIP-seq signal of SOX10, MITF, and TFAP2A on the human enhancer clusters. (E,F) Venn diagram of regions clusters on the 3885 MEL-predicted regions in human (in MM001) (E) and the 4194 MEL-predicted regions in dog (in Dog-OralMel-18249) (F). Example MEL-predicted enhancers in human and dog are shown for two of the region clusters. The ATAC-seq signal of the regions is shown in gray.
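
Panel A describes multiplying convolutional filter hits by DeepExplainer profiles. One way this could work is sketched below, under stated assumptions: the filter is a simple match-counting matrix built from a hypothetical consensus, the importance profile is a made-up per-base attribution, and the window sum and threshold are illustrative choices rather than the procedure used in the paper.

```python
from typing import List, Tuple

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def consensus_filter(consensus: str) -> List[List[float]]:
    """Build a toy filter that scores 1.0 for each base matching the consensus."""
    return [[1.0 if BASE_INDEX[c] == b else 0.0 for b in range(4)] for c in consensus]


def scan_filter(seq: str, weights: List[List[float]]) -> List[float]:
    """Slide the filter along the sequence and return one activation per start."""
    width = len(weights)
    return [
        sum(weights[j][BASE_INDEX[seq[start + j]]] for j in range(width))
        for start in range(len(seq) - width + 1)
    ]


def weighted_hits(
    activations: List[float], importance: List[float], width: int, threshold: float
) -> List[Tuple[int, float]]:
    """Multiply each activation by the summed importance under the filter window."""
    hits = []
    for start, activation in enumerate(activations):
        window_importance = sum(importance[start:start + width])
        score = max(activation, 0.0) * window_importance
        if score >= threshold:
            hits.append((start, score))
    return hits


seq = "ACAAAGACAAAG"  # two copies of the toy motif CAAAG
weights = consensus_filter("CAAAG")
activations = scan_filter(seq, weights)
# Hypothetical attribution profile: only the first motif copy matters to the model.
importance = [0.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
hits = weighted_hits(activations, importance, width=len(weights), threshold=5.0)
```

Both motif copies match the filter equally well, but only the copy that carries attribution survives the weighting, which is the intent of combining the two signals.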

**Figure 6.**
Positional specificity of SOX10 and TFAP2A in MEL melanoma enhancers. (A,B, *top*) Example human (A) and dog (B) MEL-predicted enhancer containing significant SOX10 and TFAP2A motifs. The ATAC-seq signal is shown in gray. (A, *middle*; B, *bottom*) Imputed nucleosome start and middle point profiles. (A, *bottom*) For the human example region, ATAC-seq profiles of MM001 in control condition, after 72 h of SOX10 knockdown or TFAP2A knockdown are shown. (C) Schematic overview of the nucleosome structure explaining the colors used in D and E. (D,E) Nucleosome start point (D) and nucleosome middle point predictions (E) on MEL-predicted regions containing one SOX10 (*left*) or one TFAP2A motif (*right*) next to possible other motifs, where the regions are either centered on the ATAC-seq summit (gray) or on the SOX10 or TFAP2A motif (blue).

**Figure 7.**
Predicting causal mutations of evolutionary changes in MEL enhancers. (A,B) Example region upstream of *APPL2* that is accessible (A) and active (B) in the MEL dog line Dog-OralMel-18249 but not in human MEL lines. (C) DeepMEL prediction score of each of the 24 topics for the dog and human *APPL2* enhancer. (D) Effect on topic 4 DeepMEL score on the dog sequence when in silico simulating each of the single detected point mutations between the dog and human *APPL2* enhancer. (E) DeepExplainer plots of the middle 120 bp of the dog and human *APPL2* enhancer. In the *middle*, the effect of each possible point mutation between the dog and human sequence on the MEL DeepMEL score was in silico calculated and is represented by colored dots depending on the nucleotide to which the original dog nucleotide was in silico mutated. Truly existing point mutations between the dog and human sequence are highlighted by color-coded vertical dashed lines. Four mutations that decrease the motif score of the SOX10, MITF, and TFAP2A motifs are highlighted by a gray box and are encircled. (F) Bar plot showing the mean effect on the log₂ delta ATAC-seq signal of a non-human region compared to the human homolog depending on the number of SOX10 motif hits lost or gained. Only regions having no change in the number of significant TFAP2A, MITF, and RUNX motifs hits were used. The y-axis is normalized to the category with no changes in the number of significant SOX10 motif hits. The number of regions in each of the categories is mentioned (#). (G) Luciferase assay on six human or dog enhancers. Significant motif hits per enhancer are shown with colored crosses. For the luciferase assays: luciferase activity in MM001 is shown relative to *Renilla* signal and is log₁₀ transformed. P-values were determined using Student's t-test, and the error bars represent the standard deviation over three biological replicates.
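
The luciferase readout in panel G is a Renilla-normalized, log₁₀-transformed signal summarized over three biological replicates. A minimal sketch of that arithmetic follows; the raw counts are invented for illustration, and the t-test reported in the caption is not reproduced here.

```python
import math
import statistics
from typing import List, Tuple


def log10_relative_activity(firefly: float, renilla: float) -> float:
    """Luciferase signal relative to the Renilla control, on a log10 scale."""
    return math.log10(firefly / renilla)


def summarize(replicates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the mean and sample standard deviation across replicates."""
    values = [log10_relative_activity(f, r) for f, r in replicates]
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical (firefly, Renilla) readings for three biological replicates.
replicates = [(1000.0, 10.0), (100.0, 10.0), (10.0, 10.0)]
mean_activity, sd_activity = summarize(replicates)
```

The standard deviation here is the sample standard deviation (n − 1 denominator), which is the usual choice for error bars over a small number of replicates.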
