Nat Ecol Evol. 2021 Mar;5(3):369-378.
doi: 10.1038/s41559-020-01371-2. Epub 2021 Jan 18.

The emergence of the brain non-CpG methylation system in vertebrates

Alex de Mendoza et al. Nat Ecol Evol. 2021 Mar.

Abstract

Mammalian brains feature exceptionally high levels of non-CpG DNA methylation alongside the canonical form of CpG methylation. Non-CpG methylation plays a critical regulatory role in cognitive function, which is mediated by the binding of MeCP2, the transcriptional regulator that when mutated causes Rett syndrome. However, it is unclear whether the non-CpG neural methylation system is restricted to mammalian species with complex cognitive abilities or has deeper evolutionary origins. To test this, we investigated brain DNA methylation across 12 distantly related animal lineages, revealing that non-CpG methylation is restricted to vertebrates. We discovered that in vertebrates, non-CpG methylation is enriched within a highly conserved set of developmental genes transcriptionally repressed in adult brains, indicating that it demarcates a deeply conserved regulatory program. We also found that the writer of non-CpG methylation, DNMT3A, and the reader, MeCP2, originated at the onset of vertebrates as a result of the ancestral vertebrate whole-genome duplication. Together, we demonstrate how this novel layer of epigenetic information assembled at the root of vertebrates and gained new regulatory roles independent of the ancestral form of the canonical CpG methylation. This suggests that the emergence of non-CpG methylation may have fostered the evolution of sophisticated cognitive abilities found in the vertebrate lineage.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Extended Data Fig. 1. Locally disordered methylation characterises the lamprey epigenome
Proportion of Discordant Reads (PDR) values for a subset of 100,000 CpGs from each species (see Methods). Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR).
Extended Data Fig. 2. CpG hypermutability is widespread in vertebrates except the lamprey
Percentage of Single Nucleotide Variants (SNVs) identified from the WGBS libraries, relative to the total number of each dinucleotide in the reference genome. Proportions equal to or lower than the expected value (total number of SNVs / total number of dinucleotides) are shown in pale blue, and overrepresented proportions are shown in dark blue. Note that the mouse has very few SNVs because it is an isogenic laboratory line, yet it still shows a slight enrichment for SNVs at CpG dinucleotides. Birds, by contrast, have very high SNV rates at CpG dinucleotides despite having intermediate levels of CpG methylation.
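The colour rule in the Extended Data Fig. 2 caption comes down to comparing each dinucleotide's observed SNV proportion with a genome-wide expected proportion. The following is a minimal sketch of that comparison, not the authors' pipeline: the function name, the input dictionaries and the counts are hypothetical and chosen only for illustration.

```python
def snv_enrichment(snv_counts, dinuc_counts):
    """Return {dinucleotide: (observed_percent, overrepresented)}.

    snv_counts:   SNVs observed per dinucleotide context (hypothetical input)
    dinuc_counts: occurrences of each dinucleotide in the reference genome
    """
    # Expected proportion, as stated in the caption:
    # total number of SNVs / total number of dinucleotides.
    expected = sum(snv_counts.values()) / sum(dinuc_counts.values())
    result = {}
    for dinuc, n_sites in dinuc_counts.items():
        observed = snv_counts.get(dinuc, 0) / n_sites
        # Overrepresented ("dark blue") only if strictly above the expectation.
        result[dinuc] = (100 * observed, observed > expected)
    return result


# Hypothetical counts, for illustration only.
snv_counts = {"CG": 30, "CA": 20, "TA": 10}
dinuc_counts = {"CG": 1000, "CA": 4000, "TA": 5000}
enrichment = snv_enrichment(snv_counts, dinuc_counts)
```

With these made-up counts the expected proportion is 60 / 10,000, so only the CG context is flagged as overrepresented.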
Extended Data Fig. 3. CpH methylation is specific to brain tissues across vertebrates.
(a) Sequence motifs found surrounding the most highly methylated CpH positions in each sample. CpH positions were required to have a coverage ≥ 10x. hpf = embryo hours post fertilization. Sox10+ cells correspond to developmental neural crest cells in zebrafish. (b) Gene Ontology enrichments for genes showing the highest and lowest gene body methylation levels in the CpA context, as defined by belonging to the top and bottom deciles in each species and tissue. (c) Gene Ontology enrichments for genes showing the highest and lowest gene body methylation levels in the CpG context.
Extended Data Fig. 4. Anticorrelation between CpG and CpA methylation and transcription is restricted to a subset of vertebrate samples
Distribution of gene body methylation levels for genes grouped by expression level in brain tissue. The “No expression” category includes all genes with TPM < 1, whereas the remaining genes were classified into 10 deciles of expression (lower expression left, higher expression right). Positive correlation between expression and CpG methylation is restricted to invertebrate brain samples. Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR).
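The expression grouping described for Extended Data Fig. 4 (a "No expression" class for TPM < 1, then ten deciles) can be sketched as below. This is an illustrative sketch under stated assumptions rather than the authors' code: the function name and input are hypothetical, and ties are broken simply by sort order, which the caption does not specify.

```python
def expression_classes(tpm_by_gene, min_tpm=1.0, n_bins=10):
    """Assign each gene to "No expression" or to an expression decile (1-10)."""
    # Genes below the TPM threshold form their own category.
    classes = {g: "No expression" for g, t in tpm_by_gene.items() if t < min_tpm}
    # Remaining genes are ranked from lowest to highest expression.
    expressed = sorted(
        (g for g, t in tpm_by_gene.items() if t >= min_tpm),
        key=lambda g: tpm_by_gene[g],
    )
    n = len(expressed)
    for rank, gene in enumerate(expressed):
        # Equal-sized bins by rank: decile 1 is lowest, decile 10 is highest.
        classes[gene] = rank * n_bins // n + 1
    return classes


# Hypothetical TPM values: g0 has TPM 0, g1..g20 have TPM 1..20.
tpm = {f"g{i}": float(i) for i in range(21)}
classes = expression_classes(tpm)
```

Binning by rank keeps all deciles the same size whenever the number of expressed genes is a multiple of ten.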
Extended Data Fig. 5. Gene classification by CpA and CpG methylation levels
(a) Distribution of gene body methylation levels on genes classified in deciles from lower to higher methylation levels. Few genes are CpG methylated in the honeybee (only 3 top deciles). The dynamic range of CpG gene body methylation of lampreys and birds differs from the rest of vertebrates, in which a vast majority of genes are highly methylated (>50%). Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR). (b) Overlap between top and bottom decile genes classified by CpA and CpG gene body methylation levels. All deciles have the same size, thus overlap % captures the relative differences between categories in a comparable manner. (c) Level of conservation of gene sets classified by CpA and CpG gene body methylation levels. If a given orthologue is present in one subset of genes in only one species it is classified as a Singleton (1), whereas if it is found in the nine vertebrate species analyzed it is classified as 9. Each orthologue is counted once per species (e.g. if lamprey has 2 species-specific paralogues of one gene, it is only counted as 1).
Extended Data Fig. 6. Expression level of highly conserved CpA methylated genes
Standardized expression level for genes conserved in at least 7 vertebrate species as belonging to the top decile of CpA methylated genes. Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR).
Extended Data Fig. 7. Phylogeny and expression of DNMT3 enzymes
(a) Maximum likelihood phylogenetic tree of DNMT3 orthologues across animals, representing the full version of that presented in Figure 4a. Nodal supports represent 100 bootstrap nonparametric replications. Schematic protein domain configurations shown for each clade. PWWP, Pro-Trp-Trp-Pro motif domain (PF00855). ADD, ATRX-DNMT3-DNMT3L domain. MT, cytosine Methyltransferase domain (PF00145). CH, Calponin Homology domain (PF00307). Asterisk highlights arctic lamprey sequences. Broken domains indicate that the domain has large deletions in the given clade. (b) Table with the steady-state transcriptional level of DNMT3A in vertebrate samples, and DNMT3 in invertebrate samples. In contrast to previous analyses of the DNMT3 family, here we describe for the first time the presence of DNMT3L in non-mammalian genomes. These include non-avian reptiles (turtles, crocodiles and squamates) and two lamprey genomes. This indicates that DNMT3L was one of the ancestral ohnologues produced by the ancestral vertebrate WGD. Interestingly, both lamprey and tetrapod sequences show a truncated cytosine methyltransferase domain, which might indicate that DNMT3L has been conserved despite its lack of catalytic activity.
Extended Data Fig. 8. Phylogeny and conservation of MBD4/MECP2
(a) Maximum likelihood phylogenetic tree of the Methyl-CpG Binding Domain family in animals, representing the full-version of Figure 4b. Nodal supports represent 100 bootstrap nonparametric replications. On the right, protein domain structure of each clade, as defined by Pfam domains. MBD, Methyl Binding Domain (PF01429). HhH-GPD, Thymine glycosylase (PF00730). MBDa, p55-binding region of MBD2/3 (PF16564). MBD_C, MBD2/3 C-terminal domain (PF14048). zf-CXXC, zinc finger (PF02008). CTD, MECP2 C-Terminal Domain. TRD, MECP2 Transcriptional Repression Domain. (b) Domain presence in MBD4/MECP2 orthologues in several invertebrate genomes. Lack of the Thymine glycosylase domain is likely due to incomplete gene annotation or genome assembly gaps.
Extended Data Fig. 9. Conservation of the MeCP2 protein domains
(a) Amino acid multi-sequence alignment (MAFFT E-INS-i mode) of the Methyl-CpG Binding domain (MBD) from MeCP2, MBD4 and invertebrate MECP2/MBD4 sequences. The black square highlights the MBD domain as defined by Pfam. The red triangles indicate positions mutated in the human MECP2 gene that cause Rett Syndrome phenotypes. (b) Amino acid multi-sequence alignment of the Transcriptional Repression Domain (TRD) from MeCP2, MBD4 and the homologous region (C-terminal of the MBD) of invertebrate MBD4/MECP2 proteins. NID stands for the N-CoR/SMRT interacting amino acids. Additional black squares highlight the AT-hook domains. Alignment visualised using Geneious software.
Extended Data Fig. 10. MBD4/MECP2 isoform expression in the European amphioxus
Diagram representing the sequences used to uniquely map RNA-seq reads to each isoform across different tissues and developmental stages. Quantification of each isoform in each sample, normalised by gene length (TPM as per Kallisto quantification).
Fig. 1. Brain methylomes reflect the vertebrate-invertebrate CG methylation boundary.
a, Global brain CpG methylation, genome size, and CpG genome content across animal species. Schematic representation of established animal phylogeny on the left-hand side. Newly generated WGBS datasets marked with a blue circle, WGBS samples from non-neural tissue marked with a red circle. The Ciona intestinalis sample corresponds to muscle tissue, and sea anemone Nematostella vectensis sample corresponds to a gastrula sample. Genome size represents the genome assembly size. b, Proportion of CpG sites classified according to methylation levels (mC/C). Only sites with coverage ≥ 10x were considered. Silhouettes of human, platypus, octopus and honeybee obtained from phylopic.org.
Fig. 2. Neural CpH methylation is restricted to vertebrate brains.
a, Global methylation levels in brain samples classified per dinucleotide context. Dark blue represents the global methylation level on the nuclear chromosomes (excluding mitochondrial genome) and pale blue represents the bisulfite reaction non-conversion rate for each library, calculated as the methylation levels on an unmethylated lambda phage DNA spike-in. b, Sequence motifs found surrounding the most highly methylated CpH positions in each brain sample. Only CpH positions with coverage ≥ 10x were considered. c, Methylation level (mC/C) for the top mCpH positions depicted in panel b. Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR). Silhouettes of human, platypus, octopus and honeybee obtained from phylopic.org.
Fig. 3. Conserved non-overlapping programs are associated with CpH and CpG methylation.
a, Gene Ontology enrichments for genes showing the highest and lowest gene body methylation levels in the CpA context, as defined by belonging to the top and bottom deciles in each species. b, Gene Ontology enrichments for genes showing the highest and lowest methylated levels in the CpG context. Q-values were obtained using the g:SCS algorithm implemented in the gProfiler2 R package.
Fig. 4. Vertebrate origins of MECP2 and DNMT3A.
a, Maximum likelihood phylogenetic tree of DNMT3 genes in animals. Nodal supports represent 100 bootstrap nonparametric replications. Schematic protein domain configurations shown for each clade. PWWP, Pro-Trp-Trp-Pro motif domain (PF00855). ADD, ATRX-DNMT3-DNMT3L domain. MT, cytosine Methyltransferase domain (PF00145). CH, Calponin Homology domain (PF00307). Asterisk highlights arctic lamprey sequences. Broken domains indicate that the domain has large deletions in the given clade. b, Maximum likelihood phylogenetic tree of the Methyl-CpG Binding Domain family in animals. Nodal supports represent 100 bootstrap nonparametric replications. On the right, protein domain structure of each clade, as defined by Pfam domains. MBD, Methyl Binding Domain (PF01429). HhH-GPD, Thymine glycosylase (PF00730). MBDa, p55-binding region of MBD2/3 (PF16564). MBD_C, MBD2/3 C-terminal domain (PF14048). zf-CXXC, zinc finger (PF02008). CTD, MECP2 C-Terminal Domain. TRD, MECP2 Transcriptional Repression Domain. Asterisks highlight vertebrate sequences, percentages are shown for amino acid MBD identity between lamprey and human orthologues. c, Distribution of MECP2/MBD4 and DNMT3 genes across animal lineages. Absence of a dot indicates gene absence. Numbers indicate those species/lineages that have multiple copies of a given gene. Dnmt3c in rodents and dnmt3ba/bb.1/bb.2 are lineage-specific duplications of DNMT3B that have diverged in their function or domain architecture. “x3” indicates lineage-specific duplications. On the right, the phylogenetic relationships among animal lineages. d, Stepwise evolution of the MeCP2 and MBD4 protein domains in vertebrates, amphioxus, and non-chordates. NID stands for the N-CoR/SMRT interacting amino acids. e, Genome browser snapshot of amphioxus MBD4 locus. The longer isoform with the capacity to repair DNA has higher expression in embryonic samples; see further detail in Extended Data Fig. 10.
Fig. 5. The assembly of neural-CpH methylation.
Cladogram representing the evolutionary scenario of neural CpH methylation acquisition in vertebrates. Silhouettes of octopus and honeybee obtained from phylopic.org.
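
Several captions above (Figs. 1b, 2a and 2b) rely on the same basic quantities: a per-site methylation level (mC/C) computed only for sites with coverage ≥ 10x, a global methylation level, and a bisulfite non-conversion rate measured on unmethylated lambda phage spike-in DNA. The sketch below illustrates these quantities under simple assumptions. It is not the authors' pipeline; the function names and read counts are hypothetical, and pooling reads across sites is only one possible way to define the global level.

```python
def site_methylation_levels(sites, min_coverage=10):
    """Per-site mC/C for sites that pass the coverage filter.

    sites: iterable of (methylated_reads, total_reads) tuples (hypothetical input).
    """
    return [mc / cov for mc, cov in sites if cov >= min_coverage]


def global_methylation_level(sites):
    """Pooled methylation level: all methylated reads over all reads."""
    methylated = sum(mc for mc, _ in sites)
    covered = sum(cov for _, cov in sites)
    return methylated / covered if covered else 0.0


# Hypothetical read counts for four genomic cytosines.
genome_sites = [(9, 10), (0, 12), (5, 5), (20, 20)]
levels = site_methylation_levels(genome_sites)  # the (5, 5) site is dropped

# Lambda DNA is unmethylated, so any apparent methylation on it
# estimates the bisulfite non-conversion rate of the library.
lambda_sites = [(0, 50), (1, 50)]
non_conversion_rate = global_methylation_level(lambda_sites)
```

A global methylation level is informative for a given dinucleotide context only when it clearly exceeds the library's non-conversion rate, which is why Fig. 2a shows the two values side by side.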
