Review
Int J Mol Sci. 2020 Jun 30;21(13):4684.
doi: 10.3390/ijms21134684.

A Census and Categorization Method of Epitranscriptomic Marks


Julia Mathlin et al. Int J Mol Sci. 2020.

Abstract

In the past few years, the thorough investigation of chemical modifications made to ribonucleic acid (RNA) molecules in cells has been gaining momentum. This new field of research has been dubbed "epitranscriptomics", by analogy with the better-known epigenomics, to stress the potential of ensembles of RNA modifications to constitute a post-transcriptional regulatory layer of gene expression orchestrated by writer, reader, and eraser RNA-binding proteins (RBPs). Epitranscriptomics aims to identify and characterize all functionally relevant changes made to the transcriptome, covering both non-substitutional chemical modifications and editing events. Indeed, several types of RNA modifications that impact gene expression have been reported so far in different species of cellular RNAs, including ribosomal RNAs, transfer RNAs, small nuclear RNAs, messenger RNAs, and long non-coding RNAs. Supporting the functional relevance of this largely unknown regulatory mechanism, several human diseases have been associated directly with RNA modifications or with RBPs that may act as effectors of epitranscriptomic marks. However, an exhaustive characterization of the epitranscriptome, aimed at systematically classifying all RNA modifications and clarifying the rules, actors, and outcomes of this promising regulatory code, is not currently available, mainly because suitable detection technologies are lacking. Given the unprecedented pace of technological advances, especially in sequencing, this limitation is likely to be overcome soon. Here, we review the current knowledge on epitranscriptomic marks and propose a categorization method based on the reference ribonucleotide and the rounds of modification ("stages") needed to reach a given modified form. We believe that this classification scheme can be useful to coherently organize the expanding number of discovered RNA modifications.

Keywords: RNA modifications; epigenetic regulation; epitranscriptome; epitranscriptomics; gene-expression regulation; post-transcriptional regulation.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Chemical formulas for the main types of epitranscriptomic marks. Unmodified ribonucleic acid (RNA) bases are shown in black at the left-most position of each row, while the chemical formulas to their right are some of their cognate modified forms, with chemical changes highlighted in red. The grey inset at the top-right corner shows 2’-O-methylation (Nm, where N stands for any nucleoside), a common modification that can occur on any of the ribonucleosides.
Figure 2
Summary scheme and representative tree of guanosine-derived RNA modifications illustrating the proposed categorization method. (a) For each ribonucleoside (A, C, G, U), the scheme reports in squares the number of known chemically modified derivatives occurring at the given stage of modification, as well as the total count (bottom). Stages of modification range from 1 (top-left) to 9 (bottom-left), each marked by a different color, shown in the filled squares on the left and in the borders of the squares that report the corresponding counts for each ribonucleoside. The scheme summarizes a proposed classification for 134 currently known chemical modifications according to their root nucleoside. Some of these modified RNA bases can be derived by a single chemical modification of a natural ribonucleoside (stage 1), others can be obtained by one further modification step acting on stage 1 products (stage 2), and so on up to the maximum number of modifying steps (stage 9). The bottom-right inset lists three additional RNA modifications (the A-derived ac6A and the two U-derived modifications cm5s2U and cnm5U) that currently lack enough detail on their synthesis to be assigned a stage in the scheme. (b) Tree representation of all known G-derived RNA modifications following the classification method summarized in (a). Border colors of leaves (or nodes) in the tree representation indicate the corresponding stage according to the color scheme reported in the legend. The RNA modifications shown are the union of current knowledge gathered from eukarya, bacteria, and archaea.
A = adenosine; ac6A = N6-acetyladenosine; C = cytidine; cm5s2U = 5-carboxymethyl-2-thiouridine; cnm5U = 5-cyanomethyluridine; G+ = archaeosine; G = guanosine; galQ = galactosyl-queuosine; gluQ = glutamyl-queuosine; Gm = 2’-O-methylguanosine; Gr(p) = 2’-O-ribosylguanosine (phosphate); imG = wyosine; imG-14 = 4-demethylwyosine; imG2 = isowyosine; m1G = 1-methylguanosine; m1Gm = 1,2’-O-dimethylguanosine; m2Gm = N2,2’-O-dimethylguanosine; m2,2Gm = N2,N2,2’-O-trimethylguanosine; m2,7Gm = N2,7,2’-O-trimethylguanosine; m2G = N2-methylguanosine; m2,2G = N2,N2-dimethylguanosine; m2,2,7G = N2,N2,7-trimethylguanosine; m2,7G = N2,7-dimethylguanosine; m7G = 7-methylguanosine; manQ = mannosyl-queuosine; mimG = methylwyosine; o2yW = peroxywybutosine; OHyW = hydroxywybutosine; OHyWx = undermodified hydroxywybutosine; OHyWy = methylated undermodified hydroxywybutosine; Q = queuosine; U = uridine; yW = wybutosine; yW-58 = 7-aminocarboxypropylwyosine methyl ester; yW-72 = 7-aminocarboxypropylwyosine; yW-86 = 7-aminocarboxypropyl-demethylwyosine.
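The staging idea in this caption (a modification's stage is the number of modification steps separating it from its root nucleoside) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PARENT` links below are hypothetical examples chosen for the sketch, not a transcription of the tree in panel (b), and the names `PARENT`, `ROOTS`, `stage`, and `census` are not from the paper.

```python
from collections import Counter

# Hypothetical child -> parent links, for illustration only.
# They are NOT copied from Figure 2b and may differ from the published tree.
PARENT = {
    "m7G": "G",
    "m2,7G": "m7G",
    "m2,2,7G": "m2,7G",
    "m2G": "G",
    "m2,2G": "m2G",
    "m2,2Gm": "m2,2G",
    "Gm": "G",
}

# The four unmodified ribonucleosides act as roots (stage 0).
ROOTS = {"A", "C", "G", "U"}


def stage(mod):
    """Return (root nucleoside, number of modification steps from the root)."""
    steps = 0
    while mod not in ROOTS:
        mod = PARENT[mod]  # walk one modification step back toward the root
        steps += 1
    return mod, steps


def census(parent):
    """Count modifications per (root, stage), like the squares in panel (a)."""
    return Counter(stage(mod) for mod in parent)


if __name__ == "__main__":
    for (root, s), n in sorted(census(PARENT).items()):
        print(f"{root}-derived, stage {s}: {n}")
```

With a complete parent table, the same `census` call would produce the per-nucleoside, per-stage counts that panel (a) summarizes. Modifications whose synthesis route is unknown would simply be left out of `PARENT`, mirroring the unassigned entries in the inset.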
Figure 3
Tree representation of known RNA modifications following the proposed categorization scheme. Starting from the initial base (adenosine in panel (a), cytidine in (b), and uridine in (c)), each connection (or branch) corresponds to a chemical modification added to the previous state (that is, the preceding leaf). Border colors of leaves (or nodes) in the tree representation indicate the corresponding stage according to the color scheme reported in the legend. The RNA modifications shown are the union of current knowledge gathered from eukarya, bacteria, and archaea. A = adenosine; ac4C = N4-acetylcytidine; ac4Cm = N4-acetyl-2’-O-methylcytidine; acp3Ψ = 3-(3-amino-3-carboxypropyl)pseudouridine; acp3D = 3-(3-amino-3-carboxypropyl)-5,6-dihydrouridine; acp3U = 3-(3-amino-3-carboxypropyl)uridine; Am = 2’-O-methyladenosine; Ar(p) = 2’-O-ribosyladenosine (phosphate); C+ = agmatidine; C = cytidine; chm5U = 5-carboxyhydroxymethyluridine; Cm = 2’-O-methylcytidine; cm5U = 5-carboxymethyluridine; cmnm5ges2U = 5-carboxymethylaminomethyl-2-geranylthiouridine; cmnm5s2U = 5-carboxymethylaminomethyl-2-thiouridine; cmnm5se2U = 5-carboxymethylaminomethyl-2-selenouridine; cmnm5U = 5-carboxymethylaminomethyluridine; cmnm5Um = 5-carboxymethylaminomethyl-2’-O-methyluridine; cmo5U = uridine 5-oxyacetic acid; ct6A = cyclic N6-threonylcarbamoyladenosine; D = dihydrouridine; f5C = 5-formylcytidine; f5Cm = 5-formyl-2’-O-methylcytidine; f6A = N6-formyladenosine; g6A = N6-glycinylcarbamoyladenosine; ges2U = 2-geranylthiouridine; hm5C = 5-hydroxymethylcytidine; hm5Cm = 2’-O-methyl-5-hydroxymethylcytidine; hm6A = N6-hydroxymethyladenosine; hn6A = N6-hydroxynorvalylcarbamoyladenosine; ho5C = 5-hydroxycytidine; ho5U = 5-hydroxyuridine; ht6A = hydroxy-N6-threonylcarbamoyladenosine; I = inosine; i6A = N6-isopentenyladenosine; Im = 2’-O-methylinosine; inm5s2U = 5-(isopentenylaminomethyl)-2-thiouridine; inm5U = 5-(isopentenylaminomethyl)uridine; inm5Um = 5-(isopentenylaminomethyl)-2’-O-methyluridine; io6A = N6-(cis-hydroxyisopentenyl)adenosine; k2C = 2-lysidine; m1A = 1-methyladenosine; m1acp3Ψ = 1-methyl-3-(3-amino-3-carboxypropyl)pseudouridine; m1Am = 1,2’-O-dimethyladenosine; m1I = 1-methylinosine; m1Im = 1,2’-O-dimethylinosine; m1Ψ = 1-methylpseudouridine; m2A = 2-methyladenosine; m2,8A = 2,8-dimethyladenosine; m3C = 3-methylcytidine; m3U = 3-methyluridine; m3Um = 3,2’-O-dimethyluridine; m3Ψ = 3-methylpseudouridine; m4C = N4-methylcytidine; m4Cm = N4,2’-O-dimethylcytidine; m4,4C = N4,N4-dimethylcytidine; m4,4Cm = N4,N4,2’-O-trimethylcytidine; m5C = 5-methylcytidine; m5Cm = 5,2’-O-dimethylcytidine; m5D = 5-methyldihydrouridine; m5s2U = 5-methyl-2-thiouridine; m5U = 5-methyluridine; m5Um = 5,2’-O-dimethyluridine; m6A = N6-methyladenosine; m6Am = N6,2’-O-dimethyladenosine; m6,6A = N6,N6-dimethyladenosine; m6,6Am = N6,N6,2’-O-trimethyladenosine; m6t6A = N6-methyl-N6-threonylcarbamoyladenosine; m8A = 8-methyladenosine; mchm5U = 5-(carboxyhydroxymethyl)uridine methyl ester; mchm5Um = 5-(carboxyhydroxymethyl)-2’-O-methyluridine methyl ester; mcm5s2U = 5-methoxycarbonylmethyl-2-thiouridine; mcm5U = 5-methoxycarbonylmethyluridine; mcm5Um = 5-methoxycarbonylmethyl-2’-O-methyluridine; mcmo5U = uridine 5-oxyacetic acid methyl ester; mcmo5Um = 2’-O-methyluridine 5-oxyacetic acid methyl ester; mnm5ges2U = 5-methylaminomethyl-2-geranylthiouridine; mnm5s2U = 5-methylaminomethyl-2-thiouridine; mnm5se2U = 5-methylaminomethyl-2-selenouridine; mnm5U = 5-methylaminomethyluridine; mo5U = 5-methoxyuridine; ms2ct6A = 2-methylthio cyclic N6-threonylcarbamoyladenosine; ms2hn6A = 2-methylthio-N6-hydroxynorvalylcarbamoyladenosine; ms2i6A = 2-methylthio-N6-isopentenyladenosine; ms2io6A = 2-methylthio-N6-(cis-hydroxyisopentenyl)adenosine; ms2m6A = 2-methylthio-N6-methyladenosine; ms2t6A = 2-methylthio-N6-threonylcarbamoyladenosine; msms2i6A = 2-methylthiomethylenethio-N6-isopentenyladenosine; ncm5U = 5-carbamoylmethyluridine; nchm5U = 5-carbamoylhydroxymethyluridine; ncm5s2U = 5-carbamoylmethyl-2-thiouridine; ncm5Um = 5-carbamoylmethyl-2’-O-methyluridine; nm5ges2U = 5-aminomethyl-2-geranylthiouridine; nm5s2U = 5-aminomethyl-2-thiouridine; nm5se2U = 5-aminomethyl-2-selenouridine; nm5U = 5-aminomethyluridine; s2C = 2-thiocytidine; s2U = 2-thiouridine; s2Um = 2-thio-2’-O-methyluridine; s4U = 4-thiouridine; se2U = 2-selenouridine; t6A = N6-threonylcarbamoyladenosine; tm5U = 5-taurinomethyluridine; tm5s2U = 5-taurinomethyl-2-thiouridine; U = uridine; Um = 2’-O-methyluridine; Ψ = pseudouridine; Ψm = 2’-O-methylpseudouridine.
Figure 4
Known transfer RNA (tRNA) modifications. A schematic representation of the tRNA secondary structure is shown, with circles representing RNA residues. Grey circles and the numbers therein represent modified RNA residues and their position along the tRNA primary sequence. Connecting lines between RNA residues indicate base pairing. Three prominent tRNA regions are labeled: the D-loop (residues 14–21), the anticodon (residues 34–36), and the TΨC-loop (residues 54–60). (a) tRNA modifications whose original substrate is an adenosine (A) residue; (b) tRNA modifications whose original substrate is a cytidine (C) residue; (c) tRNA modifications whose original substrate is a guanosine (G) residue; (d) tRNA modifications whose original substrate is a uridine (U) residue. ac4C = N4-acetylcytidine; acp3D = 3-(3-amino-3-carboxypropyl)-5,6-dihydrouridine; acp3U = 3-(3-amino-3-carboxypropyl)uridine; Am = 2’-O-methyladenosine; Ar(p) = 2’-O-ribosyladenosine (phosphate); bact. = bacterial; C+ = agmatidine; Cm = 2’-O-methylcytidine; cm5s2U = 5-carboxymethyl-2-thiouridine; cmnm5ges2U = 5-carboxymethylaminomethyl-2-geranylthiouridine; cmnm5s2U = 5-carboxymethylaminomethyl-2-thiouridine; cmnm5se2U = 5-carboxymethylaminomethyl-2-selenouridine; cmnm5U = 5-carboxymethylaminomethyluridine; cmnm5Um = 5-carboxymethylaminomethyl-2’-O-methyluridine; cmo5U = uridine 5-oxyacetic acid; ct6A = cyclic N6-threonylcarbamoyladenosine; D = dihydrouridine; f5Cm = 5-formyl-2’-O-methylcytidine; galQ = galactosyl-queuosine; ges2U = 2-geranylthiouridine; gluQ = glutamyl-queuosine; Gm = 2’-O-methylguanosine; Gr(p) = 2’-O-ribosylguanosine (phosphate); I = inosine; i6A = N6-isopentenyladenosine; imG = wyosine; imG-14 = 4-demethylwyosine; imG2 = isowyosine; io6A = N6-(cis-hydroxyisopentenyl)adenosine; k2C = 2-lysidine; m1A = 1-methyladenosine; m1G = 1-methylguanosine; m1I = 1-methylinosine; m1Im = 1,2’-O-dimethylinosine; m1Ψ = 1-methylpseudouridine; m2,2G = N2,N2-dimethylguanosine; m2A = 2-methyladenosine; m2G = N2-methylguanosine; m3C = 3-methylcytidine; m3U = 3-methyluridine; m5C = 5-methylcytidine; m5s2U = 5-methyl-2-thiouridine; m5U = 5-methyluridine; m5Um = 5,2’-O-dimethyluridine; m6t6A = N6-methyl-N6-threonylcarbamoyladenosine; m7G = 7-methylguanosine; manQ = mannosyl-queuosine; mchm5U = 5-(carboxyhydroxymethyl)uridine methyl ester; mcm5s2U = 5-methoxycarbonylmethyl-2-thiouridine; mcm5U = 5-methoxycarbonylmethyluridine; mimG = methylwyosine; mnm5ges2U = 5-methylaminomethyl-2-geranylthiouridine; mnm5s2U = 5-methylaminomethyl-2-thiouridine; mnm5se2U = 5-methylaminomethyl-2-selenouridine; mnm5U = 5-methylaminomethyluridine; mo5U = 5-methoxyuridine; ms2i6A = 2-methylthio-N6-isopentenyladenosine; ms2io6A = 2-methylthio-N6-(cis-hydroxyisopentenyl)adenosine; ms2t6A = 2-methylthio-N6-threonylcarbamoyladenosine; nchm5U = 5-carbamoylhydroxymethyluridine; ncm5U = 5-carbamoylmethyluridine; ncm5Um = 5-carbamoylmethyl-2’-O-methyluridine; nm5s2U = 5-aminomethyl-2-thiouridine; nm5se2U = 5-aminomethyl-2-selenouridine; nm5U = 5-aminomethyluridine; o2yW = peroxywybutosine; OHyW = hydroxywybutosine; Q = queuosine; s2C = 2-thiocytidine; s2U = 2-thiouridine; s2Um = 2-thio-2’-O-methyluridine; t6A = N6-threonylcarbamoyladenosine; tm5s2U = 5-taurinomethyl-2-thiouridine; tm5U = 5-taurinomethyluridine; Um = 2’-O-methyluridine; yW = wybutosine; yW-58 = 7-aminocarboxypropylwyosine methyl ester; yW-72 = 7-aminocarboxypropylwyosine; yW-86 = 7-aminocarboxypropyl-demethylwyosine; Ψ = pseudouridine; Ψm = 2’-O-methylpseudouridine.
Figure 5
Known RNA modifications in messenger RNA (mRNA). The figure lists epitranscriptomic marks found in mRNA, along with their preferred location and the sequence motif at their sites of occurrence, if known. Of note, recent reports on Ψ and m5C marks in mRNA highlight the importance of structural motifs as determinants for modification (see text). A = adenosine; ac4C = N4-acetylcytidine; Am = 2’-O-methyladenosine; BCA motif (B = C/G/U); C = cytidine; Cm = 2’-O-methylcytidine; DRACH motif (D = A/U/G, R = A/G, H = A/C/U); f6A = N6-formyladenosine; G = guanosine; Gm = 2’-O-methylguanosine; hm5C = 5-hydroxymethylcytidine; hm6A = N6-hydroxymethyladenosine; I = inosine; m1A = 1-methyladenosine; m5C = 5-methylcytidine; m6A = N6-methyladenosine; m6Am = N6,2’-O-dimethyladenosine; m7G = 7-methylguanosine; U = uridine; Um = 2’-O-methyluridine; Ψ = pseudouridine.
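The degenerate DRACH motif defined in this caption (D = A/U/G, R = A/G, H = A/C/U) maps directly onto a regular expression. The following is a minimal sketch of scanning an RNA sequence for candidate DRACH sites. The function name `find_drach` and the conversion of DNA-style `T` to `U` are assumptions made for this example, and a sequence match only marks a candidate site, not a confirmed modification.

```python
import re

# DRACH with D = A/U/G, R = A/G, H = A/C/U, as defined in the caption.
# The lookahead lets overlapping candidate sites be reported as well.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def find_drach(rna):
    """Return (0-based start, matched 5-mer) for every DRACH occurrence."""
    # Assumption for this sketch: accept lowercase and DNA-style input.
    rna = rna.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in DRACH.finditer(rna)]


if __name__ == "__main__":
    for start, site in find_drach("aaGGACUaa"):
        # The central A of the motif sits two bases after the match start.
        print(start, site, "central A at index", start + 2)
```

The BCA motif from the same caption (B = C/G/U) could be handled the same way with the pattern `[CGU]CA`.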

