Nucleic Acids Res. 2020 Apr 17;48(7):3734-3746.
doi: 10.1093/nar/gkaa113.

Machine learning of reverse transcription signatures of variegated polymerases allows mapping and discrimination of methylated purines in limited transcriptomes

Stephan Werner et al. Nucleic Acids Res. 2020.

Abstract

Reverse transcription (RT) of RNA templates containing RNA modifications leads to synthesis of cDNA containing information on the modification in the form of misincorporation, arrest, or nucleotide skipping events. A compilation of such events from multiple cDNAs represents an RT-signature that is typical for a given modification, but, as we show here, depends also on the reverse transcriptase enzyme. A comparison of 13 different enzymes revealed a range of RT-signatures, with individual enzymes exhibiting average arrest rates between 20 and 75%, as well as average misincorporation rates between 30 and 75% in the read-through cDNA. Using RT-signatures from individual enzymes to train a random forest model as a machine learning regimen for prediction of modifications, we found strongly variegated success rates for the prediction of methylated purines, as exemplified with N1-methyladenosine (m1A). Among the 13 enzymes, a correlation was found between read length, misincorporation, and prediction success. Inversely, low average read length was correlated to high arrest rate and lower prediction success. The three most successful polymerases were then applied to the characterization of RT-signatures of other methylated purines. Guanosines featuring methyl groups on the Watson-Crick face were identified with high confidence, but discrimination between m1G and m22G was only partially successful. In summary, the results suggest that, given sufficient coverage and a set of specifically optimized reaction conditions for reverse transcription, all RNA modifications that impede Watson-Crick bonds can be distinguished by their RT-signature.

Figures

Figure 1.
(A) Scatter plot showing the average m1A signatures of 13 RTs at 26 m1A sites in yeast cytosolic tRNA. Error bars show standard deviations of arrest and mismatch rates across 3 sequencing runs, i.e. triplicates. The colour-code represents the jump rate. Arrest rate percentages refer to the reads covering the 3′ adjacent position of m1A (+1 position). Mismatch and jump rate percentages refer to the reads covering the m1A position. (B) Pearson correlation coefficient (PCC) matrix heat map. Evaluation of interrelations and mutual influences by showing the positive and negative correlations between the RT-signature features arrest, mismatch and jump rate as well as the random forest performance measure AUC (Area Under Curve) and the read length (see also Supplement Figure S9 for additional information on TGIRT and HIV-RT).
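The rate definitions in this caption can be illustrated with a minimal sketch. It assumes that a read covering the +1 position but not the modified position counts as an arrest, which is one plausible reading of the caption and not a confirmed description of the authors' pipeline. The `SiteCounts` fields, the function names and the example counts are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SiteCounts:
    """Read counts at one candidate site (hypothetical field names)."""
    cov_plus1: int   # reads covering the 3' adjacent (+1) position
    cov_site: int    # reads covering the modified position itself
    mismatches: int  # reads at the site carrying a non-reference base
    jumps: int       # reads at the site with a skipped nucleotide


def rt_signature(c: SiteCounts) -> dict:
    """Return arrest, mismatch and jump rates in percent.

    Assumption: reads that cover +1 but not the site count as arrests.
    """
    arrest = 100.0 * (c.cov_plus1 - c.cov_site) / c.cov_plus1 if c.cov_plus1 else 0.0
    mismatch = 100.0 * c.mismatches / c.cov_site if c.cov_site else 0.0
    jump = 100.0 * c.jumps / c.cov_site if c.cov_site else 0.0
    return {"arrest": arrest, "mismatch": mismatch, "jump": jump}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up counts, for illustration only.
sig = rt_signature(SiteCounts(cov_plus1=200, cov_site=120, mismatches=60, jumps=6))
print(sig)
```

Applying `pearson` to per-enzyme vectors of two features (for example arrest rate and read length) would give one cell of a correlation matrix like the one in panel B.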
Figure 2.
Bar plot with random forest performance and feature importance by RT. Classification performance is represented as Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC). Colour-code represents the feature importance for the classification. Data was averaged from triplicates. Jump = jump rate. C, T, G = mismatch components, which add up to 100%. Mismatch = mismatch rate. Arrest = arrest rate. Percentages represent feature importance in the random forest analysis, i.e. the mean loss in classification accuracy when the values of the respective feature are permuted. (See also Supplement Figure S9 for additional information on TGIRT and HIV-RT.)
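The feature importance measure named in this caption, the mean loss in classification accuracy when one feature is permuted, can be sketched in a few lines. This is a generic illustration and not the authors' implementation: a fixed threshold rule (`toy_predict`) stands in for the trained random forest, and the feature values are made up.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_features, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: columns are [mismatch %, arrest %]; all values are made up.
X = [[70, 30], [5, 35], [65, 20], [3, 25], [80, 40], [8, 22], [60, 28], [2, 33]]
y = [1, 0, 1, 0, 1, 0, 1, 0]


def toy_predict(row):
    """Stand-in classifier (assumption): uses the mismatch column only."""
    return 1 if row[0] > 30 else 0


imp = permutation_importance(toy_predict, X, y, n_features=2)
print(imp)
```

Because the stand-in classifier ignores the second column, permuting it costs no accuracy and its importance is zero, whereas permuting the first column lowers accuracy.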
Figure 3.
Prediction scheme for guanosine modifications. A first random forest model was trained (5-fold cross-validation, 10 repetitions) separately for each RT on m1G and m22G sites from two replicates of total tRNA samples from Saccharomyces cerevisiae (RNA data) to distinguish modified (positive class) from unmodified (negative class) guanosines. The trained random forest was then used to make a prediction on guanosines from a corresponding third yeast data set. In this case, the prediction is a binary classification, the two classes being ‘m1G, m22G’ and ‘non-modified Gs’. Guanosines classified as modified (m1G or m22G) in the first prediction were written into an output file (RNA data modified Gs). Then, two random forest models were trained (5-fold cross-validation, 10 repetitions) separately for each RT on either m1G or m22G sites from two replicates of total tRNA samples from S. cerevisiae (RNA data) as the positive class to separate these guanosine modifications from each other; the respective other modification together with non-modified Gs (1:1 ratio) served as the negative class. The trained models were then used to make a prediction on the output file from the first prediction (RNA data modified Gs).
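The two-stage scheme in this caption can be sketched as a simple cascade. The three classifiers below are hypothetical threshold rules standing in for the trained random forests, the feature values are made up, and the rule for combining the two second-stage calls is an assumption, since the caption does not state how conflicting calls are resolved.

```python
def cascade_predict(sites, is_modified, is_m1g, is_m22g):
    """Stage 1: modified vs. unmodified G. Stage 2: m1G vs. m22G on stage-1 positives."""
    results = {}
    for site_id, features in sites.items():
        if not is_modified(features):
            results[site_id] = "non-modified G"
            continue
        calls = []
        if is_m1g(features):
            calls.append("m1G")
        if is_m22g(features):
            calls.append("m22G")
        # Assumption: exactly one positive call is accepted; otherwise unresolved.
        results[site_id] = calls[0] if len(calls) == 1 else "modified G (unresolved)"
    return results


# Stand-in classifiers with arbitrary thresholds (hypothetical, not from the paper).
def is_modified(f):
    return f["mismatch"] > 20 or f["arrest"] > 20


def is_m1g(f):
    return f["mismatch"] > 50


def is_m22g(f):
    return f["arrest"] > 50


# Made-up feature values in percent, for illustration only.
sites = {
    "site_a": {"mismatch": 80, "arrest": 10},
    "site_b": {"mismatch": 30, "arrest": 70},
    "site_c": {"mismatch": 2, "arrest": 3},
    "site_d": {"mismatch": 60, "arrest": 60},
}

calls = cascade_predict(sites, is_modified, is_m1g, is_m22g)
print(calls)
```

Running the second stage only on stage-1 positives mirrors the caption's use of the first prediction's output file as input to the m1G and m22G models.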
Figure 4.
Examples of RT-signatures of m1A, m1G and m22G by RT from revolver oligo analyses and the expected structural impairment of the Watson–Crick base-pairing. Graphs from the revolver oligo with a neighboring A, 3′ adjacent (+1 position) to the modified site at position 9, are shown. Sites with error rates of more than 10% are highlighted with yellow arrows. Colored bars indicate the nature of the reads. The mismatch rate is depicted as a black cross and the arrest rate as a red line. The modified site is shown at position 9 in the middle of the considered sequence. In general, arrest rate percentages refer to the reads covering the 3′ adjacent position of m1A/m1G/m22G (+1 position). Mismatch rate percentages refer to the reads covering the modified position. Note that average values stated in the text may differ from these individual signatures.
Figure 5.
m1G and m22G RT-signature comparison for the analysed reverse transcriptases RT #3, #5, #11 and #12. (A) Revolver oligo. Dot plots of mismatch and arrest RT-signatures at m22G (dots) and m1G (triangles) sites in revolver oligos for base configurations guanosine (orange), cytidine (blue), uridine (red, T in mapping profile) and adenosine (green) at position +1. Mismatch and arrest rates are given in percentage. Data was averaged from triplicates; error bars show standard deviations of arrest and mismatch rates. (B) Total tRNA from Saccharomyces cerevisiae. Dot plots of mismatch and arrest RT-signatures at m22G (black dots) and m1G (gray triangles) sites in total tRNA. Data was averaged from triplicates; error bars show standard deviations of arrest and mismatch rates. m1G and m22G sites which are present in all three total tRNA replicates and show a coverage of at least 20 reads in at least two replicates are shown (see Supplement Table S8 for more details). In general, arrest rate percentages refer to the reads covering the 3′ adjacent position of m1G/m22G (+1 position). Mismatch rate percentages refer to the reads covering the modified position.
