Nature. 2020 Jun;582(7812):438-442.
doi: 10.1038/s41586-020-2253-5. Epub 2020 May 6.

Determination of RNA structural diversity and its role in HIV-1 RNA splicing

Phillip J Tomezsko et al. Nature. 2020 Jun.

Erratum in

Abstract

Human immunodeficiency virus 1 (HIV-1) is a retrovirus with a ten-kilobase single-stranded RNA genome. HIV-1 must express all of its gene products from a single primary transcript, which undergoes alternative splicing to produce diverse protein products that include structural proteins and regulatory factors1,2. Despite the critical role of alternative splicing, the mechanisms that drive the choice of splice site are poorly understood. Synonymous RNA mutations that lead to severe defects in splicing and viral replication indicate the presence of unknown cis-regulatory elements3. Here we use dimethyl sulfate mutational profiling with sequencing (DMS-MaPseq) to investigate the structure of HIV-1 RNA in cells, and develop an algorithm that we name 'detection of RNA folding ensembles using expectation-maximization' (DREEM), which reveals the alternative conformations that are assumed by the same RNA sequence. Contrary to previous models that have analysed population averages4, our results reveal heterogeneous regions of RNA structure across the entire HIV-1 genome. In addition to confirming that in vitro characterized5 alternative structures for the HIV-1 Rev responsive element also exist in cells, we discover alternative conformations at critical splice sites that influence the ratio of transcript isoforms. Our simultaneous measurement of splicing and intracellular RNA structure provides evidence for the long-standing hypothesis6-8 that heterogeneity in RNA conformation regulates splice-site use and viral gene expression.

Conflict of interest statement

[Ethics declarations]

The authors declare no competing interests.

Figures

Extended Data Figure 1 |. DREEM clustering pipeline for DMS-MaPseq data.
a, A read x is represented as a series of D bits, where D is the length of the read. A base is denoted by the bit ‘1’ if it is mutated away from the reference and by ‘0’ otherwise. Let K be the number of clusters in the sample. $\mu_k = \{\mu_{k1}, \dots, \mu_{kD}\}$ is the mutation profile of cluster k and $\pi_k$ is the mixing proportion of cluster k, such that $\sum_{k=1}^{K} \pi_k = 1$. The model parameters $\mu$ and $\pi$ are randomly initialized. In the Expectation step, reads are assigned probabilistically to clusters and the likelihood of observing the data given the model parameters is computed. In the Maximization step, the mixing proportions are calculated from the read assignments and the mutation profiles are updated for each cluster to maximize the expectation value of the complete case likelihood. The Expectation steps alternate with the Maximization steps until the likelihood converges. The likelihood function is derived using Bernoulli mixture models modified to account for missing data in the form of the underrepresentation of reads with adjacent mutations. b, Mutational distance distribution between bases in denatured, DMS-modified total RNA. Plotted is the mutation distance versus frequency between two DMS-reactive positions, i.e. A or C to A or C (yellow bars), and between one DMS-reactive position and a background mutation (e.g. a mutation due to sequencing error), i.e. A or C to T or G (blue bars). The blue bars demonstrate the frequency of observing two mutations due to background.
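The Expectation and Maximization steps described in panel a can be sketched as a plain Bernoulli mixture model. This is only an illustration under stated assumptions, not the DREEM implementation: it omits the modification for underrepresented reads with adjacent mutations, and the function name `em_bernoulli_mixture` and its parameters (`n_iter`, `tol`, `seed`) are hypothetical.

```python
import math
import random


def em_bernoulli_mixture(reads, K, n_iter=200, tol=1e-9, seed=0):
    """Fit a K-cluster Bernoulli mixture to bit-vector reads by EM.

    Simplified sketch: unlike the method described in the caption, it does
    NOT correct for the underrepresentation of adjacent mutations.
    Returns (pi, mu, log_likelihood).
    """
    rng = random.Random(seed)
    N, D = len(reads), len(reads[0])
    eps = 1e-6  # keeps mutation rates away from 0 and 1 so logs stay finite
    # Random initialization of mutation profiles; uniform mixing proportions.
    mu = [[rng.uniform(0.05, 0.5) for _ in range(D)] for _ in range(K)]
    pi = [1.0 / K] * K
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each read, plus the
        # log-likelihood of the data under the current parameters.
        resp = []
        ll = 0.0
        for x in reads:
            logp = []
            for k in range(K):
                s = math.log(pi[k])
                for d in range(D):
                    s += math.log(mu[k][d] if x[d] else 1.0 - mu[k][d])
                logp.append(s)
            m = max(logp)
            norm = m + math.log(sum(math.exp(v - m) for v in logp))
            ll += norm
            resp.append([math.exp(v - norm) for v in logp])
        # M-step: update mixing proportions and per-position mutation rates.
        for k in range(K):
            nk = max(sum(r[k] for r in resp), 1e-12)
            pi[k] = nk / N
            for d in range(D):
                num = sum(r[k] * x[d] for r, x in zip(resp, reads))
                mu[k][d] = min(max(num / nk, eps), 1.0 - eps)
        # Stop once the log-likelihood no longer changes appreciably.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu, ll
```

Calling `em_bernoulli_mixture(reads, K=2)` on a list of equal-length 0/1 lists returns the estimated mixing proportions, the per-cluster mutation profiles, and the log-likelihood computed in the last Expectation step.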
Extended Data Figure 2 |. DREEM clustering identifies and quantifies individual structures from in vitro mixing experiments.
Structure 1 and Structure 2 sequences were in vitro transcribed and re-folded, mixed in different proportions, and probed with DMS-MaPseq. The region used for DREEM clustering covers nucleotides 21–135 (labeled as 1–115 on the figure), which excludes the primers used for RT-PCR (that have no DMS-induced mutations) and is identical in sequence for the two structures except for the A>C mutation at position 94. Position 94 is masked during analysis. The topmost panel shows the DMS reactivity pattern of Structure 1 by itself and Structure 2 by itself. The rest of the panels show the clustering results at specified mixing ratios (n=1).
Extended Data Figure 3 |. Secondary structure models for V. vulnificus Adenoriboswitch (add).
a, Percentages for each cluster detected in the presence or absence of 5mM adenine to the add riboswitch. b, In vitro structure models obtained from probing add using DMS-MaPseq followed by DREEM, color coded by normalized DMS signal. The apoB and apoB alternative structures represent the OFF state, which is incompetent for ligand binding. The apoA represents the ON state. Previously identified helices are boxed and labeled.
Extended Data Figure 4 |. DREEM clustering reveals an equilibrium of 4-stem and 5-stem structures for in vitro folded HIV-1 RRE.
a, Population average DMS-MaPseq data for in vitro transcribed, refolded and DMS-treated or untreated samples. b, Scatter plots showing the reproducibility of the DMS signal from DREEM clustering results between 2 replicates with different DMS modification conditions. Replicate 1 was modified with 0.25% DMS and replicate 2 was modified with 2.5% DMS. R2 is Pearson’s R2. c, DREEM clustering data from b were used as constraints to generate RNA structure models. Shown are the models derived for cluster 1 and cluster 2 from replicate 1, color coded by normalized DMS signal.
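The reproducibility measure in panel b, Pearson’s R2 between the per-nucleotide DMS signals of two replicates, can be computed as below. This is a minimal sketch; the function name `pearson_r2` is hypothetical and the inputs are assumed to be equal-length lists of per-position signals.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length signal vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R2 is undefined for a constant signal")
    return (sxy * sxy) / (sxx * syy)
```

Because R2 is invariant to linear rescaling of either signal, two replicates probed at different DMS concentrations can still be compared on overall reactivity pattern.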
Extended Data Figure 5 |. HIV-1 RRE forms two stable alternative structures in CD4+ T cells.
a, Schematic representation of DMS treatment in primary cells and isolated virions. b, DMS-MaPseq probing of intracellular HIV-1NL4–3 RRE in CD4+ T cells was used as input for DREEM clustering. Two clusters passed the BIC test and were used as RNAstructure folding constraints. Structural models are color coded by normalized DMS reactivity and bases not covered by the region of PCR are colored in gray. Data used to construct models are representative data from n=2 biologically independent experiments.
Extended Data Figure 6 |. A3 splice site forms alternative structures in vitro.
A 472-nt A3 sequence from the HIV-1NHG strain was in vitro transcribed, re-folded, and probed with DMS-MaPseq. DREEM clustering-based models for the local structures forming at A3 are shown, color coded by normalized DMS signal. Percentages of cluster 1 and 2 come from an n=1 experiment as determined by DREEM.
Extended Data Figure 7 |. Splice site usage in additional A3 mutants.
a, Structure models illustrating the mutant design for A3SLMut4 and A3SLMut5. b, Splice usage for Mut4 and Mut5 for A1–5 reported as fold change compared to Δvpr HIV-1NHG. The central bar represents the mean and error bars indicate s.d.; n=4 biologically independent experiments. c, Average fraction of transcripts using A3 compared to % cluster 1 (A3SL) as determined by DREEM (n=1) for A3SLMut1–5. Mutants are color coded. • indicates multiply spliced (MS) HIV-1 transcripts and ▲ indicates singly spliced (SS) HIV-1 transcripts.
Extended Data Figure 8 |. Structural models of A3SLMut1 and A3SLMut4.
a, Structural models for A3SLMut1 derived from an n=1 experiment after DREEM clustering; the pink box is the region of mutations; the blue box is the splice site; Exonic Splicing Enhancer (ESE) and Exonic Splicing Silencer (ESS) binding sites are shown. b, Structural models made using DMS-MaPseq data from HEK293t cells transfected with ΔvprHIV-1NHG A3SLMut4. The sequence of the A3 splice site is boxed in dark blue. The locations of the mutations are boxed in pink. Splice enhancer and suppressor binding sites are highlighted (ESS2p: purple, ESEtat: blue, ESE2: orange, ESS2: green). Percentages of each cluster come from an n=1 experiment.
Extended Data Figure 9 |. Genome-wide HIV-1NHG library generation quality control.
a, Coverage of the HIV-1 genome with DMS-MaPseq data from HEK293t cells transfected with HIV-1NHG. b, Moving average of A and C mutational frequency in 100 nt windows after DMS-MaPseq compared to the moving average of T and G mutational frequency. c, DMS-MaPseq data from HEK293t cells transfected with HIV-1NHG were used as input for DREEM. The local 80 nt window from Fig. 4 for the RRE region was used for clustering. Percentages of cluster 1 and 2 come from an n=1 experiment. Nucleotides are color coded by normalized DMS signal; bases outside of the window used for clustering are colored in grey. d, The A3 splice site was analyzed using DMS-MaPseq and DREEM clustering from genome-wide data from HEK293t cells transfected with HIV-1NHG. Percentages of cluster 1 and 2 come from an n=1 experiment as determined by DREEM. Nucleotides are color coded by normalized DMS signal. e, A region of the HIV-1 genome in the pol coding region (nt 2000–2120 based on HIV-1NHG genomic RNA coordinates) was analyzed using DMS-MaPseq and DREEM clustering from genome-wide data from HEK293t cells transfected with HIV-1NHG. Two clusters passed the BIC test in adjacent 80 nt windows that overlapped by 40 nt. The two 80 nt windows were combined to make the structural models. The range of proportions of each cluster comes from the individual windows of an n=1 experiment. Nucleotides are color coded by normalized DMS signal.
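The 100 nt moving average in panel b can be sketched as a sliding-window mean over per-position mutational frequencies. The function name `moving_average` is hypothetical, and the sketch assumes one frequency value per nucleotide position, already restricted to the bases of interest (A and C, or T and G).

```python
def moving_average(values, window=100):
    """Mean of `values` over each full sliding window of length `window`."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Keep a running sum so each step costs O(1) instead of O(window).
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages
```

Comparing the A/C track with the T/G track gives a rough view of DMS-induced signal against background, since DMS-reactive positions are A and C.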
Extended Data Figure 10 |. Proportion of minor clusters across the HIV-1 genome and U1, U4/6 core-domain structural models.
a, Each bar shows the proportion of a minor cluster of an 80 nt window as a function of genome position for regions in the HIV-1NHG genome data set that are covered by at least 100,000 reads and pass 2 clusters according to the Bayesian Information Criterion (BIC) test. b, U1 structural prediction from HEK293t cells transfected with HIV-1NHG. Abundance of each cluster was obtained from DREEM clustering. c, In vitro DMS-modified U4/6 core-domain RNA. The structure is shown for the population average; cluster 2 did not pass the BIC test. d, The left panel shows the difference in BIC test value between K=2 and K=1, normalized to the value for K=2, for the real whole-genome dataset. Each bar represents an 80 nt window across the HIV-1 genome. In orange are windows where only 1 cluster was detected according to the BIC test and in blue are windows for which 2 clusters passed the BIC test. The right panel shows the same plot from simulated data for which the mutations were randomly distributed but had the same average number of mutations per read as the true data.
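The BIC comparison between K=1 and K=2 can be illustrated with the standard definition, BIC = p·ln(n) − 2·ln(L), where lower values are preferred. This is a sketch under assumptions: the parameter count for a Bernoulli mixture (K·D mutation rates plus K−1 free mixing proportions) and the sign convention of the normalized difference are not stated in the caption, and all function names are hypothetical.

```python
import math


def n_free_params(K, D):
    """Assumed parameter count: K*D mutation rates + (K-1) mixing proportions."""
    return K * D + (K - 1)


def bic(log_likelihood, n_params, n_reads):
    """Standard Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_reads) - 2.0 * log_likelihood


def normalized_bic_difference(bic_k1, bic_k2):
    """One possible reading of panel d: (BIC_K=2 - BIC_K=1) / BIC_K=2.

    The sign convention is an assumption, not taken from the source.
    """
    return (bic_k2 - bic_k1) / bic_k2


def passes_two_clusters(bic_k1, bic_k2):
    """A window 'passes' 2 clusters here if the K=2 model has the lower BIC."""
    return bic_k2 < bic_k1
```

The penalty term p·ln(n) grows with the number of clusters, so a second cluster is accepted only when it improves the log-likelihood enough to offset the extra parameters.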
Extended Data Figure 11|. Shannon entropy across the HIV-1 genome and A4/5.
a, Overlay of the HIV-1NHG genomic organization on top of the Shannon entropy plot. Each dot represents an 80 nt window in which Shannon entropy was calculated from DMS reactivity. The top plot is the major cluster and the bottom is the minor cluster. b, Scatter plot of Gini index versus Shannon entropy for the major and minor clusters (n=1). R2 is Pearson’s R2. c, Structural model of the TAR stem-loop from the genome-wide DMS-MaPseq and DREEM data. d, Structural model from 2 clusters found using the genome-wide DMS-MaPseq and DREEM data for a window containing 4 splice acceptor sites: A4a-c and A5. Splice sites are boxed. Nucleotides are color coded by normalized DMS signal.
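The two per-window summary statistics compared in panel b can be sketched with their textbook definitions. The caption does not give the exact formulas used, so the choices here are assumptions: reactivities in a window are normalized to sum to 1 before computing Shannon entropy (in bits), and the Gini index is computed from the sorted reactivities. The function names are hypothetical.

```python
import math


def shannon_entropy(reactivities):
    """Entropy (bits) of reactivities normalized to a probability distribution."""
    total = sum(reactivities)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for r in reactivities:
        if r > 0:  # zero-probability terms contribute nothing
            p = r / total
            entropy -= p * math.log2(p)
    return entropy


def gini_index(reactivities):
    """Gini index: 0 for a perfectly even signal, near 1 for a concentrated one."""
    xs = sorted(reactivities)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        return 0.0
    # G = 2 * sum_i(i * x_i) / (n * sum_i x_i) - (n + 1) / n, with i = 1..n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

Under these definitions the two measures move in opposite directions: an evenly reactive window has high entropy and a low Gini index, while a window with a few strongly reactive bases has low entropy and a high Gini index.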
Figure 1 |. Development and validation of DREEM algorithm for analysis of alternative RNA structures
a, Schematic of combining DMS-MaPseq data with the DREEM algorithm to detect alternative RNA structures. b, Structural model of in vitro transcribed and folded Structure 1 and Structure 2 as determined by DMS-MaPseq. Nucleotides are color coded by normalized DMS signal. c, DMS mutational fraction per nucleotide and quantification of Structure 1 and Structure 2 determined by DREEM clustering for a mixing ratio of 25% (Structure 1) to 75% (Structure 2) prior to DMS modification. d, Proportion of Structure 1 and Structure 2 measured by DREEM clustering after in vitro transcription, mixing, and DMS-MaPseq. The expected (E) and observed (O) ratios are shown from an n=1 experiment for each mixing proportion.
Figure 2 |. Formation of alternate structures at HIV-1 RRE is driven by intrinsic RNA thermodynamics
a, HIV-1 RRE structural models derived from DMS-MaPseq followed by DREEM using in vitro transcribed structure locked RRE 5-stem (MutA) and 4-stem (MutB) mutants. Bar graphs represent expected (E) and observed (O) mixing ratios of 4-stem and 5-stem structures from an n=2 experiment. b, Normalized DMS signal for RRE 5-stem and 4-stem structures observed in vitro, in virion and in vivo from CD4+ T cells infected with HIV-1NL4–3 identified by DREEM clustering. The positions highlighted are examples of bases that change pairing state between the two structures, shown in both the DMS signal and the folded RNA structures of Stem 4/5. Percentages for each cluster are determined by DREEM from representative samples n=2 for in vivo and in vitro, n=1 for virion. c, Scatter plots of clustering results for n=2 biological replicates (top two plots) and the variation in DMS signal between two different clusters (4-stem vs 5-stem, bottom plot).
Figure 3 |. Alternative RNA structures at the A3 splice acceptor site regulate splice usage
a, Structural models of the A3ss from CD4+ T cells infected with HIV-1NL4–3 made from clustering outputs of DREEM. Proportions of each cluster are a range from n=4 experiments (1 HIV-1NL4–3 and 3 HIV-1NHG). Nucleotides are color coded by normalized DMS signal. The splice site is highlighted in a blue box; a region that base-pairs to the splice site is in green. b, Scatter plots comparing alternative structures between CD4+ T cells infected with HIV-1NL4–3 and HEK293t cells transfected with HIV-1NHG in an n=2 experiment. The blue dotted line is the identity line; R2 is Pearson’s R2. c, Mutant design; all mutants were predicted to thermodynamically stabilize the A3SL. d, Splice usage fold change compared to ΔvprHIV-1NHG (n=4 for A3SLMut1,3; n=3 for A3SLMut2). The left panel represents splice usage of singly spliced transcripts and the right panel represents multiply spliced transcripts, reported as points with mean and error bars representing s.d.
Figure 4 |. HIV-1 RNA structure heterogeneity landscape
HIV-1 genome organization highlighting the UTR, coding regions, major splice donor and acceptor sites and RRE overlaid on structural variability plot for the library generated from HEK293t cells transfected with HIV-1NHG. Each dot represents an 80 nt window of DMS-MaPseq data used for DREEM with a maximum of 2 clusters in an n=1 experiment. The cluster for each window with a higher Gini coefficient is plotted on top in orange and the cluster with a lower Gini coefficient is plotted on bottom in blue. A heat-map comparing the Pearson’s R2 for the 2 clusters is below the Gini coefficient. Windows without sufficient coverage for clustering (<100,000 reads) are in grey. Windows that did not pass the BIC test for more than 1 cluster are in white. A Pearson’s R2 value measures the similarity in the DMS signal between each pair of clusters identified by DREEM. In dark red are most divergent clusters with R2<0.3; orange is 0.3≥ R2.

References

[References]

    1. Purcell DF & Martin MA Alternative splicing of human immunodeficiency virus type 1 mRNA modulates viral protein expression, replication, and infectivity. J Virol 67, 6365–6378 (1993).
    1. Ocwieja KE et al. Dynamic regulation of HIV-1 mRNA populations analyzed by single-molecule enrichment and long-read sequencing. Nucleic Acids Res 40, 10345–10355, doi:10.1093/nar/gks753 (2012).
    1. Takata MA et al. Global synonymous mutagenesis identifies cis-acting RNA elements that regulate HIV-1 splicing and replication. PLoS Pathog 14, e1006824, doi:10.1371/journal.ppat.1006824 (2018).
    1. Watts JM et al. Architecture and secondary structure of an entire HIV-1 RNA genome. Nature 460, 711–716, doi:10.1038/nature08237 (2009).
    1. Warf MB & Berglund JA Role of RNA structure in regulating pre-mRNA splicing. Trends Biochem Sci 35, 169–178, doi:10.1016/j.tibs.2009.10.004 (2010).

[Methods References]

    1. Langmead B & Salzberg SL Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–359, doi:10.1038/nmeth.1923 (2012).
    1. Darty K, Denise A & Ponty Y VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinformatics 25, 1974–1975, doi:10.1093/bioinformatics/btp250 (2009).
    1. Adachi A et al. Production of acquired immunodeficiency syndrome-associated retrovirus in human and nonhuman cells transfected with an infectious molecular clone. J Virol 59, 284–291 (1986).
    1. Lahm HW & Stein S Characterization of recombinant human IL-2 with micromethods. Journal of Chromatography 326, 357–361 (1985).
