PLoS Comput Biol. 2022 Dec 14;18(12):e1010777.
doi: 10.1371/journal.pcbi.1010777. eCollection 2022 Dec.

Sequence-sensitive elastic network captures dynamical features necessary for miR-125a maturation

Olivier Mailhot et al. PLoS Comput Biol. 2022.

Abstract

The Elastic Network Contact Model (ENCoM) is a coarse-grained normal mode analysis (NMA) model unique in its all-atom sensitivity to the sequence of the studied macromolecule and thus to the effect of mutations. We adapted ENCoM to simulate the dynamics of ribonucleic acid (RNA) molecules, benchmarked its performance against other popular NMA models and used it to study the 3D structural dynamics of human microRNA miR-125a, leveraging high-throughput experimental maturation efficiency data of over 26 000 sequence variants. We also introduce a novel way of using dynamical information from NMA to train multivariate linear regression models, with the purpose of highlighting the most salient contributions of dynamics to function. ENCoM has a similar performance profile on RNA as on proteins when compared to the Anisotropic Network Model (ANM), the most widely used coarse-grained NMA model: it has the advantage in predicting large-scale motions, while ANM performs better at B-factor prediction. A stringent benchmark from the miR-125a maturation dataset, in which the training set contains no sequence information in common with the testing set, reveals that ENCoM is the only tested model able to capture signal beyond the sequence. This ability translates to better predictive power on a second benchmark in which sequence features are shared between the training and testing sets. When the linear regression model is trained on all available data, the dynamical features identified as necessary for miR-125a maturation point to known patterns but also offer new insights into the biogenesis of microRNAs. Our novel approach combining NMA with multivariate linear regression is generalizable to any macromolecule for which relatively high-throughput mutational data is available.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Assignment of beads on the four standard nucleotides and strength of base-base interactions in the different models.
A-B) Phosphate atoms are in gold, C1’ carbons from the sugar group in light green and C2 carbons from the nucleobase in dark green. A GC pair is shown in A) and an AU pair in B). The atom names are shown over white structures to enhance legibility. The assignment of ENCoM atom types to each atom name is given in Table H in S1 Text. All three beads are included in the ENMs tested for the present work. The base pairs shown were extracted from an A-form RNA helix generated using the MC-Fold and MC-Sym pipeline [36]. C) A stack of canonical base pairs, AU over GC, extracted from PDB code 2V6W. D) ENCoM’s βij term, Cut-ANM’s binary interaction and PD-ANM’s distance-dependent interaction between all pairs of nucleobase beads, computed from the exact structure shown in C). The parameters used are a 10 Å cutoff for Cut-ANM and a power dependency of 7 for PD-ANM, the optimal values found in the present study for the B-factors benchmark. The values are rescaled to a maximum of 100 to allow better comparison of the three potentials. The cytosine is in close proximity to the diagonal adenosine, hence the high interaction value assigned by PD-ANM, which is proportional to the inverse of the distance to the 7th power.
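The two ANM interaction schemes named in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bead distances are hypothetical, the function names are invented for illustration, and treating the 10 Å cutoff as inclusive is an assumption. Only the cutoff, the power of 7 and the rescaling to a maximum of 100 come from the caption.

```python
def cut_anm_weight(distance, cutoff=10.0):
    """Binary interaction: 1 inside the cutoff (in Å), 0 outside.
    Whether the boundary itself counts is an assumption."""
    return 1.0 if distance <= cutoff else 0.0


def pd_anm_weight(distance, power=7):
    """Distance-dependent interaction, proportional to 1 / distance**power."""
    return distance ** -power


def rescale_to_100(values):
    """Rescale so the largest value equals 100, as done for panel D."""
    top = max(values)
    return [100.0 * v / top for v in values]


# Hypothetical bead-to-bead distances in Å, for illustration only.
distances = [4.0, 6.0, 9.0, 12.0]

cut = rescale_to_100([cut_anm_weight(d) for d in distances])
pd_anm = rescale_to_100([pd_anm_weight(d) for d in distances])

for d, c, p in zip(distances, cut, pd_anm):
    print(f"{d:5.1f} Å  Cut-ANM {c:6.1f}  PD-ANM {p:8.4f}")
```

With a power of 7, the weight falls off steeply: in this toy input the bead at 6 Å already receives less than a tenth of the weight of the bead at 4 Å, which is consistent with the caption's remark that the closest pair dominates the PD-ANM values.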
Fig 2. miR-125a 2D MFE structure, mutation boxes, hard benchmark sets and 3D structure.
A) The 14 boxes that were each exhaustively mutated in the Bartel study, shown on the MC-Fold predicted WT MFE structure. B) Proportions of sequence variants containing each position which adopt the WT MFE structure. C) Positions in the hard benchmark testing set shown in red, and positions in the training set in blue. D) The medoid 3D miR-125a structure from the 67 structures predicted by MC-Sym.
Fig 3. Pearson correlation coefficient between predicted and experimental B-factors for ENCoM and ANM.
A) The mean Pearson correlation is shown as a function of the β scaling factor for the three ENMs. B) The Entropic Signature using the best scaling factor for each model is compared to the mean square fluctuations (MSF). The p-values from paired Wilcoxon signed-rank tests are shown for pairs of the two types of predictions for every model and for every pair of models, only for the Entropic Signature predictions since they outperform the MSF.
Fig 4. Cumulative overlap between normal modes and conformational changes from X-ray crystallography experiments.
A) Mean cumulative overlap as a function of the proportion of the nontrivial normal modes used. B) Mean cumulative overlap at 5% nontrivial normal modes used, with p-values from paired Wilcoxon signed-rank tests for every pair of models. C) Mean cumulative overlap at 5% nontrivial modes, as a function of the conformational change RMSD, in bins of 0.1 RMSD. D) An illustrative example from the high RMSD pairs, PDB codes 2B8R (red) and 3FAR (green), with 22.8 RMSD. One strand from each structure is colored paler than the other.
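As a rough illustration of the quantity plotted in this figure, the sketch below uses the commonly cited definition of cumulative overlap: the square root of the summed squared overlaps between a set of orthonormal modes and a conformational-change vector. The toy vectors and function names are hypothetical, and the paper's exact procedure (for example, how structures are superimposed beforehand) is not reproduced here.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def overlap(mode, displacement):
    """Absolute cosine of the angle between one mode and the displacement."""
    return abs(dot(mode, displacement)) / (norm(mode) * norm(displacement))


def cumulative_overlap(modes, displacement):
    """Square root of the summed squared overlaps over the given modes.
    Assumes the modes are mutually orthogonal, so the value lies in [0, 1]."""
    return math.sqrt(sum(overlap(m, displacement) ** 2 for m in modes))


# Toy 3-dimensional example; real modes have 3 x (number of beads) components.
modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
displacement = [3.0, 4.0, 0.0]

first_mode_only = cumulative_overlap(modes[:1], displacement)
first_two_modes = cumulative_overlap(modes[:2], displacement)
print(first_mode_only, first_two_modes)
```

In this toy case the displacement lies entirely in the plane of the first two modes, so adding the second mode brings the cumulative overlap to 1.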
Fig 5. RMSIP and NCO between normal modes and principal components from NMR ensembles.
A) Root mean square inner product (RMSIP) between a proportion of the nontrivial normal modes and the principal components accounting for at least 99% of the variance apparent within the NMR ensemble. B) RMSIP at 5% nontrivial normal modes, with p-values from paired Wilcoxon signed-rank tests for every pair of models. C) Normalized cumulative overlap (NCO) between a proportion of the nontrivial normal modes and the principal components accounting for at least 99% of the variance apparent within the NMR ensemble. D) NCO at 5% nontrivial normal modes, with p-values from paired Wilcoxon signed-rank tests for every pair of models.
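The RMSIP in panels A) and B) can be sketched as follows. This minimal example assumes both vector sets are orthonormal and normalises by the number of principal components; that normalisation choice is an assumption, not taken from the paper. The NCO is not sketched because its definition does not appear in this excerpt. All vectors and names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rmsip(principal_components, normal_modes):
    """Root mean square inner product between two sets of unit vectors.
    Normalised by the number of principal components (an assumption)."""
    total = sum(
        dot(pc, mode) ** 2
        for pc in principal_components
        for mode in normal_modes
    )
    return math.sqrt(total / len(principal_components))


# Toy 3-dimensional subspaces, for illustration only.
pcs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
modes_partial = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # shares one direction
modes_same = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]     # same subspace, reordered

partial_score = rmsip(pcs, modes_partial)
same_score = rmsip(pcs, modes_same)
print(partial_score, same_score)
```

Because the score depends only on the subspaces spanned, reordering the modes leaves it unchanged, and two sets spanning the same subspace score 1.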
Fig 6. Performance of LASSO linear regression on the hard benchmark.
A) Predictive R² for each model alone or in combination with the MC-Fold enthalpy of folding. For the three ENMs, scaling factors for the dynamical signature were explored around the value which gave the best respective performance in the B-factors benchmark. Predictive R² values below zero are shown in gray. B) Performance on the test set for the best combination of parameters for every model alone or in tandem with MC-Fold. The bottom row shows the p-value from a simulation combining Gaussian noise with the MC-Fold prediction, which corresponds to the probability of these predictive R² values arising by chance from a pure noise model. The dotted red lines show x = y.
Fig 7. Performance of LASSO linear regression on the inverted benchmark.
A) Predictive R² for each model alone or in combination with the MC-Fold enthalpy of folding, as in Fig 6. B) Performance on the test set for the best combination of parameters for every model alone or in tandem with MC-Fold.
Fig 8. Coefficients of LASSO regression models.
A) Coefficients of the combined ENCoM and MC-Fold model trained on the hard training set with β = e^2.25 and λ = 2^−4. B) ENCoM Entropic Signature coefficients mapped on the 2D structure of pri-miR-125a. The DROSHA cut site and mismatched GHG motif are identified on the structures. C) Training metrics from training the ENCoM-MC-Fold combination at β = e^2.25 on the whole set of 26 960 sequence variants selected for this study. Pearson’s R, R² and the square of Pearson’s R are plotted in relation to regularization strength. A red vertical line shows the last regularization strength before R² starts to diverge from the square of Pearson’s R. D) Pearson’s R and R² training performance expressed as a proportion of maximal performance, as a function of LASSO regularization strength. E)-F) Same as A)-B), with the LASSO model trained on all 26 960 sequences and λ = 2^−8.
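Panel C) compares R² with the square of Pearson's R. The sketch below shows, on hypothetical numbers, why the two can differ: the coefficient of determination penalises predictions that are shrunk toward the mean, while Pearson's correlation is unaffected by such rescaling. The shrinkage here is imposed by hand as a stand-in for the effect of strong regularization; no LASSO model is fitted, and all names are illustrative.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (can be negative)."""
    mu = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mu) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observations and predictions shrunk halfway toward the mean.
observed = [1.0, 2.0, 3.0, 4.0]
mu = mean(observed)
shrunk = [mu + 0.5 * (o - mu) for o in observed]

r2 = r_squared(observed, shrunk)
r_sq = pearson_r(observed, shrunk) ** 2
print(r2, r_sq)
```

Here the shrunk predictions remain perfectly correlated with the observations, so the squared Pearson's R stays at 1 while R² drops to 0.75.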

References

    1. Strobel EJ, Watters KE, Loughrey D, Lucks JB. RNA systems biology: uniting functional discoveries and structural tools to understand global roles of RNAs. Current Opinion in Biotechnology. 2016;39:182–191. doi: 10.1016/j.copbio.2016.03.019
    2. Rose PW, Prlić A, Bi C, Bluhm WF, Christie CH, Dutta S, et al. The RCSB Protein Data Bank: views of structural biology for basic and applied research and education. Nucleic Acids Research. 2014;43(D1):D345–D356. doi: 10.1093/nar/gku1214
    3. Al-Hashimi HM, Walter NG. RNA dynamics: it is about time. Current Opinion in Structural Biology. 2008;18(3):321–329. doi: 10.1016/j.sbi.2008.04.004
    4. Dallaire P, Tan H, Szulwach K, Ma C, Jin P, Major F. Structural dynamics control the MicroRNA maturation pathway. Nucleic Acids Res. 2016; p. gkw793. doi: 10.1093/nar/gkw793
    5. Bahar I, Lezon TR, Yang LW, Eyal E. Global Dynamics of Proteins: Bridging Between Structure and Function. Annu Rev Biophys. 2010;39(1):23–42. doi: 10.1146/annurev.biophys.093008.131258
