Bioinformatics. 2022 Aug 10;38(16):3900-3910.
doi: 10.1093/bioinformatics/btac421.

Predicting RNA distance-based contact maps by integrated deep learning on physics-inferred secondary structure and evolutionary-derived mutational coupling

Jaswinder Singh et al. Bioinformatics. 2022.

Abstract

Motivation: Recently, AlphaFold2 achieved high experimental accuracy for the majority of proteins in the Critical Assessment of Structure Prediction (CASP14). This raises the hope that one day we may achieve the same feat for the structure prediction of structured RNAs, a problem as fundamentally and practically important as protein structure prediction. One major factor in the recent advancement of protein structure prediction is the highly accurate prediction of distance-based contact maps of proteins.

Results: Here, we showed that by integrating deep learning with physics-inferred secondary structures, co-evolutionary information and multiple sequence-alignment sampling, we can achieve RNA contact-map prediction at a level of accuracy similar to that of protein contact-map prediction. More importantly, highly accurate prediction of the top L long-range contacts can be assured for RNAs with a high effective number of homologous sequences (Neff > 50). An initial use of the predicted contact map as distance-based restraints confirmed its usefulness in 3D structure prediction.

Availability and implementation: SPOT-RNA-2D is available as a web server at https://sparks-lab.org/server/spot-rna-2d/ and as a standalone program at https://github.com/jaswindersingh2/SPOT-RNA-2D.

Supplementary information: Supplementary data are available at Bioinformatics online.
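
The "top L long-range contacts" metric mentioned in the Results can be illustrated with a minimal sketch. It is not taken from the SPOT-RNA-2D code base: the function name, the nested-list matrix layout and the `min_sep` parameter (the assumed minimum sequence separation that makes a pair "long-range") are all hypothetical choices for illustration.

```python
def top_l_long_range_precision(pred, native, min_sep=24):
    """Precision of the L highest-scoring long-range pairs.

    pred    -- L x L nested list of predicted contact probabilities
    native  -- L x L nested list of 0/1 native contacts
    min_sep -- assumed minimum sequence separation for "long-range"
    """
    length = len(pred)
    # Visit each pair once (upper triangle) and keep only long-range pairs.
    candidates = [
        (pred[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    if not candidates:
        return 0.0
    # Rank by predicted probability, highest first, and keep the top L.
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[:length]
    # Precision is the fraction of selected pairs that are native contacts.
    hits = sum(1 for _, i, j in top if native[i][j])
    return hits / len(top)
```

Averaging this value over all RNAs in a test set would give a mean precision of the kind reported in the figures, under the same assumptions.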

Figures

Fig. 1.
(A) Input 1D and 2D features used by SPOT-RNA-2D, where MSA is multiple sequence alignment, CSS is the predicted consensus secondary structure from RNAfold (MFE), CM is covariance model, One-hot is the one-hot encoding of the input sequence, PSSM is the position-specific scoring matrix, PLMC is pseudo-likelihood maximization coupling, BPs is the predicted base-pair probability from RNAfold (MFE), MFE is minimum free energy and L is the length of the RNA sequence. (B) The generalized deep neural network architecture of SPOT-RNA-2D.
Fig. 2.
Precision–recall (PR) curves for SPOT-RNA-2D and SPOT-RNA-2D-Single (A) compared with four DCA predictors on 147 RNAs from the three test sets TS1, TS2 and TS3, and (B) further compared with RNAContact on 82 RNAs from the three reduced test sets TS1, TS2 and TS3, after removing the sequences that overlap with the RNAContact training data.
Fig. 3.
Mean precision of long-range contacts (|i − j| ≥ 24) given by various methods as labelled (A) on the full test sets TS1, TS2, TS3, RNA-Puzzles and TS80, and (B) on the reduced test sets TS1, TS2, TS3, RNA-Puzzles and TS80, after removing the sequences that overlap with the RNAContact training data.
Fig. 4.
Mean precision of the top L long-range contacts as a function of the effective number of homologous sequences (Neff) on (A) the combined full test sets TS1, TS2 and TS3, and (B) the combined reduced test sets TS1, TS2 and TS3, after removing the sequences that overlap with the RNAContact training data.
Fig. 5.
Comparison of contact maps predicted by RNAContact (A, D, G), SPOT-RNA-2D-Single (B, E, H) and SPOT-RNA-2D (C, F, I), shown in the lower triangle, with the native contact map, shown in the upper triangle, for the 2’-dG-II riboswitch (chain A in PDB ID 6p2h), the Varkud satellite ribozyme (chain A in PDB ID 4r4v) and the Hatchet ribozyme (chain A in PDB ID 6jq5) from the RNA-Puzzles test set. The color bar indicates the predicted contact probability in the lower triangle. Orange circles highlight correctly predicted long-range contacts. The cartoon figures on the left show the corresponding native 3D structure, with long-range contacts highlighted in orange and the remaining contacts in red. (A color version of this figure appears in the online version of this article.)
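
The layout described for Fig. 5, with native contacts in the upper triangle and predicted probabilities in the lower triangle, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, each nucleotide is represented by one coordinate, and the distance `cutoff` that defines a contact must be supplied by the caller because no threshold is given in the text above.

```python
import math


def native_contact_map(coords, cutoff):
    """Distance-based contact map from one (x, y, z) point per nucleotide.

    cutoff is a caller-supplied distance threshold (an assumption here).
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Two nucleotides are in contact if they lie within the cutoff.
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def comparison_matrix(pred, native):
    """Native contacts above the diagonal, predicted probabilities below."""
    n = len(native)
    combined = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j > i:
                combined[i][j] = native[i][j]
            elif j < i:
                combined[i][j] = pred[i][j]
    return combined
```

The combined matrix can then be passed to any heat-map plotting routine to produce a side-by-side view of prediction and native structure.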
