Bioinformatics. 2022 Aug 10;38(16):3900-3910.

doi: 10.1093/bioinformatics/btac421.

Predicting RNA distance-based contact maps by integrated deep learning on physics-inferred secondary structure and evolutionary-derived mutational coupling

Jaswinder Singh¹, Kuldip Paliwal¹, Thomas Litfin², Jaspreet Singh¹, Yaoqi Zhou²,³,⁴

Affiliations

¹ Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia.
² Institute for Glycomics, Griffith University, Parklands Dr. Southport, QLD 4222, Australia.
³ Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China.
⁴ Peking University Shenzhen Graduate School, Peking University, Shenzhen 518055, China.

PMID: 35751593
PMCID: PMC9364379
DOI: 10.1093/bioinformatics/btac421

Jaswinder Singh et al. Bioinformatics. 2022.

Abstract

Motivation: Recently, AlphaFold2 achieved high experimental accuracy for the majority of proteins in the Critical Assessment of Structure Prediction (CASP 14). This raises the hope that one day we may achieve the same feat for structured RNAs, whose structure prediction is as fundamentally and practically important as protein structure prediction. One major factor in the recent advancement of protein structure prediction is the highly accurate prediction of distance-based contact maps of proteins.

Results: Here, we showed that by integrated deep learning with physics-inferred secondary structures, co-evolutionary information and multiple sequence-alignment sampling, we can achieve RNA contact-map prediction at a level of accuracy similar to that in protein contact-map prediction. More importantly, highly accurate prediction for top L long-range contacts can be assured for those RNAs with a high effective number of homologous sequences (Neff > 50). The initial use of the predicted contact map as distance-based restraints confirmed its usefulness in 3D structure prediction.
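
To make the term "distance-based contact map" concrete, the following is a minimal sketch of how a binary contact map can be derived from 3D coordinates. It is an illustration only, not the authors' pipeline: the function name `distance_contact_map`, the use of one representative point per nucleotide, and the `cutoff` value in the toy call are all assumptions, since the abstract does not state the atom choice or the distance threshold.

```python
import numpy as np

def distance_contact_map(coords: np.ndarray, cutoff: float) -> np.ndarray:
    """Return an L x L binary contact map from an L x 3 coordinate array.

    coords: one representative 3D point per nucleotide (which atom to use
            is an assumption; the abstract does not specify it).
    cutoff: distance threshold in the same unit as coords (placeholder value,
            not taken from the source).
    """
    # Pairwise difference vectors, shape (L, L, 3), then Euclidean distances.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    contacts = (dist < cutoff).astype(np.int8)
    # A nucleotide is not counted as being in contact with itself.
    np.fill_diagonal(contacts, 0)
    return contacts

# Toy example with four points; the cutoff of 10.0 is purely illustrative.
toy_coords = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0],
    [30.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
])
toy_map = distance_contact_map(toy_coords, cutoff=10.0)
```

In this toy case the third point lies far from the others, so its row and column in `toy_map` contain only zeros, while the remaining three points are all mutually in contact.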

Availability and implementation: SPOT-RNA-2D is available as a web server at https://sparks-lab.org/server/spot-rna-2d/ and as a standalone program at https://github.com/jaswindersingh2/SPOT-RNA-2D.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
(A) Input 1D and 2D features used by SPOT-RNA-2D, where MSA is multiple sequence alignment, CSS is predicted consensus secondary structure from RNAfold (MFE), CM is covariance model, One-hot is the one-hot encoding of the input sequence, PSSM is the position-specific scoring matrix, PLMC is pseudo-likelihood maximization coupling, BPs is predicted base-pair probability from RNAfold (MFE), MFE is minimum free energy and L is the length of the RNA sequence. (B) The generalized deep neural network architecture of SPOT-RNA-2D
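
As a rough illustration of the 1D and 2D inputs named in this caption, the sketch below one-hot encodes an RNA sequence (an L x 4 array) and tiles a 1D feature array into a pairwise L x L tensor. The helper names `one_hot` and `tile_1d_to_2d` are hypothetical, and the outer-concatenation scheme is a common convention assumed here; the caption does not say how SPOT-RNA-2D actually combines its 1D and 2D features.

```python
import numpy as np

NUCLEOTIDES = "ACGU"  # column order of the one-hot encoding (an assumption)

def one_hot(seq: str) -> np.ndarray:
    """Encode a sequence as an L x 4 array; unknown letters stay all-zero."""
    enc = np.zeros((len(seq), len(NUCLEOTIDES)), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = NUCLEOTIDES.find(base)
        if j >= 0:
            enc[i, j] = 1.0
    return enc

def tile_1d_to_2d(feat_1d: np.ndarray) -> np.ndarray:
    """Turn L x d per-position features into L x L x 2d pairwise features.

    Entry (i, j) is the concatenation of the features of positions i and j.
    """
    length, dim = feat_1d.shape
    rows = np.broadcast_to(feat_1d[:, None, :], (length, length, dim))
    cols = np.broadcast_to(feat_1d[None, :, :], (length, length, dim))
    return np.concatenate([rows, cols], axis=-1)

seq_features = one_hot("GGCAU")               # shape (5, 4)
pair_features = tile_1d_to_2d(seq_features)   # shape (5, 5, 8)
```

Genuinely pairwise inputs such as coupling scores or base-pair probabilities already have shape L x L and could be stacked onto `pair_features` along the last axis.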

**Fig. 2.**
Precision–recall (PR) curves given by SPOT-RNA-2D and SPOT-RNA-2D-Single (A) along with four DCA predictors on 147 RNAs from three test sets TS1, TS2 and TS3, (B) further comparison with RNAContact on 82 RNAs from three reduced test sets TS1, TS2 and TS3 after removing the sequences overlapping with RNAContact training data
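
For readers unfamiliar with how such curves are built, here is a minimal sketch of computing precision and recall for a predicted contact-probability map at several thresholds. The function name `pr_points` and the choice to score every pair in the upper triangle are assumptions; the caption does not state which pairs were included or how results were pooled across RNAs.

```python
import numpy as np

def pr_points(prob: np.ndarray, native: np.ndarray, thresholds):
    """Return (threshold, precision, recall) tuples for one RNA.

    prob:   L x L predicted contact probabilities.
    native: L x L binary native contact map.
    Only pairs with i < j are scored, so each pair is counted once.
    """
    upper = np.triu_indices(native.shape[0], k=1)
    scores = prob[upper]
    truth = native[upper].astype(bool)
    points = []
    for t in thresholds:
        predicted = scores >= t
        true_pos = int(np.sum(predicted & truth))
        n_pred = int(predicted.sum())
        n_true = int(truth.sum())
        # Conventions for empty denominators are a choice made for this sketch.
        precision = true_pos / n_pred if n_pred else 1.0
        recall = true_pos / n_true if n_true else 0.0
        points.append((float(t), precision, recall))
    return points

# Toy 3-nucleotide example with made-up numbers.
toy_prob = np.array([[0.0, 0.9, 0.6],
                     [0.9, 0.0, 0.2],
                     [0.6, 0.2, 0.0]])
toy_native = np.array([[0, 1, 0],
                       [1, 0, 1],
                       [0, 1, 0]])
toy_points = pr_points(toy_prob, toy_native, thresholds=[0.5, 0.1])
```

Lowering the threshold admits more predicted contacts, which raises recall and usually lowers precision; plotting the resulting points traces the PR curve.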

**Fig. 3.**
Mean precision of long-range contacts ( $|i - j| \geq 24$ ) given by various methods as labelled (A) on full test sets TS1, TS2, TS3, RNA-Puzzles and TS80, (B) on reduced test sets TS1, TS2, TS3, RNA-Puzzles and TS80 after removing the sequences overlapping with RNAContact training data
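
The metric in this caption can be sketched as follows: keep only pairs whose sequence separation is at least 24, rank them by predicted probability, take the top L (the sequence length), and report the fraction that are native contacts. The function name `top_l_long_range_precision`, the `top_factor` parameter and the tie-breaking behaviour are assumptions made for this illustration rather than details from the paper.

```python
import numpy as np

def top_l_long_range_precision(prob: np.ndarray, native: np.ndarray,
                               min_sep: int = 24,
                               top_factor: float = 1.0) -> float:
    """Precision of the top (top_factor * L) long-range predictions.

    prob:    L x L predicted contact probabilities.
    native:  L x L binary native contact map.
    min_sep: minimum sequence separation j - i (24 in the caption).
    """
    length = native.shape[0]
    # Upper-triangle pairs with j - i >= min_sep.
    i, j = np.triu_indices(length, k=min_sep)
    if i.size == 0:
        return float("nan")  # sequence too short to have long-range pairs
    k = min(max(1, int(round(top_factor * length))), i.size)
    # Stable sort on negated scores ranks the highest probabilities first.
    order = np.argsort(-prob[i, j], kind="stable")[:k]
    return float(native[i[order], j[order]].mean())

# Toy example: 6 nucleotides, with min_sep lowered to 3 so that pairs exist.
toy_prob = np.zeros((6, 6))
toy_prob[0, 5], toy_prob[1, 4], toy_prob[0, 3], toy_prob[2, 5] = 0.9, 0.8, 0.7, 0.1
toy_native = np.zeros((6, 6), dtype=int)
toy_native[0, 5] = toy_native[1, 4] = toy_native[2, 5] = 1
toy_precision = top_l_long_range_precision(toy_prob, toy_native,
                                           min_sep=3, top_factor=0.5)
```

A mean precision over a test set would then be the average of this per-RNA value across all RNAs in the set.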

**Fig. 4.**
Mean precision of top L long-range contacts as a function of the effective number of homologous sequences (N_eff) on (A) the combined full test sets TS1, TS2 and TS3, (B) the combined reduced test sets TS1, TS2 and TS3 after removing the sequences overlapping with RNAContact training data

**Fig. 5.**
Comparison of contact maps predicted by RNAContact (A, D, G), SPOT-RNA-2D-Single (B, E, H) and SPOT-RNA-2D (C, F, I) (lower triangle) with the native contact map (upper triangle) for the 2’-dG-II riboswitch (Chain A in PDB ID 6p2h), Varkud satellite ribozyme (Chain A in PDB ID 4r4v) and Hatchet ribozyme (Chain A in PDB ID 6jq5) from the RNA-Puzzles test set. The color bar indicates the predicted contact probability in the lower triangle. Orange circles highlight correctly predicted long-range contacts. Cartoon figures on the left show the corresponding native 3D structures, with long-range contacts highlighted in orange and the remaining contacts in red. (A color version of this figure appears in the online version of this article.)

Grants and funding

DP210101875/Australian Research Council