PLoS Comput Biol. 2022 Feb 24;18(2):e1009863.

doi: 10.1371/journal.pcbi.1009863. eCollection 2022 Feb.

Inferring RNA-binding protein target preferences using adversarial domain adaptation

Ying Liu^{1,2}, Ruihui Li³, Jiawei Luo¹, Zhaolei Zhang^{2,4,5}

Affiliations

¹ College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China.
² Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada.
³ Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.
⁴ Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
⁵ Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.

PMID: 35202389
PMCID: PMC8870515
DOI: 10.1371/journal.pcbi.1009863

Ying Liu et al. PLoS Comput Biol. 2022.

Abstract

Precise identification of the target sites of RNA-binding proteins (RBPs) is important for understanding their biochemical and cellular functions. Large amounts of experimental data are generated by in vivo and in vitro approaches. The binding preferences determined from these platforms share similar patterns, but there are discernible differences between the datasets, and computational methods trained on one dataset do not always work well on another. To address this problem, which resembles the classic "domain shift" in deep learning, we adopted the adversarial domain adaptation (ADDA) technique and developed a framework (RBP-ADDA) that can extract RBP binding preferences from an integration of in vivo and in vitro datasets. Compared with conventional methods, ADDA has the advantage of working with two input datasets: it trains the initial neural network for each dataset individually, projects the two datasets onto a feature space, and uses an adversarial framework to derive a network with the best discriminative predictive power. In the first step, for each RBP, we use only the in vitro data to pre-train a source network and a task predictor. Next, for the same RBP, we initialize the target network from the source network and use adversarial domain adaptation to update the target network using both in vitro and in vivo data. These two steps leverage the in vitro data to improve prediction on in vivo data, which is typically challenging because of its lower signal-to-noise ratio. Finally, to take further advantage of the fused source and target data, we fine-tune the task predictor using both datasets. We showed that RBP-ADDA achieved better performance in modeling in vivo RBP binding data than other existing methods, as judged by Pearson correlations. It also improved predictive performance on in vitro datasets.
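The three training steps described above can be written out with the objectives of the standard ADDA formulation. This is a sketch under assumptions, not the formulation stated in the abstract: the symbols M_s and M_t (source and target networks), D (domain discriminator), and P (task predictor) are introduced here for illustration, the squared-error task loss is an assumed choice for regression on binding scores, and keeping M_s and M_t fixed during fine-tuning is also an assumption.

```latex
% Step 1 (pre-training, in vitro data only): fit the source network M_s
% and the task predictor P. Squared error is an assumed task loss.
\min_{M_s,\,P}\; \mathcal{L}_{\text{task}}
  = \mathbb{E}_{(x_s,\,y_s)}\big[\big(P(M_s(x_s)) - y_s\big)^2\big]

% Step 2 (adversarial adaptation): M_s is held fixed and M_t starts as a copy of M_s.
% The discriminator D learns to separate in vitro features from in vivo features.
\min_{D}\; \mathcal{L}_{D}
  = -\,\mathbb{E}_{x_s}\big[\log D(M_s(x_s))\big]
    -\,\mathbb{E}_{x_t}\big[\log\big(1 - D(M_t(x_t))\big)\big]

% The target network M_t is updated to fool D (inverted-label GAN loss).
\min_{M_t}\; \mathcal{L}_{M_t}
  = -\,\mathbb{E}_{x_t}\big[\log D(M_t(x_t))\big]

% Step 3 (fine-tuning): only P is updated, using features from both networks.
\min_{P}\;
  \mathbb{E}_{(x_s,\,y_s)}\big[\big(P(M_s(x_s)) - y_s\big)^2\big]
  + \mathbb{E}_{(x_t,\,y_t)}\big[\big(P(M_t(x_t)) - y_t\big)^2\big]
```

Here x_s, y_s denote in vitro sequences and their binding scores, and x_t, y_t denote the in vivo counterparts.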
We further applied augmentation operations to RBPs with less in vivo data to expand the input data and showed that this can improve prediction performance. Lastly, we explored the predictive interpretability of RBP-ADDA: we quantified the contribution of the input features using Integrated Gradients and identified nucleotide positions that are important for RBP recognition.
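To make the attribution step concrete, the following is a minimal sketch of Integrated Gradients on a toy model. It is not the RBP-ADDA network: the sigmoid-of-linear scorer, its weights `w` and bias `b`, the all-zero baseline, and the two-nucleotide input are all hypothetical choices made so the example stays self-contained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy differentiable scorer (hypothetical): sigmoid of a linear function."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to the input x."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients.

    Averages the gradient along the straight path from baseline to x,
    then scales each component by (x_i - baseline_i).
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]

# Flattened one-hot input for the sequence "AC"
# (2 positions x 4 channels, assumed channel order A, C, G, U).
x = [1, 0, 0, 0, 0, 1, 0, 0]
baseline = [0.0] * 8                      # assumed all-zero baseline
w = [0.8, -0.2, 0.1, 0.0, -0.5, 1.2, 0.3, -0.1]  # made-up weights
b = -0.4
attr = integrated_gradients(x, baseline, w, b)
```

A useful sanity check is the completeness property: the attributions should sum to approximately `model(x) - model(baseline)`. With a one-hot input and a zero baseline, only the channel that is "on" at each position receives a non-zero attribution, which is what allows per-position importance to be read off.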

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Flowchart of the RBP-ADDA method.**
During Data Encoding, each sequence in the sample (in vitro and in vivo) is represented as a concatenation of one-hot encoding vectors, one per nucleotide. **Step 1**. Pre-training. We use in vitro data to pre-train a source network and a task predictor. **Step 2.1**. Initialize the target network. The target network is initialized with the same parameters and architecture as the source network. **Step 2.2**. ADDA. We apply adversarial learning to train the target network on in vivo data and train the domain discriminator. **Step 3**. Fine-tuning. We use both the source and target networks to fine-tune the task predictor. Solid lines indicate steps in which the network parameters are fixed.
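The Data Encoding step in this caption can be sketched as follows. The details are assumptions made for illustration: the channel order `ACGU`, the mapping of `T` to `U`, and the all-zero vector for unknown bases such as `N` are not specified in the caption.

```python
NUCLEOTIDES = "ACGU"  # assumed channel order

def one_hot_encode(seq):
    """Return one 4-element one-hot vector per nucleotide in seq."""
    seq = seq.upper().replace("T", "U")  # assumption: treat DNA-style T as U
    encoded = []
    for base in seq:
        vec = [0] * len(NUCLEOTIDES)
        idx = NUCLEOTIDES.find(base)
        if idx >= 0:
            vec[idx] = 1
        encoded.append(vec)  # unknown bases (e.g. N) stay all-zero
    return encoded

def flatten(encoded):
    """Concatenate the per-nucleotide vectors into one flat vector."""
    return [v for vec in encoded for v in vec]

example = flatten(one_hot_encode("AU"))
```

Whether the network consumes the flat concatenation or the position-by-channel matrix depends on the architecture; both carry the same information.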

**Fig 2. Comparison of performances between RBP-ADDA and other methods.**
**(A)** Comparison on 25 in vitro RNAcompete datasets; **(B)** Comparison on 19 eCLIP datasets from the HepG2 cell line; **(C)** Comparison on 19 eCLIP datasets from the K562 cell line. P-values are computed using an unpaired one-tailed Wilcoxon rank-sum test and adjusted with p.adjust.

**Fig 3. Performance of RBP-ADDA model after data augmentation operations.**
In each panel, the predictive performances on an RBP are grouped and shown as norm (non-augmented), gap, replacement, and swap. Within each group, the performances after the pre-training step, the domain adaptation step, and the fine-tuning step are indicated as “1”, “2”, and “3”.
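The caption names three augmentation operations without defining them. The sketch below shows one plausible reading of each; these definitions are hypothetical and may differ from what the authors implemented. Here `gap` masks one position with `N`, `replacement` substitutes one position with a different nucleotide, and `swap` exchanges two positions.

```python
import random

ALPHABET = "ACGU"

def gap(seq, rng):
    """Mask one random position with 'N' (assumed meaning of 'gap')."""
    i = rng.randrange(len(seq))
    return seq[:i] + "N" + seq[i + 1:]

def replacement(seq, rng):
    """Substitute one random position with a different nucleotide (assumed)."""
    i = rng.randrange(len(seq))
    choices = [base for base in ALPHABET if base != seq[i]]
    return seq[:i] + rng.choice(choices) + seq[i + 1:]

def swap(seq, rng):
    """Exchange the nucleotides at two distinct random positions (assumed)."""
    i, j = rng.sample(range(len(seq)), 2)
    chars = list(seq)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)

rng = random.Random(0)        # fixed seed so the example is reproducible
original = "AUGGCUACGU"       # made-up sequence
augmented = {
    "gap": gap(original, rng),
    "replacement": replacement(original, rng),
    "swap": swap(original, rng),
}
```

All three operations preserve sequence length, so augmented sequences can be encoded and fed to the same network as the originals.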

**Fig 4. Visualization of attribution scores, consensus motif, motifs obtained from in vitro (RNAcompete) and in vivo (eCLIP) experiments.**

Cited by

Emerging RNA-centric technologies to probe RNA-protein interactions: importance in decoding the life cycle of positive sense single strand RNA viruses and antiviral discovery.
Ghosh S, Kumar S, Verma R, Ansari S, Chatterjee S, Surjit M. Front Cell Infect Microbiol. 2025 May 21;15:1580337. doi: 10.3389/fcimb.2025.1580337. eCollection 2025. PMID: 40584171 Free PMC article. Review.
A systematic benchmark of machine learning methods for protein-RNA interaction prediction.
Horlacher M, Cantini G, Hesse J, Schinke P, Goedert N, Londhe S, Moyon L, Marsico A. Brief Bioinform. 2023 Sep 20;24(5):bbad307. doi: 10.1093/bib/bbad307. PMID: 37635383 Free PMC article.
