G3 (Bethesda). 2022 Jul 29;12(8):jkac142.
doi: 10.1093/g3journal/jkac142.

Integrative analysis and prediction of human R-loop binding proteins

Arun Kumar et al. G3 (Bethesda).

Abstract

In the past decade, there has been a growing appreciation for R-loop structures as important regulators of the epigenome, telomere maintenance, DNA repair, and replication. Given these numerous functions, dozens, or potentially hundreds, of proteins could serve as direct or indirect regulators of R-loop writing, reading, and erasing. To understand the properties shared among potential R-loop binding proteins, we mined published proteomic studies and distilled 10 features that were enriched in R-loop binding proteins compared with the rest of the proteome. Applying an easy-ensemble machine learning approach, we used these R-loop binding protein-specific features, along with amino acid composition, to create random forest classifiers that predict the likelihood that a protein binds to R-loops. Known R-loop regulating pathways such as splicing, DNA damage repair, and chromatin remodeling are highly enriched in our datasets, and we validate 2 new R-loop binding proteins, LIG1 and FXR1, in human cells. Together, these datasets provide a reference for pursuing analyses of novel R-loop regulatory proteins.
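The easy-ensemble idea mentioned in the abstract can be illustrated with a minimal sketch. In the usual formulation, each ensemble member is trained on all minority-class examples plus an equally sized random draw from the majority class, and the members' predicted probabilities are averaged. The function names, protein identifiers, and subset count below are hypothetical, and the random forest training step itself is omitted; this is not the authors' code.

```python
import random


def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Yield (positives, sampled_negatives) pairs, one per ensemble member.

    Assumes positives are the minority class: every subset keeps all of
    them and draws an equally sized random sample of negatives.
    """
    rng = random.Random(seed)
    negatives = list(negatives)
    for _ in range(n_subsets):
        yield list(positives), rng.sample(negatives, len(positives))


def ensemble_probability(member_probabilities):
    """Average the probabilities that the ensemble members assign to one protein."""
    return sum(member_probabilities) / len(member_probabilities)


# Hypothetical toy identifiers, not data from the study.
known_binders = ["P1", "P2", "P3"]
background = [f"N{i}" for i in range(20)]
subsets = list(balanced_subsets(known_binders, background, n_subsets=5))
```

Each pair in `subsets` would then be used to fit one base classifier (a random forest in the abstract's description), and `ensemble_probability` would combine their outputs for a given protein.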

Keywords: R-loop binding proteins; R-loops; random forest.

Figures

Fig. 1.
Higher-order features of RLBPs. a) Venn diagram of candidate RLBPs identified in published proteomic studies. b) PFAM domains enriched within the IP-MS and Prox-MS overlaps (false discovery rate-adjusted P-value < 0.01). c) Comparison of domain numbers between the whole proteome, nuclear proteome, and respective RLBP datasets. d) The proportion of proteins in each category encoding a nucleic acid binding domain. For (c) and (d), ****P < 0.0001, **P < 0.01, *P < 0.1, ns P > 0.1, Fisher’s exact test.
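As a rough illustration of the Fisher’s exact test named in this caption, the sketch below computes a two-sided P-value for a 2x2 table from the hypergeometric distribution. The function name and the example counts are hypothetical, and the caption does not state which software or sidedness the authors used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts: proteins with / without a nucleic acid binding domain
# in a candidate set (first row) versus a background set (second row).
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```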
Fig. 2.
Amino acid sequence characteristics of RLBPs. a) Hydrophobicity and hydrophilicity (GRAVY); b) aliphatic index; c) percentage of charged residues; d) solubility; e) protein disorder; and f) percentage of LCRs were compared for the indicated whole proteome, nuclear proteome, and the respective RLBP datasets. The exact P-values resulting from a Mann–Whitney–Wilcoxon test after Bonferroni correction for multiple comparisons are reported.
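Three of the sequence descriptors in this caption can be computed directly from an amino acid string. The sketch below assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and Ikai's formula for the aliphatic index; the caption does not say which tools the authors used, and treating D, E, K, and R as the charged residues is an assumption made here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Ikai's aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    def mole_pct(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


def percent_charged(seq, charged="DEKR"):
    """Percentage of residues in the (assumed) charged set."""
    return 100.0 * sum(seq.count(aa) for aa in charged) / len(seq)
```

Sequences are expected as uppercase one-letter codes; a non-standard residue raises a `KeyError` in `gravy` rather than being silently skipped.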
Fig. 3.
An RF classifier to determine R-loop binding property likelihood. a) Schematic of the machine learning model pipeline (created with BioRender). b, c) Receiver operating characteristic curves of the training (blue) and testing (green) sets, with precision-recall-gain curves included, for the IP-MS RF and Prox-MS RF models, respectively. d, e) Whole proteome prediction of RLBP character by both models. The plot shows the probability distribution of the whole proteome and our IP-MS or Prox-MS RLBP training set. The newly predicted RLBPs extracted from the whole proteome are highlighted in the box on the right.
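The area under a receiver operating characteristic curve, a common summary of plots like those in panels b and c, equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below computes it that way; the function name and toy values are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy labels (1 = known binder) and classifier probabilities.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of examples, which is fine for a demonstration; a rank-based computation would scale better on proteome-sized inputs.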
Fig. 4.
Common characteristics of predicted RLBPs. Venn diagrams showing the overlap between our machine learning algorithms and published IP-MS (a) and Prox-MS (b) datasets. c) Venn diagram showing the overlapping proteins predicted by our IP-MS and Prox-MS RF algorithms. Top 10 enriched GO terms for biological processes (d) and PFAM domains (e), identified using DAVID v6.8 (P < 0.05, with FDR correction), among IP-MS RF hits (n = 377, green), overlapping hits (n = 288, yellow), and Prox-MS RF hits (n = 200, red).
Fig. 5.
Integrative analysis of enriched complexes involved in R-loop binding. a) Functional interaction network of the combined IP-MS RF and Prox-MS RF predicted RLBPs. Genes were manually annotated based on the literature and corresponding GO terms (biological processes and molecular functions). Edges between nodes indicate physical interactions between proteins (Cytoscape, GeneMANIA).
Fig. 6.
Validation of novel candidate R-loop modulatory proteins. Left: quantification of nuclear PLA foci showing localization of LIG1 (a) or FXR1 (b) at sites of S9.6 staining. Right: representative images (1-way ANOVA, N = 3, >300 nuclei analyzed). Scale bars = 20 µm. See minus primary antibody controls in Supplementary Fig. 3b. c) Quantification of nuclear S9.6 immunofluorescence signal in siRNA knockdown of LIG1 and FXR1. Values are normalized to siCTRL. (mean ± SD, 1-way ANOVA, N = 3, number of nuclei analyzed is presented under each distribution). Representative images can be found in Supplementary Fig. 3c. d) Quantification of nuclear S9.6 signal from immunofluorescence showing inhibition of LIG1 with L82-G17 (25 µM, 4 h) results in increased nuclear S9.6 staining (mean ± SD, 1-way ANOVA, N = 3, number of nuclei analyzed is presented under each distribution). Representative images can be found in Supplementary Fig. 3d. e) Relative DRIP-qPCR signal values at LRP1B, SNRPN, WWOX, APOE, and CDH13 genes in HeLa cells transfected with indicated siRNAs and treated with in vitro RNaseH preimmunoprecipitation where indicated (mean ± SEM, 1-way ANOVA, at least 3 independent replicates). Statistically significant values shown in bold.
