Nucleic Acids Res. 2014 Apr;42(7):4160-79.
doi: 10.1093/nar/gkt1414. Epub 2014 Jan 23.

The RNase H-like superfamily: new members, comparative structural analysis and evolutionary classification

Karolina A Majorek et al. Nucleic Acids Res. 2014 Apr.

Abstract

The ribonuclease H-like (RNHL) superfamily, also called the retroviral integrase superfamily, groups together numerous enzymes involved in nucleic acid metabolism and implicated in many biological processes, including replication, homologous recombination, DNA repair, transposition and RNA interference. The RNHL superfamily proteins show extensive divergence of sequences and structures. We conducted database searches to identify members of the RNHL superfamily (including those previously unknown), yielding >60 000 unique domain sequences. Our analysis led to the identification of new RNHL superfamily members, such as RRXRR (PF14239), DUF460 (PF04312, COG2433), DUF3010 (PF11215), DUF429 (PF04250 and COG2410, COG4328, COG4923), DUF1092 (PF06485), COG5558, OrfB_IS605 (PF01385, COG0675) and Peptidase_A17 (PF05380). Based on the clustering analysis, we grouped all identified RNHL domain sequences into 152 families. Phylogenetic studies revealed relationships between these families and suggested a possible history of the evolution of the RNHL fold and its active site. Our results revealed a clear division of the RNHL superfamily into exonucleases and endonucleases. Structural analyses of features characteristic of particular groups revealed that the orientation of the C-terminal helix correlates with the exonuclease/endonuclease function and with the architecture of the active site. Our analysis provides a comprehensive picture of sequence-structure-function relationships in the RNHL superfamily that may guide functional studies of the previously uncharacterized protein families.

Figures

Figure 1.
Two-dimensional projection of the CLANS clustering results obtained for the sequences of the RNHL domains of the superfamily proteins. Sequences are indicated by dots. Lines indicate sequence similarity detectable with BLAST and are colored by a spectrum of shades of gray according to the BLAST P-value. To facilitate distinction of separate clusters they are indicated with different colors matching the colors of their labels. Clusters were labeled with the clade they grouped into in the evolutionary analysis (indicated with letters A–F and Roman numerals I–VI; see section Phylogenetic analysis). For each clade the clusters were numbered based on their size in descending order (e.g. cluster A.1. will comprise more sequences than the cluster A.8.). The number is followed by a common name of the most distinguished member of the cluster and identifiers of Pfam, COG and KOG families, members of which were found in the cluster.
Figure 2.
Topological diagrams of representative RNHL superfamily structures. α-helices are shown as circles and β-strands as triangles. The orientation of each triangle shows the orientation of the β-strand: the vertex points up for a β-strand pointing toward the reader and down for the opposite orientation. Universally conserved elements are shown in gray, variable elements in white.
Figure 3.
Structure-based multiple sequence alignment of the core of the RNase H-like domain of the 41 structural representatives. Sequences are denoted by a letter indicating their clade (described in this work), PDB code, six-letter abbreviation for genus and species and a common name, abbreviated in most cases. Residues are colored by physicochemical properties of their side chains, and background-colored positions are those showing at least 40% identity/similarity. The variable termini and insertions are not shown; the number of omitted residues is indicated in parentheses. Secondary structure elements determined for a selected representative are shown above the alignment, where α-helices are represented by tubes, β-strands by arrows and loops by continuous lines.
Figure 4.
Configuration of the active site. (A) Known and predicted catalytic residues in exemplary RNHL families with respect to the secondary structure of the catalytic core. Catalytic residues are shown in white. Conserved residues important for the activity but not directly involved in catalysis are shown in gray. (B) Superposition of the active sites of RNase H1 (D132N mutant) from B. halodurans (PDB code: 1zbi; shown in green) and RNase T from E. coli (PDB code: 3nh1; shown in magenta), with metal ions and substrate nucleic acids bound. (C) Active site of RNase H1 from B. halodurans (in the same orientation as in panel B). (D) Active site of RNase T from E. coli (in the same orientation as in panel B).
Figure 5.
Sequence alignment of exemplary C-terminal α-helices of the catalytic core and comparison of the two alternative orientations of the helix. Sequences are denoted by the PDB code. Positions colored in red and blue indicate the position of the last catalytic amino acid. The direction of the α-helix from N- to C-terminus is indicated by arrows.
Figure 6.
Evolutionary tree of the RNHL superfamily. (A) Clustering of RNHL structures based on DALI Z-scores. Clusters corresponding to 3′–5′ exonucleases and endonucleases with reversed C-terminal helix are outlined with cyan and magenta boxes, respectively. Remaining endonucleases are outlined with an orange box. (B) Evolutionary tree of the RNHL representatives calculated based on multiple sequence alignment, profile-profile comparisons and conservation of the catalytic residues. Main clades are indicated with letters A–F and Roman numerals I–VI. Numbers below branches indicate posterior probabilities and only those higher than 0.5 are shown.
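
The cluster labeling convention described in the Figure 1 caption (clade identifier followed by a rank, with clusters in each clade numbered by size in descending order) can be illustrated with a short sketch. This is only an illustration of the naming rule, not the authors' pipeline: the function name `label_clusters`, the input layout, the tie-breaking by name and the toy counts are all assumptions made here.

```python
from collections import defaultdict


def label_clusters(cluster_sizes):
    """Map (clade, cluster_name) -> label such as 'A.1'.

    cluster_sizes: dict of {(clade, cluster_name): number_of_sequences}.
    Within each clade, clusters are ranked by size in descending order,
    so rank 1 is the largest cluster of that clade.
    """
    by_clade = defaultdict(list)
    for (clade, name), size in cluster_sizes.items():
        by_clade[clade].append((name, size))

    labels = {}
    for clade, members in by_clade.items():
        # Largest first; ties are broken by name (an assumption, the
        # caption does not say how equal-sized clusters are ordered).
        ranked = sorted(members, key=lambda m: (-m[1], m[0]))
        for rank, (name, _size) in enumerate(ranked, start=1):
            labels[(clade, name)] = f"{clade}.{rank}"
    return labels


# Hypothetical toy input; names and counts are invented for illustration.
toy_sizes = {
    ("A", "cluster_x"): 120,
    ("A", "cluster_y"): 950,
    ("B", "cluster_z"): 40,
}
toy_labels = label_clusters(toy_sizes)
```

With the toy input, the larger of the two clade A clusters receives the lower number, matching the caption's statement that cluster A.1 comprises more sequences than cluster A.8.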
