Nucleic Acids Res. 2002 Jan 15;30(2):482-96.
doi: 10.1093/nar/30.2.482.

A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis

Kira S Makarova et al. Nucleic Acids Res.

Abstract

During a systematic analysis of conserved gene context in prokaryotic genomes, a previously undetected, complex, partially conserved neighborhood consisting of more than 20 genes was discovered in most Archaea (with the exception of Thermoplasma acidophilum and Halobacterium NRC-1) and some bacteria, including the hyperthermophiles Thermotoga maritima and Aquifex aeolicus. The gene composition and gene order in this neighborhood vary greatly between species, but all versions have a stable, conserved core that consists of five genes. One of the core genes encodes a predicted DNA helicase, often fused to a predicted HD-superfamily hydrolase, and another encodes a RecB family exonuclease; three core genes remain uncharacterized, but one of these might encode a nuclease of a new family. Two more genes that belong to this neighborhood and are present in most of the genomes in which the neighborhood was detected encode, respectively, a predicted HD-superfamily hydrolase (possibly a nuclease) of a distinct family and a predicted, novel DNA polymerase. Another characteristic feature of this neighborhood is the expansion of a superfamily of paralogous, uncharacterized proteins, which are encoded by at least 20-30% of the genes in the neighborhood. The functional features of the proteins encoded in this neighborhood suggest that they comprise a previously undetected DNA repair system, which, to our knowledge, is the first repair system largely specific for thermophiles to be identified. This hypothetical repair system might be functionally analogous to the bacterial-eukaryotic system of translesion, mutagenic repair whose central components are DNA polymerases of the UmuC-DinB-Rad30-Rev1 superfamily, which typically are missing in thermophiles.
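The idea of a variable neighborhood with a stable conserved core can be illustrated with a minimal sketch. It assumes each genome's neighborhood has already been reduced to a list of gene-family labels (for example, COG assignments) and ignores gene order. The function names and the toy neighborhoods below are hypothetical placeholders, not data or code from the paper, and this is not the authors' procedure, only an illustration of what "core", "present in most genomes" and "share of a paralogous superfamily" mean as set operations.

```python
from collections import Counter


def conserved_core(neighborhoods: dict[str, list[str]]) -> set[str]:
    """Gene families present in every neighborhood; gene order is ignored."""
    family_sets = [set(genes) for genes in neighborhoods.values()]
    return set.intersection(*family_sets) if family_sets else set()


def frequent_families(neighborhoods: dict[str, list[str]], min_fraction: float) -> set[str]:
    """Gene families found in at least min_fraction of the neighborhoods."""
    counts = Counter(fam for genes in neighborhoods.values() for fam in set(genes))
    total = len(neighborhoods)
    return {fam for fam, n in counts.items() if n / total >= min_fraction}


def superfamily_share(genes: list[str], superfamily: set[str]) -> float:
    """Share of the genes in one neighborhood that belong to a given superfamily."""
    return sum(g in superfamily for g in genes) / len(genes) if genes else 0.0


# Hypothetical toy neighborhoods (placeholder labels, not data from the paper).
toy = {
    "genome1": ["core1", "core2", "core3", "x", "ramp1", "ramp2"],
    "genome2": ["core3", "core1", "ramp1", "core2", "y"],
    "genome3": ["core2", "core1", "core3", "ramp2", "x"],
}
print(sorted(conserved_core(toy)))          # ['core1', 'core2', 'core3']
print(sorted(frequent_families(toy, 0.6)))  # core genes plus 'ramp1', 'ramp2', 'x'
print(superfamily_share(toy["genome1"], {"ramp1", "ramp2"}))
```

Intersection captures only the strict core; the looser "present in most genomes" criterion needs a frequency threshold, which here is an arbitrary illustrative value.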

Figures

Figure 1
Organization of genes and potential operons in the genomic regions coding for protein components of the predicted novel DNA repair system. (A) The core (helicase-nuclease) and polymerase modules. Genes are not drawn to scale; arrows indicate the direction of transcription. The multiple gene-by-gene alignment was produced by manually combining template-anchored genome alignments. For each column of the alignment, the corresponding COG number and predicted function are indicated. Generally, orthologous genes are shown with the same color and pattern. The exceptions are the RAMP proteins of COGs 1336, 1367, 1604, 1337 and 1332, which are all shown in pink. The remaining, more distant RAMPs (see text) are also shown in pink, with different patterns. Genes in each genome that are unique to this neighborhood are shown as white arrows; some of these unique genes belong to the following COGs: 2002, regulators of stationary/sporulation gene expression (AbrB); 1848, predicted nucleic acid-binding protein containing a PIN domain; 1458, uncharacterized protein present only in Archaea and A.aeolicus; 0419, ATPase involved in DNA repair. Pairs of orthologous proteins that do not belong to COGs are marked by same-colored diamonds. HTH, helix–turn–helix type transcriptional regulator; HD nuclease, predicted nuclease containing the conserved HD motif; POL, novel predicted polymerase; HD_M, apparently multidomain protein containing an HD-hydrolase domain; Zn, Zn-ribbon-containing protein. The species abbreviations are as indicated in Materials and Methods. Species are color-coded as follows: Archaea, red; proteobacteria, blue; Gram-positive bacteria, green; other bacteria, black. The names of thermophilic species are boxed and the optimal growth temperature (OGT) is indicated for each of them.
Gene strings or individual genes shown on the figure are the following (from left to right for each genome): Aaeo, aq_387-aq_369, aq_173, aq_755; Tmar, TM1802-TM1791.1, TM1814-TM1807; Bhal, BH0327-BH0333, BH0336-BH0342; Aful, AF1869-AF1859, AF2436-AF2434, AF0072-AF0065, AF1870-AF1879; Mthe, MTH1091-MTH1078/1077, MTH328-MTH323; Mjan, MJ1234, MJ0380-MJ0386, MJ0375-MJ0379, MJ1666-MJ1665; Pfur, PF_1076764-PF_1077729-PF_1080337-PF_1081470, PF_1075331-PF_1074447-PF_1073960-PF_1072954-PF_1071624-PF_1070572-PF_1069932-PF_1067761-PF_1067252-PF_1066282-PF_1066279; Paby, PAB1064, PAB1613, PAB1685-PAB1691; Phor, PH0350, PH0921-PH0915, PH0161-PH0177, PH1252-PH1245; Tvol, TVN0114-TVN0105; Aper, APE1241-APE1228; Mtub, Rv2824c-Rv2816c; Spyo, Spy1567-Spy1561; Ecoli, YgcB-YgbT; Ssol, SSO1389-SSO1406, SSO1376-SSO1383, SSO1451-SSO1437, SSO1987-SSO2005, SSO1433-SSO1422, SSO1513-SSO1510, SSO1726-SSO1730. (B) A putative distinct bacterial operon centered on COG1518 and related to the predicted novel DNA repair system. The designations are as in (A). Gene strings: Spyo, Spy1048-Spy1046; Cjej, Cj1521c-Cj1523c; Nmen, NMA0629-NMA0631; Pmul, PM1125-PM1127, PM0311-PM0312.
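Most gene strings in this legend are written as two locus tags joined by a hyphen, in either ascending or descending order (for example BH0327-BH0333 or AF1869-AF1859). The sketch below assumes such a string denotes every consecutively numbered tag between the two endpoints; that reading is an assumption, and because some numbering schemes skip numbers, the expansion may include tags absent from a given annotation. The helper name `expand_gene_string` is hypothetical. Entries that do not fit the simple pattern (tags with suffixes such as `c`, `.1` or `/1077`, or the Pfur entries, which appear to be chains of individual genes) are rejected rather than guessed.

```python
import re

# Assumed tag shape: a letter/underscore prefix followed by digits, e.g. "BH0327", "aq_387".
_TAG = re.compile(r"([A-Za-z_]+?)(\d+)")


def expand_gene_string(gene_string: str) -> list[str]:
    """Expand 'START-END' into every consecutively numbered locus tag.

    Hypothetical helper: assumes both endpoints share a prefix and that the
    range covers every intermediate number, ascending or descending.
    Anything else (suffixes such as 'c', '.1' or '/', multi-gene chains)
    raises ValueError.
    """
    start, sep, end = gene_string.partition("-")
    if not sep:
        return [gene_string]  # a single gene, e.g. "MJ1234"
    m_start, m_end = _TAG.fullmatch(start), _TAG.fullmatch(end)
    if not (m_start and m_end) or m_start.group(1) != m_end.group(1):
        raise ValueError(f"not a simple two-endpoint range: {gene_string!r}")
    prefix, width = m_start.group(1), len(m_start.group(2))
    first, last = int(m_start.group(2)), int(m_end.group(2))
    step = 1 if last >= first else -1
    # Zero-pad to the width of the first endpoint so "BH0327" stays 4 digits.
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + step, step)]


print(expand_gene_string("BH0327-BH0333"))
print(expand_gene_string("AF1869-AF1859"))
```

Rejecting irregular strings keeps the sketch from silently inventing gene lists; handling strand suffixes or version numbers would require knowing each genome's annotation conventions.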
Figure 2
Multiple alignment of the predicted novel nuclease family (COG1518). The proteins are denoted by their systematic gene numbers, GenInfo Identifier (GI) numbers from the GenBank database and abbreviated species names (see Materials and Methods for abbreviations). The positions of the first and the last residue of the aligned region in the corresponding protein are indicated for each sequence. The alignment coloring is based on the consensus shown underneath the alignment; b indicates a ‘big’ residue (E,K,R,I,L,M,F,Y,W), h indicates hydrophobic residues (A,C,F,I,L,M,V,W,Y), a indicates aromatic residues (F,Y,W), s indicates small residues (A,C,S,T,D,N,V,G,P), u indicates ‘tiny’ residues (G,A,S), p indicates polar residues (D,E,H,K,N,Q,R,S,T), c indicates charged residues (K,R,D,E,H), o indicates hydroxyl-group-containing residues (S,T), + indicates positively charged residues (R,K) and – indicates negatively charged residues (E,D). The secondary structure elements were predicted with the PHD program, using a pre-constructed multiple alignment as input, and are shown above the alignment. H indicates α-helix and E indicates extended conformation (β-strand).
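The residue classes in this legend define a consensus line beneath the alignment. The sketch below shows one way such a line can be computed: for each column, report the most specific (smallest) class that covers at least a threshold fraction of the sequences. Two details are assumptions, not taken from this legend: the 85% threshold (borrowed from the RAMP consensus described for Figure 5) and the rule of preferring the smallest class, with ties broken by legend order. Function names are hypothetical.

```python
# Residue classes and their symbols, in the order given in the Figure 2 legend.
RESIDUE_CLASSES: dict[str, set[str]] = {
    "b": set("EKRILMFYW"),  # 'big'
    "h": set("ACFILMVWY"),  # hydrophobic
    "a": set("FYW"),        # aromatic
    "s": set("ACSTDNVGP"),  # small
    "u": set("GAS"),        # 'tiny'
    "p": set("DEHKNQRST"),  # polar
    "c": set("KRDEH"),      # charged
    "o": set("ST"),         # hydroxyl-containing
    "+": set("RK"),         # positively charged
    "-": set("ED"),         # negatively charged
}


def consensus_symbol(column: str, threshold: float = 0.85) -> str:
    """Symbol of the smallest class covering >= threshold of a column, else ' '.

    Gap characters count toward the total but match no class.
    """
    residues = column.upper()
    if not residues:
        return " "
    qualifying = [
        symbol for symbol, members in RESIDUE_CLASSES.items()
        if sum(r in members for r in residues) / len(residues) >= threshold
    ]
    if not qualifying:
        return " "
    # min() returns the first of equally sized classes, i.e. legend order.
    return min(qualifying, key=lambda symbol: len(RESIDUE_CLASSES[symbol]))


def consensus_line(rows: list[str], threshold: float = 0.85) -> str:
    """Consensus symbols for every column of equal-length aligned rows."""
    if len({len(row) for row in rows}) > 1:
        raise ValueError("aligned rows must have equal length")
    return "".join(consensus_symbol("".join(col), threshold) for col in zip(*rows))


print(repr(consensus_line(["KFSL", "RYTL", "KWS-"])))  # '+ao '
```

In the toy alignment, the last column (L, L, gap) reaches only 2/3 coverage, so it receives no symbol under the assumed 85% threshold.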
Figure 3
The predicted novel DNA polymerase. (A) Topology of the conserved core of the polymerase-cyclase palm domain. The catalytic metal-coordinating residues and the variable inserted finger module of the polymerases are indicated. (B) Multiple alignment of different polymerase and cyclase domains. The structure-based sequence alignment was constructed using the proteins whose structures have been solved (PDB codes shown in brackets), and the core secondary structure elements were derived from this structural alignment. The novel predicted polymerases were first aligned using the T_coffee program and then aligned with the remaining sequences using secondary structure prediction as a guide. The alignment consists of the following families of (predicted) polymerases and cyclases, as indicated to the right of the aligned sequences: 1, B family DNA polymerases; 2, adenylate cyclases; 3, GGDEF family of (predicted) diguanylate cyclases; 4, predicted novel DNA polymerases; 5, RNA-dependent RNA polymerases (RDRP) of positive-strand RNA viruses; 6, reverse transcriptases (RT) of retroviruses and retroid elements. The shared secondary structure elements are indicated above the alignment and the catalytic residues are shown in reverse shading. The other designations are as in Figure 2. (C) Multiple alignment of the Zn ribbons found in the predicted DNA polymerases. Note the disruption of the Zn-chelating residues in two of the proteins. The designations are as in Figure 2. (D) Multiple alignment of a putative polymerase-thumb-like domain shared by the COG1353 proteins. The designations are as in Figure 2. (E) Multiple alignment of the permuted HD hydrolase domain present at the extreme N-terminus of several members of COG1353. The designations are as in Figure 2.
Figure 4
The domain architecture of the predicted novel DNA polymerases compared with the domain architectures of other nucleic acid polymerases that are associated with different phosphoesterase domains. The polymerase catalytic domains are abbreviated as ‘Poly’ and each distinct family of polymerases is shown by a different shape and shade. The other domain abbreviations are: HhH, helix–hairpin–helix domain; DHH, phosphoesterase domain with DHH motif; PHP, phosphoesterase domain shared by DNA polymerases and histidinol phosphatase; HD, phosphoesterase domain with the HD motif; Pesterase, calcineurin-like phosphoesterase domain; Znr, zinc ribbon domain; N-OB, nucleic acid binding OB-fold domain; Nucl, 3′→5′ nuclease domain; RRML, domain with RRM-like fold; CBS, cystathionine β-synthase domain; Apo1-4, Archaeal-polymerase-specific domains 1–4.
Figure 5
The RAMP superfamily. The top part of the figure shows a multiple alignment of the major family of the RAMP superfamily. The designations are as in Figure 2. The bottom part shows a comparison of motifs derived from multiple alignments and secondary structure prediction for five families of RAMPs. Each family was aligned individually as described in Materials and Methods (alignments are available upon request). For each family, an 85% consensus was derived and the secondary structure was predicted. The conserved motifs were aligned on the basis of PSI-BLAST alignments (when available; see Results), similarity of the conserved amino acid patterns and secondary structure prediction. Color coding and secondary structure element designations are as in Figure 2.
Figure 6
Representation of the predicted novel repair system in different genomes. Pink rectangles, RAMP proteins; blue rectangles, other components of the system.
Figure 7
Phylogenetic trees for the most common components of the predicted novel repair system. (A) Putative novel nuclease (COG1518). (B) The helicase domain (COG1203). (C) The RecB family nuclease (COG1468). (D) The predicted novel polymerase (COG1353). Maximum likelihood trees constructed using the MOLPHY program are shown. Internal branches that were supported by bootstrap probability >70% are marked by black circles. In addition to the sequences from complete genomes, sequences that were identified by TBLASTN searches in the database of unfinished microbial genomes were used for phylogenetic analysis. Systematic gene names are used as branch designations except for sequences from unfinished genomes, which are designated using the corresponding species abbreviation. Archaeal genes are shown in red, genes from Gram-positive bacteria in green, proteobacterial genes in blue and genes from other bacteria in black. Genes from thermophiles are boxed.
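Marking internal branches with bootstrap support above 70% can be illustrated with the standard clade-frequency definition: the support of a branch is the fraction of bootstrap replicate trees that contain the same clade. The sketch below assumes rooted trees, each reduced to a set of clades (frozensets of leaf names); unrooted trees would need bipartitions instead. It does not reproduce MOLPHY, whose own bootstrap procedure may differ, and the function names and toy trees are hypothetical.

```python
# A clade is the set of leaf names below one internal branch of a rooted tree.
Clade = frozenset[str]


def bootstrap_support(reference: set[Clade], replicates: list[set[Clade]]) -> dict[Clade, float]:
    """Fraction of replicate trees that contain each clade of the reference tree."""
    if not replicates:
        raise ValueError("need at least one bootstrap replicate")
    return {
        clade: sum(clade in replicate for replicate in replicates) / len(replicates)
        for clade in reference
    }


def well_supported(support: dict[Clade, float], cutoff: float = 0.70) -> set[Clade]:
    """Clades whose support is strictly greater than the cutoff ('>70%')."""
    return {clade for clade, value in support.items() if value > cutoff}


def clades(*groups: str) -> set[Clade]:
    """Build a tree's clade set from strings of one-letter leaf names, e.g. 'AB'."""
    return {frozenset(group) for group in groups}


# Toy data on five hypothetical leaves A-E (not data from the paper).
reference = clades("AB", "ABC", "DE")
replicates = [
    clades("AB", "ABC", "DE"),
    clades("AB", "ABC", "DE"),
    clades("AB", "CD", "CDE"),
    clades("AB", "DE", "ABDE"),
]
support = bootstrap_support(reference, replicates)
for clade in sorted(support, key=sorted):
    print("".join(sorted(clade)), support[clade])  # AB 1.0, ABC 0.5, DE 0.75
```

Under the >70% rule, the AB and DE branches would be marked, while ABC (found in half of the replicates) would not.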
