Genome Res. 2002 May;12(5):689-700.
doi: 10.1101/gr.219302.

A complete sequence of the T. tengcongensis genome

Qiyu Bao et al. Genome Res. 2002 May.

Abstract

Thermoanaerobacter tengcongensis is a rod-shaped, gram-negative, anaerobic eubacterium that was isolated from a freshwater hot spring in Tengchong, China. Using a whole-genome-shotgun method, we sequenced its 2,689,445-bp genome from an isolate, MB4(T) (GenBank accession no. AE008691). The genome encodes 2588 predicted coding sequences (CDS). Among them, 1764 (68.2%) are classified according to homology to other documented proteins, and the remaining 824 CDS (31.8%) are functionally unknown. One of the interesting features of the T. tengcongensis genome is that 86.7% of its genes are encoded on the leading strand of DNA replication. Based on protein sequence similarity, the T. tengcongensis genome is most similar to that of Bacillus halodurans, a mesophilic eubacterium, among all prokaryotic genomes fully sequenced to date. Computational analysis of genes involved in basic metabolic pathways supports the experimental finding that T. tengcongensis metabolizes sugars as its principal energy and carbon source and uses thiosulfate and elemental sulfur, but not sulfate, as electron acceptors. T. tengcongensis, although a gram-negative rod by empirical definitions (such as staining), shares many genes that are characteristic of gram-positive bacteria, whereas it lacks molecular components unique to gram-negative bacteria. A strong correlation between the G + C content of tDNA and rDNA genes and the optimal growth temperature is found among the sequenced thermophiles. It is concluded that thermophiles are a biologically and phylogenetically divergent group of prokaryotes that have converged to sustain extreme environmental conditions over evolutionary timescales.
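The leading-strand figure quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure. It assumes a circular chromosome with the replication origin at position 1, a known terminus coordinate, and the common convention that a gene counts as leading-strand when it is transcribed in the direction of replication fork movement. The function name, the `(start, strand)` gene representation, and the toy coordinates are hypothetical.

```python
def leading_strand_fraction(genes, terminus):
    """Fraction of genes transcribed in the direction of fork movement.

    genes:    iterable of (start, strand) pairs; start is a 1-based
              coordinate and strand is '+' or '-' (hypothetical format).
    terminus: coordinate of the replication terminus; the origin is
              assumed to sit at position 1.
    """
    genes = list(genes)
    if not genes:
        raise ValueError("no genes given")
    leading = 0
    for start, strand in genes:
        # Between origin and terminus the fork moves toward higher
        # coordinates, so '+' genes are co-oriented with it. Past the
        # terminus the other fork moves toward lower coordinates, so
        # '-' genes are co-oriented there.
        first_replichore = start < terminus
        if (first_replichore and strand == "+") or (
            not first_replichore and strand == "-"
        ):
            leading += 1
    return leading / len(genes)


# Toy input, not data from the paper.
toy_genes = [(100, "+"), (200, "-"), (600, "-"), (700, "+")]
print(leading_strand_fraction(toy_genes, terminus=500))
```

A gene is assigned to a replichore by its start coordinate only, which is a simplification for genes that span the terminus.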

Figures

Figure 1
(a) Circular representation of the Thermoanaerobacter tengcongensis genome. Circles display (from the outside): (1) Physical map scaled in megabases from base 1, the start of the putative replication origin. (2) Coding sequences transcribed in the clockwise direction. (3) Coding sequences transcribed in the counterclockwise direction. (4) G + C percent content (in a 10-kb window with a 1-kb incremental shift); values above the 37.6% average are in red and values below it in blue. (5) GC skew [(G-C)/(G + C), in a 10-kb window with a 1-kb incremental shift]; values greater than zero are in magenta and values less than zero in green. (6) Repeated sequences; short 30-bp repeats are in red and other types in blue. (7) tRNA genes. (8) rRNA genes. Genes displayed in circles 2 and 3 are color-coded according to functional category: translation/ribosome structure/biogenesis, pink; transcription, olive drab; DNA replication/recombination/repair, forest green; cell division/chromosome partitioning, light blue; posttranslational modification/protein turnover/chaperones, purple; cell envelope biogenesis/outer membrane, red; cell motility/secretion, plum; inorganic ion transport/metabolism, dark sea green; signal transduction mechanisms, medium purple; energy production/conversion, dark olive green; carbohydrate transport/metabolism, gold; amino acid transport/metabolism, yellow; nucleotide transport/metabolism, orange; coenzyme metabolism, tan; lipid metabolism, salmon; secondary metabolites biosynthesis/transport/catabolism, light green; general function prediction only, dark blue; conserved hypothetical, medium blue; hypothetical, black; unclassified, light blue; pseudogenes, gray. (b) Linear representation of the T. tengcongensis genome. Genes are color-coded by functional category as described for (a), and the character strings above them give gene names or IDs. Arrows indicate the direction of transcription. Genes with authentic frameshift and point mutations are indicated with X.
Paralogous gene families are indicated by family ID in a box above the predicted genes. Numbers next to GES (Goldman-Engelman-Steitz) give the number of membrane-spanning domains predicted on the Goldman-Engelman-Steitz scale, as calculated by TMHMM. Proteins with five or more such domains are indicated. The 305 copies of the 30-bp short repeat, clustered in two regions, are indicated with the greater-than symbol. RNA genes (rRNA, tRNA, and other RNA genes), signal peptides, and long repeats are also indicated. Numbers on the tRNA symbols represent the number of tRNAs in the cluster.
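The windowed statistics drawn in circles 4 and 5 can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function name is hypothetical, the sequence is treated as linear (windows do not wrap around the circular chromosome), and only unambiguous G and C characters are counted.

```python
def sliding_gc(seq, window=10_000, step=1_000):
    """Return (start, gc_percent, gc_skew) for each sliding window.

    gc_percent = 100 * (G + C) / window
    gc_skew    = (G - C) / (G + C), or 0.0 if the window has no G or C
    Defaults mirror the 10-kb window and 1-kb shift in the caption.
    """
    seq = seq.upper()
    results = []
    # Stop at the last full-length window; no wrap-around is attempted.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_percent = 100.0 * (g + c) / window
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_percent, skew))
    return results


# Toy sequence with a tiny window, just to show the output shape.
for start, gc_percent, skew in sliding_gc("GGCCAATT", window=4, step=2):
    print(start, gc_percent, skew)
```

Windows whose G + C percentage exceeds the genome average would be colored red in the figure, and windows with positive skew magenta.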
Figure 2
The replication origin of the Thermoanaerobacter tengcongensis genome. GC skew [(G-C)/(G + C)] was calculated with a nonoverlapping sliding window of 10 kb for a single strand over the length of the genome (upper horizontal line). Cumulative GC skew was plotted from position 1 of the genome (upper solid line). Cumulative gene direction (upper dotted line) was plotted from position 1 of the genome sequence, showing that the majority of genes are transcribed in the direction of replication fork movement. In the skewed oligomer (TTTTTCTT)1423 part (lower), vertical lines above the center represent the locations of this octamer on one DNA strand, and lines below the center indicate its positions on the complementary strand. The transition in GC and oligomer skews, at the maxima of the curves near the middle of the genome sequence, is identified as the putative terminus of replication.
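A cumulative GC skew curve of the kind described here can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the trailing partial window is dropped, and the putative terminus is taken as the end of the window where the cumulative curve peaks, following the caption's statement that the maximum marks the terminus.

```python
from itertools import accumulate


def cumulative_gc_skew(seq, window=10_000):
    """Running sum of (G - C) / (G + C) over nonoverlapping windows."""
    seq = seq.upper()
    skews = []
    # Nonoverlapping windows: the step equals the window size.
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return list(accumulate(skews))


def putative_terminus(seq, window=10_000):
    """End coordinate of the window where the cumulative skew is maximal."""
    cumulative = cumulative_gc_skew(seq, window)
    if not cumulative:
        raise ValueError("sequence is shorter than one window")
    peak = max(range(len(cumulative)), key=cumulative.__getitem__)
    return (peak + 1) * window


# Toy sequence: G-rich first half, C-rich second half.
toy = "G" * 20 + "C" * 20
print(cumulative_gc_skew(toy, window=10))
print(putative_terminus(toy, window=10))
```

Whether the terminus appears as a maximum or a minimum depends on which strand is analyzed and where the coordinate system starts, so the choice of `max` here is tied to the convention in this figure.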
Figure 3
Relative distance of the Thermoanaerobacter tengcongensis genome from those of 47 other completely sequenced genomes, measured by a collective similarity score of the 2588 predicted coding sequences (CDS). All the sequences were retrieved from NCBI databases. A tally was kept of which genomes produced significant similarity with the BLASTP program at an expected-value threshold of 1e-10. The number of T. tengcongensis CDS matched to those of each genome is tabulated. Bacillus halodurans has the highest value, 54.4%, indicating that it is the most similar to T. tengcongensis.
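The per-genome tally can be sketched in Python as below. This is only an illustration of the counting step, not the pipeline used in the paper. It assumes BLASTP results in the 12-column tabular layout produced by BLAST+ `-outfmt 6` (query ID, subject ID, ..., E-value in column 11, bit score in column 12), and it assumes a hypothetical naming scheme in which each subject ID is prefixed with a genome label followed by `|`. The function name and the toy hits are also hypothetical.

```python
from collections import defaultdict


def tally_matches(lines, total_cds, max_evalue=1e-10):
    """Percentage of query CDS with a significant hit in each genome.

    lines:     iterable of tab-separated BLAST tabular rows.
    total_cds: number of query CDS (2588 in the caption).
    Returns a dict mapping genome label -> percent of CDS matched.
    """
    matched = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed rows
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            # Hypothetical convention: subject IDs look like "Bhal|protein_id".
            genome = subject.split("|", 1)[0]
            matched[genome].add(query)  # each CDS counts once per genome
    return {g: 100.0 * len(q) / total_cds for g, q in matched.items()}


def row(query, subject, evalue):
    """Build a toy 12-column row; the middle columns are placeholders."""
    middle = ["50.0", "100", "0", "0", "1", "100", "1", "100"]
    return "\t".join([query, subject] + middle + [evalue, "200"])


toy_hits = [
    row("cds1", "Bhal|p1", "1e-50"),
    row("cds1", "Bhal|p2", "1e-20"),  # second hit for the same CDS
    row("cds2", "Bhal|p3", "1e-5"),   # not significant at 1e-10
    row("cds2", "Bsub|p9", "1e-30"),
]
print(tally_matches(toy_hits, total_cds=4))
```

Using a set per genome keeps a CDS with several hits in the same genome from being counted more than once.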
Figure 4
Correlation of G + C contents and optimum growth temperatures (OGT) of thermophilic bacteria. G + C contents of genomes (solid squares), rDNAs (solid circles), and tDNAs (solid triangles) of 12 thermophilic archaea and eubacteria are plotted against the corresponding OGT. G + C contents of tDNAs and rDNAs show significant correlation with OGTs (linear regression coefficients R = 0.9 and R = 0.92, respectively), but no significant correlation is observed between genomic G + C contents and OGT (R = 0.09).
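For readers who want to reproduce this kind of comparison, the following Python sketch computes a correlation coefficient between two lists of values. It assumes that the R values in the caption are Pearson correlation coefficients, which the caption does not state explicitly. The function name is hypothetical and the numbers in the usage example are made-up placeholders, not measurements from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only: growth temperatures (deg C) and rDNA G + C (%).
toy_ogt = [55.0, 65.0, 75.0, 85.0, 95.0]
toy_rdna_gc = [56.0, 58.5, 61.0, 64.0, 66.5]
print(round(pearson_r(toy_ogt, toy_rdna_gc), 3))
```

The same function would be applied three times, once each for genomic, rDNA, and tDNA G + C content against OGT.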
Figure 5
Correlation of G + C contents between the genome average and rDNA/tDNA clusters from 36 mesophiles. G + C contents of tDNA and rDNA (underlined) show significant correlation with genome G + C contents (linear regression coefficients R = 0.88 and R = 0.8, respectively). Numbers in the figure stand for the sequenced prokaryotes: 1, Uure; 2, Buch; 3, Mpul; 4, Bbur; 5, Rpxx; 6, Cjej; 7, Cace; 8, Mgen; 9, SaurN; 10, Llact; 11, Hinf; 12, Spyo; 13, Hpyl; 14, Spneu; 15, Mpneu; 16, Pmul; 17, Cpneu; 18, Ctra; 19, Bsub; 20, Bhal; 21, Vcho; 22, Synecho; 23, Ecoli_O157; 24, Ecoli; 25, Nmen; 26, Xfas; 27, Tpal; 28, Mlep; 29, Atum; 30, Smel; 31, Mlot; 32, Mtub; 33, Paer; 34, Drad; 35, Ccre; and 36, Hbsp.

