Proc Natl Acad Sci U S A. 1998 Dec 8;95(25):14658-63.
doi: 10.1073/pnas.95.25.14658.

Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements


S A Teichmann et al. Proc Natl Acad Sci U S A.

Abstract

The parasitic bacterium Mycoplasma genitalium has a small, reduced genome whose gene content is close to a basic set of genes. As a first step toward determining the families of protein domains that make up the products of these genes, we used the multiple-sequence comparison programs PSI-BLAST and GEANFAMMER to match the sequences of the 467 gene products of M. genitalium to the sequences of the domains that form proteins of known structure [Protein Data Bank (PDB) sequences]. A total of 274 PDB sequences match the full length of 106 M. genitalium sequences and parts of another 85; thus, 41% of its sequences (191 of 467) are matched in whole or in part. The evolutionary relationships of the PDB domains that match M. genitalium are described in the Structural Classification of Proteins (SCOP) database. Using this information, we show that the domains in the matched M. genitalium sequences come from 114 superfamilies and that 58% of them have arisen by gene duplication. This level of duplication is more than twice that found by pairwise sequence comparison. The PDB domain matches also describe the domain structure of the matched sequences: just over a quarter contain a single domain, and the rest have combinations of two or more domains.
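The coverage figure above is simple arithmetic, while a duplication percentage depends on how duplicates are counted. The Python sketch below reproduces the 41% coverage from the counts in the abstract and shows one assumed way to count duplications from a list of superfamily assignments. The counting rule (one domain per superfamily treated as ancestral, every extra domain treated as a duplicate) and the helper names `coverage_fraction` and `duplication_fraction` are illustrative assumptions, not the paper's stated method; the 58% figure is not recomputed because the per-domain assignments are not given here.

```python
from collections import Counter

# Counts quoted in the abstract.
TOTAL_GENES = 467
FULLY_MATCHED = 106
PARTLY_MATCHED = 85


def coverage_fraction(full: int, partial: int, total: int) -> float:
    """Fraction of sequences matched by PDB domains in whole or in part."""
    return (full + partial) / total


def duplication_fraction(superfamily_of_domain: list[str]) -> float:
    """ASSUMED counting rule (hypothetical, not from the paper):
    within each superfamily, one domain is the ancestral copy and
    every additional domain is counted as arising by duplication."""
    counts = Counter(superfamily_of_domain)
    n_domains = sum(counts.values())
    n_superfamilies = len(counts)
    return (n_domains - n_superfamilies) / n_domains


print(f"{coverage_fraction(FULLY_MATCHED, PARTLY_MATCHED, TOTAL_GENES):.0%}")  # 41%

# Hypothetical superfamily labels for six domains: 3 superfamilies, 3 extra copies.
print(duplication_fraction(["a", "a", "b", "c", "c", "c"]))  # 0.5
```

Under this rule a genome in which every superfamily has a single member scores 0, and the score rises as more domains share superfamilies.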


Figures

Figure 1
Histogram of sequence identity for all protein pairs in the MG (M. genitalium) genome that match with an E value of ≤0.01 using FASTA with ktup = 1. Self matches are excluded. Most matching pairs share less than 30% identity, the range in which the ability of pairwise sequence comparison programs to detect relationships drops off rapidly.
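The filtering described in the Figure 1 caption (keep pairs with E ≤ 0.01, drop self matches, then histogram by percent identity) can be sketched in Python as follows. The hit records, the 10% bin width, and the choice to count each unordered pair once are assumptions for illustration; the caption does not state the bin width or how reciprocal hits were handled.

```python
from collections import Counter

# Hypothetical all-against-all hits: (query, subject, e_value, percent_identity).
hits = [
    ("MG001", "MG001", 1e-200, 100.0),  # self match: excluded
    ("MG001", "MG002", 1e-5, 24.0),
    ("MG002", "MG001", 1e-5, 24.0),     # reciprocal hit: counted once (assumption)
    ("MG003", "MG004", 0.5, 18.0),      # E > 0.01: excluded
    ("MG005", "MG006", 1e-20, 45.0),
]


def identity_histogram(records, e_cutoff=0.01, bin_width=10):
    """Count distinct non-self pairs with E <= e_cutoff, binned by percent identity.
    Keys are the lower edge of each bin (e.g. 20 covers 20-29.9%)."""
    bins = Counter()
    seen = set()
    for query, subject, e_value, identity in records:
        if query == subject or e_value > e_cutoff:
            continue
        pair = frozenset((query, subject))
        if pair in seen:
            continue
        seen.add(pair)
        bins[int(identity // bin_width) * bin_width] += 1
    return dict(sorted(bins.items()))


print(identity_histogram(hits))  # {20: 1, 40: 1}
```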
Figure 2
Histogram of the lengths of all protein sequences in MG, in bins of 50 residues. The lengths of genes that match PDBD sequences are superimposed in solid black. The length distribution of the matched genes is approximately the same as that of the whole genome.
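The comparison in Figure 2 amounts to binning two sets of lengths with the same 50-residue step and reading them side by side. A minimal Python sketch, assuming hypothetical gene identifiers and lengths (no data from the paper):

```python
from collections import Counter

BIN = 50  # bin width in residues, as stated in the caption


def length_histogram(lengths, bin_width=BIN):
    """Count sequences per bin; the key k covers lengths [k, k + bin_width)."""
    return Counter((n // bin_width) * bin_width for n in lengths)


# Hypothetical protein lengths (residues), for illustration only.
all_lengths = {"MG001": 120, "MG002": 149, "MG003": 150, "MG004": 430}
matched = {"MG002", "MG004"}  # hypothetical genes with PDB domain matches

whole_genome = length_histogram(all_lengths.values())
matched_only = length_histogram(all_lengths[g] for g in matched)

for start in sorted(whole_genome):
    # A Counter returns 0 for bins with no matched genes.
    print(f"{start:>4}-{start + BIN - 1:<4} all={whole_genome[start]} matched={matched_only[start]}")
```

Using the same bin edges for both sets is what makes the superimposed bars in the figure directly comparable.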

