Genome Biol. 2003;4(9):R59.
doi: 10.1186/gb-2003-4-9-r59. Epub 2003 Aug 29.

Inference of protein function and protein linkages in Mycobacterium tuberculosis based on prokaryotic genome organization: a combined computational approach

Michael Strong et al. Genome Biol. 2003.

Abstract

The genome of Mycobacterium tuberculosis was analyzed using recently developed computational approaches to infer protein function and protein linkages. We evaluated and employed a method to infer genes likely to belong to the same operon, as judged by the nucleotide distance between genes in the same genomic orientation, and combined this method with those of the Rosetta Stone, Phylogenetic Profile and conserved Gene Neighbor computational methods for the inference of protein function.

Figures

Figure 1
A simplified version of prokaryotic operon organization and functional linkages based on the Operon method. (a) Prokaryotic operon organization. Genes A, B, and C are transcribed together into a single polycistronic transcript, which is then translated to produce three separate proteins. Proteins originating from genes of a common operon often have similar functions, interact physically through protein-protein interactions, or participate in shared biochemical pathways. (b) Functional linkages based on the Operon method. Genes A, B and C are 'linked' if the intergenic nucleotide distance between each pair of adjacent genes is less than or equal to the specified threshold. In this case, the distances between genes A and B and between genes B and C are both below the hypothetical distance threshold, so links are established between all possible pairs of genes (A-B, B-C and A-C).
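The linking rule in panel (b) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: genes are hypothetical `Gene` records with 1-based inclusive coordinates and a strand; the intergenic distance is taken as the number of nucleotides between the end of one gene and the start of the next (negative when genes overlap); and every pair of genes inside a chain of adjacent same-strand genes whose consecutive gaps are all within the threshold is linked.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Gene:
    # Hypothetical record: 1-based, inclusive coordinates; strand is "+" or "-".
    name: str
    start: int
    end: int
    strand: str


def intergenic_distance(left: Gene, right: Gene) -> int:
    """Nucleotides between two position-ordered genes (negative if they overlap)."""
    return right.start - left.end - 1


def operon_links(genes: list[Gene], threshold: int) -> set[frozenset[str]]:
    """Link all pairs of genes within each chain of adjacent same-strand genes
    whose consecutive intergenic distances are <= threshold."""
    ordered = sorted(genes, key=lambda g: g.start)
    links: set[frozenset[str]] = set()
    chain: list[Gene] = ordered[:1]

    def close(chain_genes: list[Gene]) -> None:
        # Every pair in a finished chain is linked, as for genes A, B and C in panel (b).
        links.update(frozenset((a.name, b.name)) for a, b in combinations(chain_genes, 2))

    for left, right in zip(ordered, ordered[1:]):
        if left.strand == right.strand and intergenic_distance(left, right) <= threshold:
            chain.append(right)
        else:
            close(chain)
            chain = [right]
    close(chain)
    return links


genes = [
    Gene("A", 1, 300, "+"),
    Gene("B", 321, 900, "+"),    # 20 bp after A
    Gene("C", 951, 1500, "+"),   # 50 bp after B
    Gene("D", 1701, 2000, "-"),  # opposite strand, ends the chain
]
print(sorted(sorted(pair) for pair in operon_links(genes, threshold=100)))
# [['A', 'B'], ['A', 'C'], ['B', 'C']]
```

With a stricter threshold of 30 bp, only the A-B link survives, because the 50 bp gap between B and C breaks the chain.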
Figure 2
Schematic representation of the minimum genetic requirements for adjacent genes transcribed independently and for those transcribed together as a single operon. Cases 1, 2 and 3 depict instances where gene A and gene B are transcribed independently as distinct transcriptional units, while Case 4 depicts genes organized into a common operon. The minimum requirement for genes of a common operon is only a ribosome binding site (RBS), whereas Case 3 emphasizes the numerous genetic elements required if gene A and gene B are organized into separate transcriptional units.
Figure 3
Keyword recovery scores as a function of the combined intergenic distance between pairs of genes in a run. All gene members of a run (adjacent genes in the same orientation, bordered on each side by genes in the opposite orientation) were linked, and each link was assigned a value equal to the combined intergenic distances between the two genes. Keyword recovery for genes linked by a combined intergenic distance of less than 150 bp is fairly high (34-52%), but it decreases as the total intergenic distance rises above 150 bp. At combined intergenic distances above 250 bp, keyword recovery is comparable to that of randomly linked genes.
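The run-based scoring described above can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: genes are hypothetical `(name, start, end, strand)` tuples with 1-based inclusive coordinates, a run is a maximal stretch of position-sorted genes sharing one orientation, and each pair in a run is scored with the sum of the intergenic gaps lying between the two genes.

```python
from itertools import combinations

# Hypothetical gene tuples: (name, start, end, strand), 1-based inclusive coordinates.
Gene = tuple[str, int, int, str]


def split_into_runs(genes: list[Gene]) -> list[list[Gene]]:
    """Group position-sorted genes into maximal runs of the same orientation."""
    runs: list[list[Gene]] = []
    for gene in sorted(genes, key=lambda g: g[1]):
        if runs and runs[-1][-1][3] == gene[3]:
            runs[-1].append(gene)
        else:
            runs.append([gene])
    return runs


def combined_distances(genes: list[Gene]) -> list[tuple[str, str, int]]:
    """Link every pair of genes in a run and score it with the summed
    intergenic distances of the gaps lying between the two genes."""
    scored = []
    for run in split_into_runs(genes):
        # gaps[k] is the intergenic distance between run[k] and run[k + 1]
        gaps = [b[1] - a[2] - 1 for a, b in zip(run, run[1:])]
        for i, j in combinations(range(len(run)), 2):
            scored.append((run[i][0], run[j][0], sum(gaps[i:j])))
    return scored


genes = [("A", 1, 300, "+"), ("B", 321, 900, "+"),
         ("C", 951, 1500, "+"), ("D", 1701, 2000, "-")]
print(combined_distances(genes))
# [('A', 'B', 20), ('A', 'C', 70), ('B', 'C', 50)]
```

Binning these scores (for example into 50 bp intervals) and computing keyword recovery within each bin would give a curve of the kind shown in this figure; the bin width here is an assumption.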
Figure 4
Keyword recovery scores for the Operon method alone and in combination with the Rosetta Stone (RS), Phylogenetic Profile (PP), and conserved Gene Neighbor (GN) methods. Combining the Operon method with any of the Rosetta Stone, Phylogenetic Profile or conserved Gene Neighbor methods markedly increases keyword recovery; the best score results from combining the 100 bp Operon, Rosetta Stone and Phylogenetic Profile methods.
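The caption does not spell out how methods are combined or how keyword recovery is computed, so the sketch below rests on two labeled assumptions: (i) a combination keeps only links predicted by every method involved (a set intersection), and (ii) keyword recovery is the fraction of linked pairs whose annotation keyword sets share at least one keyword, counting only pairs in which both genes carry keywords. The helper names and the link sets and keywords are hypothetical toy data.

```python
def link(a: str, b: str) -> frozenset[str]:
    """An undirected functional link between two genes."""
    return frozenset((a, b))


def combine(*link_sets: set[frozenset[str]]) -> set[frozenset[str]]:
    """Assumed combination rule: keep only links predicted by every method."""
    first, *rest = link_sets
    return set(first).intersection(*rest)


def keyword_recovery(links: set[frozenset[str]], keywords: dict[str, set[str]]) -> float:
    """Assumed metric: fraction of annotated linked pairs sharing >= 1 keyword."""
    annotated = [pair for pair in links if all(keywords.get(g) for g in pair)]
    if not annotated:
        return 0.0
    shared = sum(1 for pair in annotated if set.intersection(*(keywords[g] for g in pair)))
    return shared / len(annotated)


# Toy data, not results from the study.
operon = {link("A", "B"), link("A", "C"), link("B", "C"), link("D", "E")}
rosetta = {link("A", "B"), link("D", "E"), link("X", "Y")}
keywords = {"A": {"thiamine"}, "B": {"thiamine", "kinase"}, "C": {"transport"},
            "D": {"cell wall"}, "E": {"lipid"}}

print(keyword_recovery(operon, keywords))                    # 0.25
print(keyword_recovery(combine(operon, rosetta), keywords))  # 0.5
```

Under these assumptions, requiring agreement between methods discards links that only one method supports, which is one way a combination could raise keyword recovery.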
Figure 5
Distance profile of adjacent M. tuberculosis genes in the same orientation that are functionally linked by the Rosetta Stone, Phylogenetic Profile or conserved Gene Neighbor methods, compared with adjacent genes in the same orientation that are not linked by these methods. (a) Distance profile of adjacent M. tuberculosis genes in the same orientation linked by the Rosetta Stone, Phylogenetic Profile or conserved Gene Neighbor method. (b) Distance profile of all other adjacent M. tuberculosis genes in the same orientation, excluding those linked by the Rosetta Stone, Phylogenetic Profile or conserved Gene Neighbor methods. (c) Distance profile of adjacent genes in the same orientation in experimentally documented E. coli operons; E. coli operon data were obtained from RegulonDB [6]. The linked profile (a) has a mean intergenic distance of 27 base pairs, compared with 94 base pairs for the genes in (b), which are not linked by any of the three methods. This demonstrates that adjacent genes in the same orientation with small intergenic spacing are more likely to be functionally linked than those separated by larger distances.
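The comparison between profiles (a) and (b) amounts to splitting the intergenic distances of adjacent same-strand gene pairs by whether another method links the pair, then comparing means. A minimal sketch, assuming hypothetical `(name, start, end, strand)` tuples and a precomputed set of links; the toy numbers below are invented and do not reproduce the 27 bp and 94 bp values reported for M. tuberculosis.

```python
from statistics import mean

# Hypothetical gene tuples: (name, start, end, strand), 1-based inclusive coordinates.
Gene = tuple[str, int, int, str]


def distance_profiles(genes: list[Gene],
                      linked: set[frozenset[str]]) -> tuple[list[int], list[int]]:
    """Split intergenic distances of adjacent same-strand gene pairs into those
    linked by another method and those that are not."""
    ordered = sorted(genes, key=lambda g: g[1])
    linked_d: list[int] = []
    unlinked_d: list[int] = []
    for a, b in zip(ordered, ordered[1:]):
        if a[3] != b[3]:
            continue  # only adjacent genes in the same orientation are profiled
        distance = b[1] - a[2] - 1
        target = linked_d if frozenset((a[0], b[0])) in linked else unlinked_d
        target.append(distance)
    return linked_d, unlinked_d


genes = [("A", 1, 300, "+"), ("B", 321, 900, "+"), ("C", 951, 1500, "+"),
         ("D", 1701, 2000, "+"), ("E", 2101, 2400, "-")]
links = {frozenset(("A", "B"))}  # e.g. links from the RS, PP or GN methods
linked_d, unlinked_d = distance_profiles(genes, links)
print(linked_d, unlinked_d)  # [20] [50, 200]
print(mean(linked_d), mean(unlinked_d))
```

A histogram of each list, rather than just the mean, would correspond to the full distance profiles plotted in panels (a) and (b).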
Figure 6
Keyword recovery and maximum false positive fraction scores as the Operon distance threshold increases from 0 bp to 300 bp. As the distance threshold increases, keyword recovery decreases and the maximum false positive fraction increases.
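A threshold sweep of this kind can be sketched as below. To keep it short, the sketch assumes each adjacent same-strand gene pair has already been reduced to a hypothetical `(intergenic_distance, shares_keyword)` tuple and scores only direct adjacent links, so transitive links within longer chains are ignored. The maximum false positive fraction is not modeled because its definition is not given in this caption, and the data are invented.

```python
from typing import Optional


def keyword_recovery_sweep(pairs: list[tuple[int, bool]],
                           thresholds: range) -> dict[int, Optional[float]]:
    """For each threshold, the fraction of adjacent same-strand pairs with
    intergenic distance <= threshold whose annotations share a keyword."""
    curve: dict[int, Optional[float]] = {}
    for t in thresholds:
        kept = [shares for distance, shares in pairs if distance <= t]
        curve[t] = sum(kept) / len(kept) if kept else None  # None: no pairs linked
    return curve


# Toy (distance in bp, shares_keyword) data, not results from the study.
pairs = [(10, True), (40, True), (80, False), (150, True), (260, False), (290, False)]
for threshold, recovery in keyword_recovery_sweep(pairs, range(0, 301, 100)).items():
    print(threshold, recovery)
```

On this toy input the recovery falls from 0.75 at 200 bp to 0.5 at 300 bp as more distant, less related pairs are admitted, mirroring the trend described in the caption.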
Figure 7
Comparison of the genomic organization of the leucine biosynthesis genes in M. tuberculosis and Schizosaccharomyces pombe. (a) Genomic organization of the leuC and leuD genes of M. tuberculosis. (b) S. pombe alpha-isopropylmalate isomerase, containing both the leuC and leuD coding regions in a single fusion gene. This example illustrates the power of the Rosetta Stone, Phylogenetic Profile, Gene Neighbor and Operon methods to infer a functional linkage, in this case one that is already established [18].
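The Rosetta Stone inference in panel (b) rests on finding a single protein in another organism that matches two separate M. tuberculosis proteins in different regions. A simplified sketch, assuming alignment hits have already been computed (for example with BLAST) and are supplied as hypothetical `(target_protein, target_start, target_end)` tuples: two query proteins are linked when they align to non-overlapping regions of the same target. Practical implementations typically add further filters, such as significance cutoffs and checks that the two query proteins are not homologous to each other, which are omitted here. The names `fusion_1`, `other_1` and `geneX` are placeholders.

```python
from itertools import combinations

# Hypothetical input: query protein -> list of (target_protein, target_start, target_end)
Hits = dict[str, list[tuple[str, int, int]]]


def rosetta_stone_links(hits: Hits) -> dict[frozenset[str], set[str]]:
    """Link query proteins that align to non-overlapping regions of the same
    target protein; the value records the putative fusion ('Rosetta Stone') proteins."""
    by_target: dict[str, list[tuple[str, int, int]]] = {}
    for query, alignments in hits.items():
        for target, start, end in alignments:
            by_target.setdefault(target, []).append((query, start, end))

    links: dict[frozenset[str], set[str]] = {}
    for target, entries in by_target.items():
        for (qa, sa, ea), (qb, sb, eb) in combinations(entries, 2):
            disjoint = ea < sb or eb < sa  # the two matched regions do not overlap
            if qa != qb and disjoint:
                links.setdefault(frozenset((qa, qb)), set()).add(target)
    return links


# Toy hits loosely modeled on panel (b): leuC and leuD each match one part of a fusion.
hits = {
    "leuC": [("fusion_1", 1, 450)],
    "leuD": [("fusion_1", 500, 700), ("other_1", 1, 150)],
    "geneX": [("other_1", 100, 300)],  # overlaps leuD's region on other_1, so not linked
}
print(rosetta_stone_links(hits))
```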
Figure 8
Inference of M. tuberculosis protein function and operon organization based on multiple method overlap. (a) Inference of an operon encoding members involved in thiamine biosynthesis. (b) Operon inference for a region possibly involved in RNA degradation. (c) Functional links and operon inference for a region likely to be involved in cell wall metabolism. In these cases, inferences are made for the functions of uncharacterized genes by their functional linkages to genes of known function.
Figure 9
Identification of two novel genes linked to the arabinogalactan biosynthesis pathway, an important target of M. tuberculosis-specific drugs. Based on the close proximity of the adjacent genes (Operon method) and the functional linkage established by the Rosetta Stone method, we infer that Rv1503c and Rv1504c may be organized into a common operon. Both genes also have functional links to the genes rfe and rmlB, important components of the arabinogalactan biosynthesis pathway.
Figure 10
A unique M. tuberculosis gene linked to a glutamine synthetase paralog. Few homologs of Rv1879 exist in prokaryotes, but some plants and certain fungi encode a fusion protein with domains homologous to both Rv1879 and glutamine synthetase. The Operon and Rosetta Stone linkages suggest a possible role for Rv1879 and a possible functional association with the glnA3 gene product.

References

    1. Madigan M, Martinko J, Parker J. Brock Biology of Microorganisms. 9th ed. New Jersey: Prentice Hall; 2000.
    2. Lodish H, Baltimore D, Berk A, Zipursky SL, Matsudaira P, Darnell J. Molecular Cell Biology. 3rd ed. New York: Scientific American Books; 1995.
    3. Moreno-Hagelsieb G, Collado-Vides J. A powerful non-homology method for the prediction of operons in prokaryotes. Bioinformatics. 2002;18 Suppl 1:S329–S336.
    4. Salgado H, Moreno-Hagelsieb G, Smith T, Collado-Vides J. Operons in Escherichia coli: genomic analysis and predictions. Proc Natl Acad Sci USA. 2000;97:6652–6657.
    5. Yada T, Nakao M, Totoki Y, Nakai K. Modeling and predicting transcriptional units of Escherichia coli genes using hidden Markov models. Bioinformatics. 1999;15:987–993.
