BMC Bioinformatics. 2004 Mar 3;5:22.
doi: 10.1186/1471-2105-5-22.

SIGI: score-based identification of genomic islands

Rainer Merkl. BMC Bioinformatics.

Abstract

Background: Genomic islands can be observed in many microbial genomes. These stretches of DNA have a conspicuous composition with regard to sequence or encoded functions. Genomic islands are assumed to be frequently acquired via horizontal gene transfer. For the analysis of genome structure and the study of horizontal gene transfer, it is necessary to reliably identify and characterize these islands.

Results: A scoring scheme based on codon frequencies, Score_G1G2(cdn) = log(f_G2(cdn) / f_G1(cdn)), was used. To analyse the genes of a species G1 and to test their relatedness to a species G2, scores were determined as log-odds by applying the formula to the mean codon frequencies of the two genomes. A non-redundant set of nearly 400 codon usage tables of microbial species was derived, and each member was used in turn at position G2. Genes with at least one score above a species-specific, dynamically determined cut-off value were analysed further. Cluster analysis then identified those genes that form clusters of statistically significant size; these clusters were predicted to be genomic islands. Finally, for each of these genes individually, the taxonomical relation among the species responsible for significant scores was interpreted. The validity of the approach and its limitations were made plausible by an extensive analysis of natural genes and of synthetic genes designed to model the process of gene amelioration.
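The scoring formula can be illustrated with a minimal Python sketch. It is not the SIGI implementation: the function names, the pooling of codon counts into a single frequency table per genome, the small pseudocount, and the summation of codon scores into a per-gene score are all assumptions made for illustration only.

```python
import math
from collections import Counter


def codon_frequencies(genes):
    """Relative codon frequencies pooled over in-frame coding sequences."""
    counts = Counter()
    for seq in genes:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts.values())
    return {cdn: n / total for cdn, n in counts.items()}


def codon_scores(f_g1, f_g2, pseudo=1e-9):
    """Score_G1G2(cdn) = log(f_G2(cdn) / f_G1(cdn)) for every codon seen.

    The pseudocount is an assumption; it only avoids division by zero
    for codons missing from one of the two tables.
    """
    codons = set(f_g1) | set(f_g2)
    return {
        cdn: math.log((f_g2.get(cdn, 0.0) + pseudo) / (f_g1.get(cdn, 0.0) + pseudo))
        for cdn in codons
    }


def gene_score(seq, scores):
    """Sum of codon scores over one gene (assumed aggregation rule)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    # Codons absent from the score table contribute 0.
    return sum(scores.get(seq[i:i + 3], 0.0) for i in range(0, usable, 3))


# Toy example: G1 uses AAA and AAG equally, G2 prefers AAG.
f_g1 = codon_frequencies(["AAAAAG"])
f_g2 = codon_frequencies(["AAAAAGAAGAAG"])
scores = codon_scores(f_g1, f_g2)
print(gene_score("AAGAAG", scores))  # positive: the gene resembles G2 more than G1
```

A positive score means the gene's codon usage fits G2 better than G1. In the method described above, such scores would be compared against a species-specific cut-off, which this sketch does not model.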

Conclusions: The method allows reliable identification of genomic islands and of the likely origin of alien genes.

Figures

Figure 1
Selectivity of four methods for the identification of compositionally atypical genes. Two sets were analysed, consisting of genes VCA0010 to VCA0230 (control group) and genes VCA0271 to VCA0491 (belonging to the integron island) from chromosome two of V. cholerae. For each gene, the indicators codon usage contrast (CU), δ* difference, dicodon usage (DC) and hMPW(gene) (as introduced here) were determined as described, and the values were accumulated set-wise in histograms. Each point on a curve gives, on the two axes, the fraction of genes below the corresponding cut-off value.
Figure 2
Plot of GCB-scores versus CU-contrast values for all genes of E. coli K-12 and the classification of compositionally atypical genes. For all genes of the genomic data set, the two parameters were determined, converted to z-values and plotted as small dots. A high GCB-score indicates adaptation to translational efficiency. Genes annotated as putatively alien according to the classification CALO and/or by the MPW approach are labelled. The set CALO AND MPW consists of those genes identified as compositionally atypical by both methods.
Figure 3
Summary view of SIGI's annotation for the genome of S. agalactiae. Each symbol labels a single gene (product). Meaning of the characters: "R" tRNA gene; "x" or "X" two levels of bias in putatively highly expressed genes; "I" integrase; "T" transposase; "H" hypothetical protein identified as compositionally atypical (CA); "G" a gene annotated with a function and identified as CA; "." a gene classified as unsuspicious.
