Front Microbiol. 2016 Mar 11;7:283.
doi: 10.3389/fmicb.2016.00283. eCollection 2016.

DistAMo: A Web-Based Tool to Characterize DNA-Motif Distribution on Bacterial Chromosomes


Patrick Sobetzko et al. Front Microbiol. 2016.

Abstract

Short DNA motifs are involved in a multitude of functions, such as chromosome segregation, DNA replication, and mismatch repair. The distribution of such motifs is often not random, and the specific chromosomal pattern relates to the respective motif function. Computational approaches that quantitatively assess such chromosomal motif patterns are therefore necessary. Here we present a new computer tool, DistAMo (Distribution Analysis of DNA Motifs). The algorithm uses codon redundancy to calculate the relative abundance of short DNA motifs at scales ranging from single genes to entire chromosomes. Comparative genomics analyses of the GATC-motif distribution in γ-proteobacterial genomes using DistAMo revealed that (i) genes near the replication origin are enriched in GATCs, (ii) the genome-wide GATC distribution follows a distinct pattern, and (iii) genes involved in DNA replication and repair are enriched in GATCs. These features are specific to bacterial chromosomes encoding a Dam methyltransferase. The new software is available as a stand-alone version or as an easy-to-use web-based server version at http://www.computational.bio.uni-giessen.de/distamo.

Keywords: DNA replication; Escherichia coli; algorithm; bacteria; bioinformatics; chromosome maintenance; computational biology.
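
The codon-redundancy idea in the abstract can be illustrated with a minimal Python sketch. It is not the DistAMo implementation: the function name `potential_motif_probs` is hypothetical, the sketch handles tetramers only, and it assumes uniform usage of synonymous codons, which the published tool may not assume. For every pair of adjacent amino acids it reports the fraction of synonymous codon pairs whose six nucleotides contain the motif.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codon -> amino acid ('*' marks a stop codon).
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def potential_motif_probs(motif):
    """Return {(aa1, aa2): probability} for amino-acid pairs that can encode
    `motif`. The probability is the fraction of synonymous codon pairs whose
    6-nt sequence contains the motif (ASSUMPTION: uniform codon usage)."""
    if len(motif) != 4:
        # A tetramer always fits within two adjacent codons in any frame;
        # longer motifs may span three codons and are not handled here.
        raise ValueError("this sketch handles tetramers only")
    total, hits = {}, {}
    for c1, c2 in product(CODON_TABLE, repeat=2):
        pair = (CODON_TABLE[c1], CODON_TABLE[c2])
        if "*" in pair:
            continue  # skip stop codons
        total[pair] = total.get(pair, 0) + 1
        if motif in c1 + c2:
            hits[pair] = hits.get(pair, 0) + 1
    return {pair: hits[pair] / total[pair] for pair in hits}


probs = potential_motif_probs("GATC")
print(probs[("D", "P")])  # Asp-Pro: GAT+CCN matches, GAC+CCN does not
```

Amino-acid pairs that never appear in the returned dictionary cannot encode the motif at all, so they are not potential motif sites under this simplified model.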


Figures

Figure 1
Scheme of the motif analysis algorithm DistAMo. (A) The amino acid sequences that can be encoded by a motif-containing DNA sequence (potential motifs) are determined. For each potential motif, the probability that it is encoded by a motif-containing sequence is calculated. (B) The positions of potential motifs are detected using a suffix tree search in the proteome and assigned to the corresponding genes, gene groups, or chromosomal regions, depending on the type of analysis. Under the random model, the number of motifs follows a Poisson binomial distribution. The z-score (significance value) is determined from the actual number of motifs, the mean (the expected number of motifs), and the standard deviation of the Poisson binomial distribution.
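
The z-score described in the Figure 1 caption can be sketched as follows. This is an illustration under stated assumptions rather than the tool's code: `poisson_binomial_zscore` is a hypothetical name, and `probs` is assumed to hold one independent per-site probability for each potential motif site. The mean and variance used are the standard ones for a Poisson binomial distribution.

```python
import math


def poisson_binomial_zscore(observed, probs):
    """z-score of an observed motif count, given per-site probabilities.

    For independent sites with success probabilities p_i, the count follows
    a Poisson binomial distribution with
        mean     = sum(p_i)
        variance = sum(p_i * (1 - p_i)).
    """
    mean = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)
    if variance == 0.0:
        raise ValueError("variance is zero; z-score is undefined")
    return (observed - mean) / math.sqrt(variance)


# Four potential sites, each with probability 0.5, all four realized:
# mean = 2, variance = 1, so the z-score is 2.
print(poisson_binomial_zscore(4, [0.5, 0.5, 0.5, 0.5]))
```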
Figure 2
Impact of motif and potential motif frequencies on the z-scores of other motifs. The abscissa and ordinate show the number of potential GATC sites and real GATC sites, respectively, in an otherwise randomized 3000 bp coding sequence. The z-score for the tetramer is indicated in rainbow colors, with red for a z-score ≥ 2 and blue for a z-score ≤ −2. (A) GATC z-scores. (B) AGAT z-scores for different enrichments of GATC (see axis). AGAT overlaps with GATC. An increase of GATC therefore increases the frequency of AGAT. (C) AATC z-scores for different enrichments of GATC (see axis). AATC competes with GATC because the two share potential motif sites. Therefore, an increase of GATC decreases the abundance of AATC.
Figure 3
Impact of motif enrichment on the z-scores of other motifs. Depicted is the frequency distribution of z-score pairs, each consisting of the z-score of the enriched motif and the z-score of another motif in a random protein sequence. The distribution shows that the enrichment of one tetramer does not affect the z-scores of other tetramers. In genes with random sequence, approximately 1 out of 1000 genes shows a significant enrichment for a tetramer if another tetramer is significantly enriched. This equals the frequency of significant enrichment of a tetramer in a random sequence (P(A|B) = P(A)). Hence, within the limits of motif distributions present in bacteria, no interference between tetramer z-scores is to be expected when using DistAMo.
Figure 4
DistAMo online version. The tool is available online with a user-friendly interface so that non-experts can also use it. (A) Input mask where the user can choose from thousands of complete genomes and search for the motif of interest. Help symbols guide the user through the input. (B) After computation, the user is informed via email and guided to the results page, which holds global information including strand bias and ori-ter bias significance scores of the motif. (C) A click on each figure opens a page with detailed information about the genes with significant over- and under-representation. Gene positions can be displayed on the chromosome plot.
Figure 5
Distribution of KOPS sites on the leading strand (A) and lagging strand (B) of the E. coli chromosome. Significant over- or under-representation is color-coded in red or blue, respectively. Rings from the outside to the inside differ in the size of the sliding window, from 50 to 500 kb in 50 kb steps.
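
The sliding windows behind the rings in Figure 5 must wrap around the circular chromosome. The Python sketch below shows only that window construction, counting motif positions per window; it does not compute the z-scores themselves. The function name and the `step` parameter are hypothetical, since the caption gives the window sizes but not the distance by which each window is advanced.

```python
def circular_window_counts(positions, genome_length, window, step):
    """Count motif positions in windows [start, start + window) on a circle.

    `positions` are 0-based coordinates; windows that run past the end of
    the chromosome wrap around to its start.
    """
    counts = []
    for start in range(0, genome_length, step):
        n = 0
        for p in positions:
            # Distance from the window start, measured around the circle.
            if (p - start) % genome_length < window:
                n += 1
        counts.append(n)
    return counts


# Toy chromosome of length 100 with motifs at 5 and 95.
# The window starting at 90 wraps around and captures both sites.
print(circular_window_counts([5, 95], 100, window=20, step=90))  # [1, 2]
```

Calling the function once per window size (for example 50 kb, 100 kb, and so on up to 500 kb) would give one list per ring.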
Figure 6
GATC-distribution analysis with DistAMo. (A) Rings depict the standard DistAMo output with different sliding window sizes (compare Figure 5) for the distribution of GATC sites in E. coli. Raw data are provided in Table S2. (B) Average z-scores of GATC densities on chromosomes of Dam-positive γ-proteobacteria. Error bars (SEM) are indicated by the area around the curve. The origin of replication is situated at positions 0 and 1000 (circular chromosome). The z-score data of different-sized chromosomes were scaled to 1000. (C) Analysis as in (B) with Dam-negative γ-proteobacteria. (D) oriC-proximal genes in E. coli. Overrepresentation of GATC is indicated in red. The set of γ-proteobacteria was split into Dam-positive and Dam-negative species, and the respective z-scores of oriC-proximal genes were plotted (E,F). The species used are listed in Table S1. GATC densities for oriC-proximal regions of all analyzed bacteria are shown in Figure S3.
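
Panels (B) and (C) of Figure 6 average z-score profiles from chromosomes of different sizes after scaling them to a common length of 1000. One way this could be done is sketched below in Python. The nearest-index resampling and both function names are assumptions for illustration; the caption does not state how the scaling was performed.

```python
import math
import statistics


def rescale_profile(values, n_bins=1000):
    """Resample a z-score profile to n_bins points by nearest-index lookup
    (ASSUMPTION: the original scaling method is not specified)."""
    m = len(values)
    return [values[(i * m) // n_bins] for i in range(n_bins)]


def mean_and_sem(profiles):
    """Column-wise mean and standard error of the mean across profiles.

    All profiles must have the same length; at least two are required,
    because the sample standard deviation is undefined for one value.
    """
    means, sems = [], []
    for column in zip(*profiles):
        means.append(sum(column) / len(column))
        sems.append(statistics.stdev(column) / math.sqrt(len(column)))
    return means, sems


# Two toy chromosomes of different lengths, scaled to four bins each.
a = rescale_profile([0.0, 2.0], n_bins=4)
b = rescale_profile([2.0, 2.0, 4.0, 4.0], n_bins=4)
means, sems = mean_and_sem([a, b])
print(means)
```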
Figure 7
Distribution of GATC in genes with specific functions. (A) Analysis of GATC over-representation in E. coli for COG groups. (B) Calculation of over-representation for all tetramers in the most GATC over-represented COG group (Replication and repair).

