BMC Microbiol. 2001:1:2.
doi: 10.1186/1471-2180-1-2. Epub 2001 Mar 30.

A tandem repeats database for bacterial genomes: application to the genotyping of Yersinia pestis and Bacillus anthracis

P Le Flèche et al. BMC Microbiol. 2001.

Abstract

Background: Some pathogenic bacteria are genetically very homogeneous, making strain discrimination difficult. In the last few years, tandem repeats have been increasingly recognized as markers of choice for genotyping a number of pathogens. The rapid evolution of these structures appears to contribute to the phenotypic flexibility of pathogens. The availability of whole-genome sequences has opened the way to the systematic evaluation of tandem repeats diversity and application to epidemiological studies.

Results: This report presents a database (http://minisatellites.u-psud.fr) of tandem repeats from publicly available bacterial genomes, which facilitates the identification and selection of tandem repeats. We illustrate the use of this database by the characterization of minisatellites from two important human pathogens, Yersinia pestis and Bacillus anthracis. To avoid simple sequence contingency loci, which may be of limited value as epidemiological markers, and to provide genotyping tools amenable to ordinary agarose gel electrophoresis, only tandem repeats with repeat units at least 9 bp long were evaluated. Yersinia pestis contains 64 such minisatellites in which the unit is repeated at least 7 times. An additional collection of 12 loci with at least 6 units and high internal conservation was also evaluated. Forty-nine are polymorphic among five Yersinia strains (twenty-five among three Y. pestis strains). Bacillus anthracis contains 30 comparable structures in which the unit is repeated at least 10 times. Half of these tandem repeats show polymorphism among the strains tested.

Conclusions: Analysis of the currently available bacterial genome sequences classifies Bacillus anthracis and Yersinia pestis as having an average (approximately 30 per Mb) density of tandem repeat arrays longer than 100 bp when compared to the other bacterial genomes analysed to date. In both cases, testing a fraction of these sequences for polymorphism was sufficient to quickly develop a set of more than fifteen informative markers, some of which show a very high degree of polymorphism. In one instance, the polymorphism information content index reaches 0.82 with allele length covering a wide size range (600-1950 bp), and nine alleles resolved in the small number of independent Bacillus anthracis strains typed here.
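
As a rough illustration of how a polymorphism index such as the 0.82 quoted above can be derived from allele frequencies, the sketch below computes two common measures. The abstract does not state which formula the authors used, so both the Botstein-style polymorphism information content and the simpler gene diversity (1 minus the sum of squared frequencies) are shown; the function names and the example allele sizes are assumptions made for illustration, not data from the paper.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Turn a list of observed alleles (one per strain) into frequencies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def gene_diversity(freqs):
    """Simple diversity index: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content in the Botstein et al. form:
    1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# Hypothetical allele sizes (bp) for one locus typed in eight strains.
example_alleles = [600, 600, 750, 750, 900, 900, 1950, 1950]
example_freqs = allele_frequencies(example_alleles)
```

With four equally frequent alleles, as in the made-up example, `pic(example_freqs)` is lower than `gene_diversity(example_freqs)`, which is why it matters to know which index a reported value refers to.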

Figures

Figure 1
Querying the tandem repeats database.
1A: Bacterial tandem repeats main page. Bacteria species are listed in alphabetical order. The name of the strain used for sequencing is indicated after the species name and before the genome size (expressed in megabase). The rightmost figure indicates the density (per Mb) of tandem repeat arrays longer than 100 bp. The search for tandem repeats can be restricted according to a combination of criteria, including total array length (L), repeat unit length (U), number of repeats (N), internal conservation of the repeats (V), position (expressed in kilobase) on the genome (Pos), GC content of the array (%GC), strand bias (B). Three different biases can be evaluated, GC bias, AT bias and Purine-Pyrimidine bias. The bias reflects strand asymmetry of the repeat sequence. The search output can either present a list of characteristics of the tandem repeats fulfilling criteria, ordered according to their position on the genome, or classify the tandem repeats according to a selected structural parameter.
1B: Examples of queries in three genomes. All tandem repeat arrays spanning more than 100 base-pairs are classified according to repeat unit length. The query was run on Buchnera sp. (left panel), Yersinia pestis (middle panel) and Pseudomonas aeruginosa (right panel).
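
The query criteria described in this caption can be pictured as a filter over a table of repeat records. The following is a minimal sketch, not the database's actual interface: the `TandemRepeat` class, its field names, the two functions and the example records are all hypothetical, chosen only to mirror the L, U, N, V and Pos parameters and the two output modes (list ordered by position, or classification by a structural parameter).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TandemRepeat:
    position_kb: float   # Pos: position on the genome, in kilobases
    total_length: int    # L: total array length, in bp
    unit_length: int     # U: repeat unit length, in bp
    copies: float        # N: number of repeats
    conservation: float  # V: internal conservation, in percent


def select_repeats(repeats, min_total=0, min_unit=0, min_copies=0,
                   min_conservation=0.0):
    """Keep records meeting every threshold, ordered by genome position."""
    hits = [
        r for r in repeats
        if r.total_length >= min_total
        and r.unit_length >= min_unit
        and r.copies >= min_copies
        and r.conservation >= min_conservation
    ]
    return sorted(hits, key=lambda r: r.position_kb)


def classify_by_unit_length(repeats):
    """Count records per repeat unit length (the view used in panel 1B)."""
    return dict(sorted(Counter(r.unit_length for r in repeats).items()))


# Made-up records for illustration only; not taken from the database.
EXAMPLE = [
    TandemRepeat(3057, 340, 18, 18.9, 80.0),
    TandemRepeat(120, 63, 9, 7.0, 95.0),
    TandemRepeat(2916, 80, 10, 8.0, 90.0),
    TandemRepeat(500, 24, 6, 4.0, 100.0),
]
```

For instance, `select_repeats(EXAMPLE, min_unit=9, min_copies=7)` applies thresholds of the kind used for the Y. pestis selection (units of at least 9 bp, at least 7 copies).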
Figure 2
Relative frequency of tandem repeats within bacterial genomes. The ten non-pathogen species are listed on top. Within each category, species are ordered according to genome size (smallest genome on top). The density of tandem repeat arrays longer than 100 bp is plotted for each species (dark bars). The clear bars reflect the excess (χ2 values) of tandem repeats with a repeat unit length multiple of three.
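
The two quantities plotted in this figure can be sketched as follows. The density is simply the number of arrays divided by genome size. For the χ2 value, the caption does not describe the exact test, so the sketch assumes a one-degree-of-freedom goodness-of-fit test against the expectation that one third of unit lengths are multiples of three; that null hypothesis, the function names and the example values are assumptions, not the authors' stated method.

```python
def density_per_mb(n_arrays, genome_size_mb):
    """Number of tandem repeat arrays per megabase of genome."""
    return n_arrays / genome_size_mb


def chi2_multiple_of_three(unit_lengths):
    """Chi-square statistic for an excess of unit lengths divisible by 3.

    Assumed null hypothesis: one third of unit lengths are multiples of 3.
    """
    n = len(unit_lengths)
    if n == 0:
        raise ValueError("at least one unit length is required")
    observed = sum(1 for u in unit_lengths if u % 3 == 0)
    expected = n / 3
    # Two categories: multiple of three, and everything else.
    return ((observed - expected) ** 2 / expected
            + ((n - observed) - (n - expected)) ** 2 / (n - expected))


# Made-up unit lengths (bp) for illustration only.
example_units = [9, 12, 15, 10, 18, 21]
```

A large value by itself does not say whether the deviation is an excess or a deficit, so the observed count should be compared with the expected count before interpreting it.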
Figure 3
Selection procedure of minisatellites for Y. pestis. 3A: Sixty-four tandem repeats have at least 7 units longer than 9 base-pairs. Panel A presents the distribution of these 64 loci according to repeat unit length. Each rectangle is a hyperlink to an alignment file. The rectangle indicated by the arrow is linked to the file illustrated in panel B. 3B: This is an annotated alignment file. The file corresponds to Yp3057ms09 (Table 1 and Figure 4; Yp: Yersinia pestis; 3057: position on the genome, expressed in kilobases; MS09: MiniSatellite index). The consensus pattern of 18 base-pairs is aligned to each motif. Annotations of the file are inserted within brackets. Although this minisatellite is very polymorphic, eleven different motifs (labeled a-k) are observed in the sequenced allele. The first four and last two copies are the most diverged and rare. Four types of motifs (f, g, h, i) constitute most of the array. For convenience, 18 motifs have been removed from the alignment file and replaced by their letter code. The last two copies are 21 base-pairs long instead of 18. The end of the alignment file (panel B, bottom) provides sequence data flanking the tandem repeat array. The positions of the primers chosen for PCR amplification of this locus (Table 1) are shown underlined.
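
The locus naming scheme explained in this caption (species prefix, genome position in kilobases, minisatellite index) is regular enough to be parsed mechanically. The sketch below is one possible way to do so; the `LocusName` type and `parse_locus_name` function are hypothetical helpers, and the pattern only covers names shaped like Yp3057ms09, not names such as CEB-Bams30 that carry no position.

```python
import re
from typing import NamedTuple


class LocusName(NamedTuple):
    species: str      # species prefix, e.g. "Yp" for Yersinia pestis
    position_kb: int  # position on the genome, in kilobases
    index: int        # minisatellite index


# Letters, then the position, then "ms" (any case), then the index.
_LOCUS_PATTERN = re.compile(r"([A-Za-z]+)(\d+)ms(\d+)", re.IGNORECASE)


def parse_locus_name(name):
    """Split a name like 'Yp3057ms09' into its three components."""
    match = _LOCUS_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised locus name: {name!r}")
    return LocusName(match.group(1), int(match.group(2)), int(match.group(3)))
```

Names that do not follow the scheme raise `ValueError` rather than being guessed at.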
Figure 4
Images of PCR amplification of the twenty-five minisatellites polymorphic in the Y. pestis strains. DNA from three reference Y. pestis strains representing each of the main biovars, antiqua (lane 1), medievalis (lane 2) and orientalis (lane 3), and two Y. pseudotuberculosis strains (lanes 4 and 5) has been PCR amplified, and an aliquot of the products has been run on 2% horizontal agarose gels as described. The length of the minisatellite motifs (U) and the size range are indicated on each panel. Yp2916ms07 has one of the shortest (10 bp) units. Four alleles are clearly distinguished between the 150 and 200 bp marker fragments.
Figure 5
PCR amplification of B. anthracis minisatellite CEB-Bams30. DNA from B. anthracis and B. cereus (six rightmost lanes) was amplified using primers for CEB-Bams30 (Table 2). The PCR products were run on a 40 cm long 2% ordinary agarose gel.
Figure 6
Bacillus anthracis phylogenetic tree. The genotype of each strain for the polymorphic minisatellites is given (size estimates for each allele are given in Table 3). "0" indicates a failure of the PCR amplification. This is most often associated with B. cereus strains, and probably reflects in these cases sequence divergence in the flanking sequence. The phylogenetic tree was produced using the Neighbor-Joining method as available on-line at
Figure 7
Significant correlation between number of alleles and minisatellites structural characteristics. The number of alleles is plotted as a function of Total length and %GC for Bacillus anthracis, and %matches for Yersinia pestis (the correlations are highly significant at the 0.01 level). Number of alleles for each locus is the total number detected (i.e. Bacillus anthracis and B. cereus; Yersinia pestis and Y. pseudotuberculosis).
