Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Comparative Study
. 2010 Oct 12:11:508.
doi: 10.1186/1471-2105-11-508.

GenHtr: a tool for comparative assessment of genetic heterogeneity in microbial genomes generated by massive short-read sequencing

Affiliations
Comparative Study

GenHtr: a tool for comparative assessment of genetic heterogeneity in microbial genomes generated by massive short-read sequencing

Gongxin Yu. BMC Bioinformatics. .

Abstract

Background: Microevolution is the study of short-term changes of alleles within a population and their effects on the phenotype of organisms. The result of the below-species-level evolution is heterogeneity, where populations consist of subpopulations with a large number of structural variations. Heterogeneity analysis is thus essential to our understanding of how selective and neutral forces shape bacterial populations over a short period of time. The Solexa Genome Analyzer, a next-generation sequencing platform, allows millions of short sequencing reads to be obtained with great accuracy, allowing for the ability to study the dynamics of the bacterial population at the whole genome level. The tool referred to as GenHtr was developed for genome-wide heterogeneity analysis.

Results: For particular bacterial strains, GenHtr relies on a set of Solexa short reads on given bacteria pathogens and their isogenic reference genome to identify heterogeneity sites, the chromosomal positions with multiple variants of genes in the bacterial population, and variations that occur in large gene families. GenHtr accomplishes this by building and comparatively analyzing genome-wide heterogeneity genotypes for both the newly sequenced genomes (using massive short-read sequencing) and their isogenic reference (using simulated data). As proof of the concept, this approach was applied to SRX007711, the Solexa sequencing data for a newly sequenced Staphylococcus aureus subsp. USA300 cell line, and demonstrated that it could predict such multiple variants. They include multiple variants of genes critical in pathogenesis, e.g. genes encoding a LysR family transcriptional regulator, 23 S ribosomal RNA, and DNA mismatch repair protein MutS. The heterogeneity results in non-synonymous and nonsense mutations, leading to truncated proteins for both LysR and MutS.

Conclusion: GenHtr was developed for genome-wide heterogeneity analysis. Although it is much more time-consuming when compared to Maq, a popular tool for SNP analysis, GenHtr is able to predict potential multiple variants that pre-exist in the bacterial population as well as SNPs that occur in the highly duplicated gene families. It is expected that, with the proper experimental design, this analysis can improve our understanding of the molecular mechanism underlying the dynamics and the evolution of drug-resistant bacterial pathogens.

PubMed Disclaimer

Figures

Figure 1
Figure 1
The four-step analysis procedure for heterogeneity analysis of bacterial population. The first step is to establish genome-wide heterogeneity phenotypes for the newly sequenced bacterial strain (a clone population) (I.). The step first creates a database of Reference Genome DNA Fragments (RGDF) with a set of non-overlapped DNA fragments from the isogenic reference genome (IRG) (I.a); Once established, RGDF is used to search the database of Solexa short reads of the bacterial strain via MegaBlast to identify its candidate trace sequences (I.b), which are then mapped to the IRG to define genetic heterogeneity (I.c). The second step is a simulation procedure to study the genome complexity of the IRG (II.). The procedure creates an IRG-specific DNA database, covering all possible N-base pair "Solexa read-like" DNA fragments from the IRG (II.a), Then the analysis follow the same procedure from I.b to I.c to identify candidate sequences from the IRG-specific DNA database (II.b) and to establish genome-wide "heterogeneity" genotypes (II.c). The genotypes from step I and II are comparatively analyzed (III.). The genetic heterogeneity sites were analyzed with genewisedb for synonymous/non-synonymous mutations (IV).

Similar articles

Cited by

References

    1. Feil EJ, Maiden MC, Achtman M, Spratt BG. The relative contributions of recombination and mutation to the divergence of clones of Neisseria meningitidis. Mol Biol Evol. 1999;16:1496–1502. - PubMed
    1. Feil EJ, Smith JM, Enright MC, Spratt BG. Estimating recombinational parameters in Streptococcus pneumoniae from multilocus sequence typing data. Genetics. 2000;154:1439–1450. - PMC - PubMed
    1. Lawrence JG, Hendrickson H. Lateral gene transfer: when will adolescence end? Mol Microbiol. 2003;50:739–749. doi: 10.1046/j.1365-2958.2003.03778.x. - DOI - PubMed
    1. Maynard Smith J, Smith NH. Detecting recombination from gene trees. Mol Biol Evol. 1998;15:590–599. - PubMed
    1. Jarzembowski T, Wisniewska K, Jozwik A, Witkowski J. Heterogeneity of methicillin-resistant Staphylococcus aureus strains (MRSA) characterized by flow cytometry. Curr Microbiol. 2009;59:78–80. doi: 10.1007/s00284-009-9395-x. - DOI - PubMed

Publication types