Comparative Study
BMC Genomics. 2011 Apr 13:12:187.
doi: 10.1186/1471-2164-12-187.

Comparative supragenomic analyses among the pathogens Staphylococcus aureus, Streptococcus pneumoniae, and Haemophilus influenzae using a modification of the finite supragenome model

Robert Boissy et al. BMC Genomics. 2011.

Abstract

Background: Staphylococcus aureus is associated with a spectrum of symbiotic relationships with its human host, from carriage to sepsis, and is frequently associated with nosocomial and community-acquired infections; the differential gene content among strains is therefore of interest.

Results: We sequenced three clinical strains and combined these data with 13 publicly available human isolates and one bovine strain for comparative genomic analyses. All genomes were annotated using RAST, and then their gene similarities and differences were delineated. Gene clustering yielded 3,155 orthologous gene clusters, of which 2,266 were core, 755 were distributed, and 134 were unique. Individual genomes contained between 2,524 and 2,648 genes. Gene-content comparisons among all possible S. aureus strain pairs (n = 136) revealed a mean difference of 296 genes and a maximum difference of 476 genes. We developed a revised version of our finite supragenome model to estimate the size of the S. aureus supragenome (3,221 genes, with 2,245 core genes), and compared it with those of Haemophilus influenzae and Streptococcus pneumoniae. There was excellent agreement between RAST's annotations and our CDS clustering procedure, providing for high-fidelity metabolomic subsystem analyses to extend our comparative genomic characterization of these strains.
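The core, distributed, and unique categories reported above can be illustrated with a minimal sketch. It assumes each orthologous cluster is represented by the set of strains that carry it; the function name, the input layout, and the toy data are hypothetical and are not taken from the paper. The last two lines only re-check the arithmetic in the abstract (17 strains give 136 pairs, and the three categories sum to 3,155 clusters).

```python
from math import comb


def classify_clusters(presence, n_strains):
    """Count core, distributed, and unique clusters.

    presence: dict mapping a cluster id to the set of strains carrying it
    (hypothetical layout chosen for this illustration).
    """
    counts = {"core": 0, "distributed": 0, "unique": 0}
    for strains in presence.values():
        k = len(strains)
        if k == n_strains:
            counts["core"] += 1          # present in every strain
        elif k == 1:
            counts["unique"] += 1        # present in exactly one strain
        elif k > 1:
            counts["distributed"] += 1   # two or more strains, but not all
    return counts


# Toy data: three strains, four clusters (illustrative only).
toy = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"C"},
    "c4": {"A", "B", "C"},
}
print(classify_clusters(toy, 3))  # {'core': 2, 'distributed': 1, 'unique': 1}

print(comb(17, 2))        # 136 strain pairs among 17 genomes
print(2266 + 755 + 134)   # 3155 clusters in total
```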

Conclusions: Using a multi-species comparative supragenomic analysis enabled by an improved version of our finite supragenome model, we provide data and an interpretation explaining the relatively larger core genome of S. aureus compared to other opportunistic nasopharyngeal pathogens. In addition, we provide independent validation for the efficiency and effectiveness of our orthologous gene clustering algorithm.

Figures

Figure 1
Clustering of strains using neighbor-grouping analysis. The figure shows the relationships among the 17 Staphylococcus aureus genomes under study based on the percentage of shared distributed genes. Valid neighbor groups of genomes (see Materials and Methods) are enclosed in ellipses.
Figure 2
Pair-wise gene possession comparisons among all 136 possible Staphylococcus aureus strain pairs. The comparison of two strains is summarized in the (4-level) box at the intersection of the row and column corresponding to the respective strains. Pair-wise relationships are summarized based on the number of genes with orthologs in each of the two strains (S = similarity score, level 1 of each box); the number of genes with an ortholog in one strain but not the other (D = difference score, level 2 of each box); a composite comparison score (C = S - D, level 3 of each box); and the number of genes with orthologs found only in both strains (P = pair unique score, level 4 of each box).
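The four scores in this caption map onto simple set operations. The sketch below is an illustration under stated assumptions, not the authors' code: each strain is represented by the set of orthologous cluster ids it carries, and the function name and toy data are hypothetical. S is computed as the size of the intersection, D as the size of the symmetric difference, C as S - D, and P as the number of shared clusters carried by no other strain in the sample.

```python
from itertools import combinations


def pairwise_scores(gene_sets):
    """Compute S, D, C, and P for every strain pair.

    gene_sets: dict mapping a strain name to the set of cluster ids it carries
    (hypothetical layout chosen for this illustration).
    """
    # Number of strains carrying each cluster, needed for the pair-unique score.
    carriers = {}
    for clusters in gene_sets.values():
        for c in clusters:
            carriers[c] = carriers.get(c, 0) + 1

    scores = {}
    for a, b in combinations(sorted(gene_sets), 2):
        shared = gene_sets[a] & gene_sets[b]
        s = len(shared)                                 # orthologs in both strains
        d = len(gene_sets[a] ^ gene_sets[b])            # ortholog in one strain only
        p = sum(1 for c in shared if carriers[c] == 2)  # found only in this pair
        scores[(a, b)] = {"S": s, "D": d, "C": s - d, "P": p}
    return scores


# Toy data: three strains (illustrative only).
toy_sets = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}, "C": {1, 3, 6}}
result = pairwise_scores(toy_sets)
print(result[("A", "B")])  # {'S': 2, 'D': 3, 'C': -1, 'P': 1}
print(result[("B", "C")])  # {'S': 1, 'D': 4, 'C': -3, 'P': 0}
```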
Figure 3
Finite supragenome model results using (K = 6) variable population gene frequency classes. In our previous supragenome analyses carried out with Haemophilus influenzae and Streptococcus pneumoniae we used a version of the finite supragenome model that required fixed population gene frequency classes. This model has been updated to make the optimization function (the log-likelihood of the observed sample gene frequency histogram, i.e., the observed gene frequency class distribution among the |S| strains examined) dependent on the values of the population gene frequency vector (μ) as well as the values of the corresponding mixture coefficient vector (π, for the probability that a gene in a supragenome will be represented in one of the K classes of population gene frequencies). For a given species, the bottom graph plots the values of the vector μ against the product of the estimate of supragenome size and the values of the vector π, all obtained at the maximization of the log-likelihood function.
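The caption does not spell out the likelihood, so the following is only a sketch of one plausible form. It assumes that a gene in class k is present in each sampled strain independently with probability μ_k (binomial sampling), that class membership follows the mixture coefficients π, and that the observed histogram, together with the count of genes seen in no strain, is multinomially distributed given the supragenome size. All function and parameter names are hypothetical.

```python
from math import comb, lgamma, log


def class_probabilities(mu, pi, n_strains):
    """P(a gene is present in exactly n of n_strains strains), for n = 0..n_strains.

    Assumes a mixture of binomials: class k has population frequency mu[k]
    and mixture coefficient pi[k].
    """
    return [
        sum(w * comb(n_strains, n) * m**n * (1 - m) ** (n_strains - n)
            for m, w in zip(mu, pi))
        for n in range(n_strains + 1)
    ]


def expected_histogram(supragenome_size, mu, pi, n_strains):
    """Predicted number of genes seen in exactly 1..n_strains strains."""
    probs = class_probabilities(mu, pi, n_strains)
    return [supragenome_size * p for p in probs[1:]]


def log_likelihood(observed, supragenome_size, mu, pi):
    """Multinomial log-likelihood of the observed gene frequency histogram.

    observed[n - 1] is the number of genes seen in exactly n strains.
    Genes of the supragenome seen in no strain form the unobserved zero class.
    """
    n_strains = len(observed)
    unseen = supragenome_size - sum(observed)
    if unseen < 0:
        return float("-inf")  # supragenome cannot be smaller than what was seen
    counts = [unseen] + list(observed)
    probs = class_probabilities(mu, pi, n_strains)
    ll = lgamma(supragenome_size + 1)
    for c, p in zip(counts, probs):
        if c > 0 and p == 0.0:
            return float("-inf")
        ll -= lgamma(c + 1)
        if c > 0:
            ll += c * log(p)
    return ll


# Toy check with K = 2 classes and 17 strains (values are illustrative only).
probs = class_probabilities([0.1, 0.9], [0.5, 0.5], 17)
print(abs(sum(probs) - 1.0) < 1e-9)  # True
```

In the revised model described in the caption, both μ and π are free parameters, so a fit would maximize `log_likelihood` jointly over the supragenome size, μ, and π; the optimizer is omitted here. Under the same assumptions, `expected_histogram` gives predicted counts per frequency class of the kind compared with observations in Figure 4.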
Figure 4
Histogram of observed sample gene frequencies compared to the predicted number using the finite supragenome model. The number of genes for each frequency class was calculated using the results from our revised finite supragenome model (trained on all 17 strains). The observed and predicted number of core genes (2,266) found in all 17 strains agreed exactly; these values are not shown to avoid distortion of the scale of the graph. Distributed genes appear in two or more strains, but not all (from 2 to 16 here).
Figure 5
Comparison of the observed and predicted supragenome parameters as additional strains are sequenced. The two panels on the left show observed (upper panel) and predicted (lower panel) numbers of new genes that were or would be found in the second to the nth genome, for the 17 strains examined or for a hypothetical study of 30 strains of Staphylococcus aureus. The two panels on the right show observed (upper panel) and predicted (lower panel) numbers of core and total genes that were or are predicted to be found in the second to the nth genome, for the same 17 strains or hypothetical 30-strain study. Observed new, core, and total genes were calculated using genomes examined in ascending order of their counts of non-core genes.
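The observed curves can be sketched as a running union and intersection over the ordered genomes. This is a minimal illustration, not the authors' implementation: it assumes each strain is a set of cluster ids, that "non-core" is counted relative to the core of the full sample, and that ties in the ordering are broken by strain name. The function name and toy data are hypothetical.

```python
def accumulation_curves(gene_sets):
    """Return (strain, new, core, total) rows as genomes are added one by one.

    gene_sets: dict mapping a strain name to the set of cluster ids it carries.
    Genomes are visited in ascending order of their non-core gene counts;
    ties are broken by strain name (an assumption of this sketch).
    """
    full_core = set.intersection(*gene_sets.values())
    order = sorted(gene_sets, key=lambda s: (len(gene_sets[s] - full_core), s))

    seen = set()   # union of all genes found so far (total genes)
    core = None    # intersection of all genomes examined so far
    rows = []
    for strain in order:
        genes = gene_sets[strain]
        new = len(genes - seen)  # genes not found in any earlier genome
        seen |= genes
        core = set(genes) if core is None else core & genes
        rows.append((strain, new, len(core), len(seen)))
    return rows


# Toy data: three strains (illustrative only).
toy_sets = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}, "C": {1, 3, 6}}
for row in accumulation_curves(toy_sets):
    print(row)
# ('B', 3, 3, 3)
# ('C', 2, 1, 5)
# ('A', 1, 1, 6)
```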

