PLoS Genet. 2017 Feb 16;13(2):e1006508.
doi: 10.1371/journal.pgen.1006508. eCollection 2017 Feb.

Interacting networks of resistance, virulence and core machinery genes identified by genome-wide epistasis analysis

Marcin J Skwark et al. PLoS Genet. 2017.

Abstract

Recent advances in the scale and diversity of population genomic data sets for bacteria now make it possible to study genome-wide patterns of co-evolution at the resolution of individual bases. Here we describe a new statistical method, genomeDCA, which uses recent advances in computational structural biology to identify the polymorphic loci under the strongest co-evolutionary pressures. We apply genomeDCA to two large population data sets representing the major human pathogens Streptococcus pneumoniae (pneumococcus) and Streptococcus pyogenes (group A Streptococcus). For pneumococcus we identified 5,199 putative epistatic interactions between 1,936 sites. Over three-quarters of the links were between sites within the pbp2x, pbp1a and pbp2b genes, the sequences of which are critical in determining non-susceptibility to beta-lactam antibiotics. A network-based analysis found these genes were also coupled to the gene encoding dihydrofolate reductase, changes to which underlie trimethoprim resistance. Distinct from these antibiotic resistance genes, a large network component of 384 protein coding sequences encompassed many genes critical in basic cellular functions, while another distinct component included genes associated with virulence. The group A Streptococcus (GAS) data set represents a clonal population with relatively little genetic variation and a high level of linkage disequilibrium across the genome. Despite this, we were able to pinpoint two RNA pseudouridine synthases, which were each strongly linked to a separate set of loci across the chromosome, representing biologically plausible targets of co-selection. The population genomic analysis method applied here identifies locus pairs that co-evolve to a statistically significant degree, potentially arising from fitness selection interdependence reflecting underlying protein-protein interactions, or genes whose product activities contribute to the same phenotype. This discovery approach greatly enhances the future potential of epistasis analysis for systems biology, and can complement genome-wide association studies as a means of formulating hypotheses for targeted experimental work.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Divergence between theoretical and empirical distributions of coupling strengths between sites.
Panel A (left) shows the two distributions; the vertical axis gives the log10 probability that a coupling coefficient exceeds the value on the horizontal axis. The dashed vertical line depicts the significance threshold; 5,199 out of 102,551 couplings exceed the threshold. Panel B (right) displays the absolute difference between the fitted cumulative Gumbel distribution and the empirical cumulative distribution (on log10 scale) as a function of the coupling strength. The dashed vertical line marks the smallest coupling (0.129) which has a difference of more than six standard deviations among the first 50,000 empirical-Gumbel differences.
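The thresholding idea in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the method-of-moments Gumbel fit, the rank-based empirical survival estimate, the reading of "six standard deviations" as six times the standard deviation of the first `n_ref` differences, and all function names are choices made here for illustration.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329

def fit_gumbel_moments(values):
    """Method-of-moments Gumbel fit (an assumed fitting procedure).
    Returns (location mu, scale beta)."""
    beta = statistics.pstdev(values) * math.sqrt(6) / math.pi
    mu = statistics.mean(values) - EULER_GAMMA * beta
    return mu, beta

def log10_gumbel_survival(x, mu, beta):
    """log10 of P(X > x) for a Gumbel(mu, beta) variable."""
    # Cap the exponent so math.exp cannot overflow far below mu.
    z = math.exp(min(-(x - mu) / beta, 700.0))
    # 1 - exp(-z), computed stably; the floor avoids log10(0) far in the tail.
    return math.log10(max(-math.expm1(-z), 1e-300))

def log10_deviations(sorted_values, mu, beta):
    """Absolute log10 difference between fitted and empirical survival,
    one value per coupling (couplings must be sorted ascending)."""
    n = len(sorted_values)
    diffs = []
    for i, x in enumerate(sorted_values):
        empirical = (n - i) / (n + 1)  # rank-based estimate, never zero
        fitted = log10_gumbel_survival(x, mu, beta)
        diffs.append(abs(fitted - math.log10(empirical)))
    return diffs

def first_large_deviation(sorted_values, diffs, n_ref, n_sd=6.0):
    """Smallest coupling beyond the first n_ref whose deviation exceeds
    n_sd standard deviations of the first n_ref deviations (assumed rule).
    Returns None if no coupling qualifies."""
    cutoff = n_sd * statistics.pstdev(diffs[:n_ref])
    for x, d in zip(sorted_values[n_ref:], diffs[n_ref:]):
        if d > cutoff:
            return x
    return None
```

With real data one would sort the coupling strengths, call `fit_gumbel_moments`, pass the result to `log10_deviations`, and then call `first_large_deviation` with `n_ref=50000` to mirror the caption. Scanning only beyond the reference block is a design choice made here so that the reference couplings cannot themselves be flagged.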
Fig 2. The 5,199 significant couplings, shown as lines connecting genomic positions indexed in kilobases by running number.
The thickness of lines is proportional to the number of linked positions within the corresponding chromosomal elements. The red markers show the positions of sites identified in an earlier GWAS of resistance-determining variation in the pneumococcal genomes. The green markers indicate locations of protein coding sequences where significant couplings are present. Gene annotations shown outside the circle are centered at the positions of the corresponding genes.
Fig 3. Network of coupled protein coding sequences.
This undirected network shows all significant couplings between protein coding sequences (CDSs). Each node is a CDS, colored according to its functional annotation, and scaled according to the logarithm of the number of significant coupled loci it contained. Edges are weighted according to the logarithm of the number of significant coupled loci linking two CDSs. (A) Network component containing the genes pbp2x, pbp1a and pbp2b. (B) Network component containing the smc gene. (C) Network component containing the tRNA synthetase gene pheS and a coding sequence for another putative tRNA-binding protein. (D) Network component containing the genes for pspA and divIVA.
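The network construction described in this caption can be illustrated with a short Python sketch. It is an assumption-laden toy, not the authors' pipeline: the function names, the `site_to_cds` mapping, the use of the natural logarithm (the caption does not state a base), and counting couplings as a stand-in for "coupled loci linking two CDSs" are all choices made here.

```python
import math
from collections import defaultdict

def build_cds_network(couplings, site_to_cds):
    """couplings: iterable of (site_a, site_b) significant pairs.
    site_to_cds: dict mapping a site to the name of the CDS containing it.
    Returns (node_size, edge_weight) on a log scale."""
    link_counts = defaultdict(int)
    coupled_sites = defaultdict(set)
    for a, b in couplings:
        cds_a, cds_b = site_to_cds.get(a), site_to_cds.get(b)
        if cds_a is None or cds_b is None:
            continue  # at least one site lies outside any annotated CDS
        coupled_sites[cds_a].add(a)
        coupled_sites[cds_b].add(b)
        if cds_a != cds_b:
            # Undirected edge: store each pair under a sorted key.
            link_counts[tuple(sorted((cds_a, cds_b)))] += 1
    # Note: log(1) = 0, so a CDS with one coupled site gets size 0
    # and a single link gets weight 0.
    node_size = {cds: math.log(len(s)) for cds, s in coupled_sites.items()}
    edge_weight = {pair: math.log(n) for pair, n in link_counts.items()}
    return node_size, edge_weight

def connected_components(nodes, edges):
    """Group nodes into network components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Largest component first; ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))
```

Calling `connected_components(node_size, edge_weight)` on the output of `build_cds_network` returns components analogous to panels A to D, largest first.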
Fig 4. Distribution of couplings between sites in different PBPs.
The red markers are defined as in Fig 2.
Fig 5. Structural models of pbp1a, pbp2x, pbp2b with the 100 strongest couplings listed in S3 Table indicated.
The figures show the transpeptidase domains of each PBP with catalytic/active site residues shown in cyan and coupled positions as sticks with other colors. Active site bound antibiotic/inhibitor is rendered as a space-filling volume when present in the crystal structure. (A) pbp1a with couplings to pbp2x; green residues are coupled with green residues in panel B, and orange residues in B are coupled with both green and yellow residues in A. (B) pbp2x with couplings to pbp1a. (C) pbp2x with couplings to pbp2b in orange. (D) pbp2b with couplings to pbp2x in orange.
Fig 6. Divergence between theoretical and empirical distributions of coupling strengths between sites for S. pyogenes, defined as in Fig 1.
The left panel shows the distributions for the 324-locus data set and the right panel for the 5,078-locus data set.
Fig 7. Phylogeny of the M1 lineage and the distribution of minor/major alleles in the SNP loci involved in the 20 most highly ranked significant couplings.

Comment in

  • Epistasis Analysis Goes Genome-Wide.
    Zhang J. PLoS Genet. 2017 Feb 16;13(2):e1006558. doi: 10.1371/journal.pgen.1006558. eCollection 2017 Feb. PMID: 28207740.
