Microb Genom. 2018 Jun;4(6):e000184.
doi: 10.1099/mgen.0.000184. Epub 2018 May 29.

SuperDCA for genome-wide epistasis analysis

Santeri Puranen et al. Microb Genom. 2018 Jun.

Abstract

The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10^4-10^5 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10^5 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.
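
To give a rough sense of why scalability is the bottleneck, the sketch below counts the distinct site pairs that a pairwise interaction model would have to consider at the polymorphism counts quoted in the abstract. It is only an illustration: the function name is hypothetical, and it counts site pairs rather than the actual number of parameters SuperDCA fits, which the abstract does not state.

```python
from math import comb

def site_pair_count(n_sites: int) -> int:
    """Number of unordered pairs among n_sites polymorphic sites: n(n-1)/2."""
    return comb(n_sites, 2)

# Polymorphism counts taken from the abstract (10^4 and 10^5).
for n_sites in (10**4, 10**5):
    print(f"{n_sites} sites -> {site_pair_count(n_sites)} site pairs")
```

A tenfold increase in sites gives roughly a hundredfold increase in pairs, which is why a method that handles 10^4 sites does not automatically extend to 10^5.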

Keywords: epistasis; linkage disequilibrium; population genomics.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
Log histograms of the cumulative distributions of estimated between-site couplings for the Maela (left) and Massachusetts (right) populations. The thresholds indicate the learned boundary between negligible and moderate to strong couplings.
Fig. 2.
Maela population distribution of alleles at top ranked coupled SNP sites. The estimated genome-wide maximum-likelihood phylogeny is shown on the left. Each column is labelled by the genome position, gene name and a corresponding functional categorization. Columns marked by red rectangles indicate coupled sites in pbp2x and pbp2b that have a reversed minor/major allele distribution compared with the remaining displayed SNPs in the same genes.
Fig. 3.
Structural mapping of the Pbp2x (a–c) and Pbp2b (d) positions marked in Fig. 2. The panels show the transpeptidase domains of each PBP with active site residues shown in cyan and positions marked in Fig. 2 as sticks in orange or green. (a) depicts a structure-stabilizing cluster of conserved hydrophobic residues (light grey sticks) and charge interaction (dark grey) in a region proximal to (cyan cartoon) the Pbp2x active site (with bound inhibitory antibiotic as pink space-filling volume) and a mobile loop (red cartoon) covering the active site. (b) depicts the PASTA-2 domain essential for divisome complex function (green cartoon) with the bulk of the protein to the right (grey cartoon). (c) shows an overview of the Pbp2x transpeptidase domain coloured as in the detail views in (a) and (b). (d) depicts the Pbp2b transpeptidase domain region proximal to the active site with a helix (orange cartoon) mechanically connecting the active site to the 'top' of the protein. An adjacent mobile loop covering the active site is shown in red.
Fig. 4.
Overlap of estimated SNP interactions between the Maela and Massachusetts populations. Each dot represents an estimated link (interaction) between two coding sequences (CDSs). Blue dots mark CDSs involved in antibiotic resistance, and red dots mark CDSs in close proximity to antibiotic resistance loci. Grey dots represent other functional categories, which are not shown explicitly for visual clarity. Both axes are on a log scale and the values represent the number of links in each CDS pair.
Fig. 5.
Seasonal variation of the allele frequencies for the two top cold-resistance couplings between glpF1-rnr and glpF1-lytC averaged over 3 years, 2007–2010. The shaded areas indicate 95 % confidence intervals.
Fig. 6.
Estimated mutual information (MI) for 60 749 pairs of SNPs (Maela) and 125 469 pairs of SNPs (Massachusetts).
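
As a point of reference for the MI values in Fig. 6, and assuming MI there denotes mutual information, the following minimal sketch computes the plain empirical mutual information between two SNP columns of an alignment. The function name and the toy allele columns are hypothetical, and the estimator used in the paper may differ (for example through sample weighting or pseudocounts).

```python
from collections import Counter
from math import log

def mutual_information(col_a, col_b):
    """Empirical mutual information (in nats) between two allele columns.

    col_a and col_b hold one allele per sampled genome, in the same order.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same genomes")
    n = len(col_a)
    count_a = Counter(col_a)              # marginal allele counts at site A
    count_b = Counter(col_b)              # marginal allele counts at site B
    count_ab = Counter(zip(col_a, col_b)) # joint allele-pair counts
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log(p_ab / (p_a * p_b)), written in terms of raw counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi

# Toy columns (hypothetical data): perfectly linked vs. independent sites.
linked = mutual_information("AACC", "GGTT")
independent = mutual_information("AACC", "GTGT")
```

In this toy case the perfectly linked pair reaches the maximum for two balanced biallelic sites (log 2 nats), while the independent pair gives zero.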
