Bioinform Adv. 2023 Oct 4;3(1):vbad142.
doi: 10.1093/bioadv/vbad142. eCollection 2023.

G2GSnake: a Snakemake workflow for host-pathogen genomic association studies

Zhi Ming Xu et al. Bioinform Adv. 2023.

Abstract

Summary: Joint analyses of paired host and pathogen genome sequences have the potential to enhance our understanding of host-pathogen interactions. A systematic approach to conduct such a joint analysis is through a "genome-to-genome" (G2G) association study, which involves testing for associations between all host and pathogen genetic variants. Significant associations reveal host genetic factors that might drive pathogen variation, highlighting biological mechanisms likely to be involved in host control and pathogen escape. Here, we present a Snakemake workflow that allows researchers to conduct G2G studies in a reproducible and scalable manner. In addition, we have developed an intuitive R Shiny application that generates custom summaries of the results, enabling users to derive relevant insights.
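To illustrate what "testing for associations between all host and pathogen genetic variants" involves, the sketch below scans every host/pathogen variant pair in pure Python. It assumes host genotypes coded as allele dosages (0/1/2) and pathogen variants coded as per-sample presence/absence (0/1), both listed in the same sample order. It uses a Cochran-Armitage-style trend statistic (N·r², compared with a 1-df chi-square) without any covariates. This is a simplified, swapped-in test for exposition, not the model G2GSnake runs, and the function names are hypothetical.

```python
import math
from itertools import product


def trend_test_p(dosage, outcome):
    """Trend test for one pair: chi2 = N * r^2 on 1 df (no covariates)."""
    n = len(dosage)
    mx = sum(dosage) / n
    my = sum(outcome) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, outcome))
    sxx = sum((x - mx) ** 2 for x in dosage)
    syy = sum((y - my) ** 2 for y in outcome)
    if sxx == 0 or syy == 0:
        return 1.0  # monomorphic variant: the pair carries no information
    chi2 = n * (sxy * sxy) / (sxx * syy)
    # Upper tail of a 1-df chi-square: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def g2g_scan(host, pathogen):
    """Test every (host variant, pathogen variant) pair; smallest p first.

    host: {variant_id: [dosage per sample]}
    pathogen: {variant_id: [0/1 per sample]}
    """
    results = []
    for (h_id, dos), (p_id, out) in product(host.items(), pathogen.items()):
        results.append((h_id, p_id, trend_test_p(dos, out)))
    return sorted(results, key=lambda t: t[2])


if __name__ == "__main__":
    host = {"rs1": [0, 1, 2, 2, 1, 0], "rs2": [1, 1, 1, 0, 0, 1]}
    pathogen = {"aa10": [0, 0, 1, 1, 1, 0]}
    for row in g2g_scan(host, pathogen):
        print(row)
```

The number of tests is the product of the host and pathogen variant counts, which is why genome-wide G2G scans need very stringent multiple-testing thresholds.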

Availability and implementation: G2GSnake is freely available at: https://github.com/zmx21/G2GSnake under the MIT license.


Conflict of interest statement

O.N. is now an employee of SUN bioscience SA.

Figures

Figure 1.
Summary of G2GSnake. The workflow relies on three main inputs: (i) pathogen genetic data, (ii) host genetic data, and (iii) a sample mapping file along with additional covariates. Associations between all pathogen and host genetic variants are then tested under the genome-to-genome (G2G) framework, with pathogen and host genetic principal components included as covariates to correct for stratification. Finally, results can be visualized in the R Shiny app. Created with BioRender.com.
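To make the principal-component correction concrete, here is a minimal sketch for a single host/pathogen pair. It assumes a linear-model adjustment: both the host dosage and the pathogen variant are residualized on an intercept plus the supplied PC scores (Frisch-Waugh-Lovell style), and the residual correlation is tested with a large-sample N·r² statistic on 1 df. This adjustment scheme is an assumption for illustration; the caption does not specify G2GSnake's exact model or how its PCs are computed, and all names are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(columns):
    """Modified Gram-Schmidt: orthonormalize columns, dropping collinear ones."""
    basis = []
    for col in columns:
        v = [float(c) for c in col]
        for q in basis:
            proj = _dot(v, q)
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(_dot(v, v))
        if norm > 1e-12:
            basis.append([vi / norm for vi in v])
    return basis


def residualize(values, basis):
    """Remove the projection of `values` onto span(basis), i.e. OLS residuals."""
    r = [float(v) for v in values]
    for q in basis:
        proj = _dot(r, q)
        r = [ri - proj * qi for ri, qi in zip(r, q)]
    return r


def pc_adjusted_p(dosage, outcome, pcs):
    """Approximate p-value for one pair after adjusting for intercept + PCs.

    pcs: list of per-sample PC score vectors (hypothetical host/pathogen PCs).
    """
    n = len(dosage)
    basis = orthonormal_basis([[1.0] * n] + list(pcs))
    x = residualize(dosage, basis)
    y = residualize(outcome, basis)
    sxx, syy = _dot(x, x), _dot(y, y)
    if sxx < 1e-12 or syy < 1e-12:
        return 1.0  # the covariates explain all variation: nothing left to test
    r2 = _dot(x, y) ** 2 / (sxx * syy)  # squared partial correlation
    chi2 = n * r2  # large-sample approximation, 1 df
    return math.erfc(math.sqrt(chi2 / 2))
```

With no PCs the result reduces to the unadjusted trend test. Passing a PC identical to the pathogen variant leaves no residual variation, so the p-value becomes 1.0: a covariate that captures the structure behind a variant absorbs an association that would otherwise appear.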
Figure 2.
Performance of G2GSnake in simulation studies. (A) Precision–recall curves for the REGENIE approach applied to simulation C (no stratified-associated variants) and simulation D (with stratified-associated variants), illustrating the trade-off between precision and recall across P-value thresholds. The Bonferroni-corrected threshold (1e−10) is labeled. (B) Association P-values in simulation D for host–pathogen variant pairs that are (i) associated, (ii) associated and stratified, or (iii) not associated. The number of variants in each category is indicated in brackets. The dashed line marks the Bonferroni-corrected P-value threshold (1e−10).
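The precision-recall trade-off in panel (A) comes from sweeping a P-value threshold over pairs whose true association status is known from the simulation. A minimal sketch, assuming each tested pair is summarized as a hypothetical `(p_value, is_true_association)` tuple; precision is set to 1.0 when nothing is called, which is one common convention. The Bonferroni threshold is the family-wise level divided by the number of tests. For example, 0.05 spread over 5 × 10^8 tests gives 1e−10, although the number of tests behind the threshold in the caption is not stated here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value cutoff controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def precision_recall(results, threshold):
    """results: iterable of (p_value, is_true_association) pairs."""
    tp = fp = fn = 0
    for p, truth in results:
        called = p <= threshold
        if called and truth:
            tp += 1
        elif called:
            fp += 1
        elif truth:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0  # no calls: convention
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pr_curve(results, thresholds):
    """(threshold, precision, recall) for each threshold, most stringent first."""
    results = list(results)
    return [(t, *precision_recall(results, t)) for t in sorted(thresholds)]


if __name__ == "__main__":
    sims = [(1e-12, True), (1e-11, False), (1e-6, True), (0.3, False)]
    cut = bonferroni_threshold(0.05, 5e8)
    for row in pr_curve(sims, [cut, 1e-5, 0.05]):
        print(row)
```

Lowering the threshold toward the Bonferroni cutoff trades recall for precision, which is the pattern panel (A) summarizes.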
