Genome Biol. 2023 Sep 29;24(1):214.
doi: 10.1186/s13059-023-03040-6.

happi: a hierarchical approach to pangenomics inference

Pauline Trinh et al. Genome Biol.

Abstract

Recovering metagenome-assembled genomes (MAGs) from shotgun sequencing data is an increasingly common task in microbiome studies, as MAGs provide deeper insight into the functional potential of both culturable and non-culturable microorganisms. However, metagenome-assembled genomes vary in quality and may contain omissions and contamination. These errors present challenges for detecting genes and comparing gene enrichment across sample types. To address this, we propose happi, an approach to testing hypotheses about gene enrichment that accounts for genome quality. We illustrate the advantages of happi over existing approaches using published Saccharibacteria MAGs, Streptococcus thermophilus MAGs, and via simulation.

Keywords: Hypothesis testing; Metagenome-assembled genomes; Microbiome; Shotgun metagenomics; Statistical models.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
We test the null hypothesis that the probability that a gene is present is equal for tongue-associated and plaque-associated Saccharibacteria genomes. The top three panels show core genes for which our proposed method produced greater p-values than existing methods, and the lower three panels show accessory genes for which it produced smaller p-values than existing methods. Our method reduced p-values when differences in detection could not be attributed to genome quality factors (here, coverage), and increased p-values when non-detection may be conflated with lower genome quality. Points have been jittered vertically to separate observations.
Fig. 2
We investigate the performance of methods for testing for differential gene presence under simulation. (Left) Logistic regression methods (e.g., GLM-Rao) do not control type 1 error, while happi-np controls type 1 error at nominal levels for all sample sizes. happi-a controls type 1 error for large sample sizes (n=100) and lower correlation between quality variables and the covariate of interest (σx=0.5). (Right) For tests that control error rates at nominal levels, we evaluate the power of happi-np and happi-a to reject a false null hypothesis, finding that happi-a has slightly higher power than happi-np at sample size n=100. Power increases for all methods as sample sizes and effect sizes grow, but decreases with greater correlation between quality variables and the covariate of interest.
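
The type 1 error inflation described in the Fig. 2 caption can be illustrated with a small simulation. The sketch below is not happi and does not reproduce the paper's simulation design. It assumes a hypothetical detection curve (`detect_prob`), exponentially distributed coverage, and a pooled two-proportion z-test as a stand-in for a quality-unaware test such as logistic regression. The gene is present with the same probability in both groups, so the null hypothesis is true, but one group can be given lower coverage.

```python
import math
import random


def detect_prob(coverage):
    """Hypothetical detection curve (an assumption): rises with coverage."""
    return 1.0 - math.exp(-coverage / 5.0)


def two_proportion_p_value(k0, n0, k1, n1):
    """Two-sided pooled z-test for equal detection proportions.

    This test ignores genome quality entirely.
    """
    pooled = (k0 + k1) / (n0 + n1)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n0 + 1.0 / n1))
    if se == 0.0:
        return 1.0
    z = (k0 / n0 - k1 / n1) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejection_rate(mean_cov0, mean_cov1, n=50, reps=400, alpha=0.05, seed=1):
    """Fraction of simulated datasets in which the naive test rejects.

    The gene is truly present with probability 0.8 in BOTH groups, so every
    rejection is a type 1 error. Only the mean coverage may differ by group.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        counts = []
        for mean_cov in (mean_cov0, mean_cov1):
            detected_count = 0
            for _ in range(n):
                coverage = rng.expovariate(1.0 / mean_cov)
                present = rng.random() < 0.8
                detected = present and rng.random() < detect_prob(coverage)
                detected_count += detected
            counts.append(detected_count)
        p_value = two_proportion_p_value(counts[0], n, counts[1], n)
        rejections += p_value < alpha
    return rejections / reps


if __name__ == "__main__":
    print("coverage confounded with group:", rejection_rate(20.0, 2.0))
    print("coverage equal across groups:  ", rejection_rate(10.0, 10.0))
```

Under these assumptions the naive test rejects far more often than the nominal 5% when coverage differs between groups, and close to 5% when coverage is balanced. This is the failure mode that motivates modeling genome quality explicitly.
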
Fig. 3
We subsampled reads from a publicly available E. coli isolate genome to understand the impact of coverage on the probability of detecting a gene, finding that the probability of detection increases with coverage. We use a nonparametric smoother to interpolate this curve and use it as the true function f in our simulations.
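
As a rough illustration of turning subsampling results into a detection function f, the sketch below fits a non-decreasing curve with the pool-adjacent-violators algorithm and interpolates linearly between points. The caption does not say which smoother the authors used, so isotonic regression is a swapped-in assumption here, and the coverage and detection values are invented for illustration only.

```python
from bisect import bisect_right


def pava(values):
    """Pool adjacent violators: least-squares non-decreasing fit, equal weights."""
    blocks = []  # each block is [sum of values, number of values]
    for value in values:
        blocks.append([value, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def make_detection_curve(coverages, detection_rates):
    """Return f(coverage) from points sorted by strictly increasing coverage.

    f is the monotone fit, linearly interpolated between the observed
    coverages and held constant outside their range.
    """
    xs = list(coverages)
    ys = pava(list(detection_rates))

    def f(coverage):
        if coverage <= xs[0]:
            return ys[0]
        if coverage >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, coverage) - 1
        t = (coverage - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return f


# Hypothetical subsampling results: mean coverage and fraction of runs in
# which the gene was detected. These numbers are not from the paper.
example_coverages = [1, 2, 5, 10, 20, 40]
example_rates = [0.10, 0.30, 0.28, 0.70, 0.90, 0.95]
f = make_detection_curve(example_coverages, example_rates)
```

The monotonicity constraint encodes the caption's finding that detection probability increases with coverage. A spline or kernel smoother would be an equally reasonable choice if monotonicity were not required.
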
