Bioinform Adv. 2024 Feb 24;4(1):vbae030.
doi: 10.1093/bioadv/vbae030. eCollection 2024.

MetaQuad: shared informative variants discovery in metagenomic samples

Sheng Xu et al. Bioinform Adv. 2024.

Abstract

Motivation: Strain-level analysis of metagenomic data has garnered significant interest in recent years. Microbial single nucleotide polymorphisms (SNPs) are genomic variants that can reflect strain-level differences within a microbial species. The diversity and emergence of SNPs in microbial genomes may reveal evolutionary history and environmental adaptation in microbial populations. However, efficient discovery of shared polymorphic variants in a large collection of metagenomic samples remains a computational challenge.

Results: MetaQuad uses a density-based clustering technique to distinguish between shared variants and non-polymorphic sites in shotgun metagenomic data. Empirical comparisons with other state-of-the-art methods show that MetaQuad significantly reduces the number of false positive SNPs without greatly affecting the true positive rate. We used MetaQuad to identify antibiotic-associated variants in patients who underwent Helicobacter pylori eradication therapy. MetaQuad detected 7591 variants across 529 antibiotic resistance genes. The nucleotide diversity of some genes increased 6 weeks after antibiotic treatment, potentially indicating a role for these genes in specific antibiotic treatments.
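
The idea of separating shared variants from non-polymorphic sites by clustering can be illustrated with a small sketch. This is not MetaQuad's implementation: it substitutes a simple one-dimensional gap-based chaining for a true density-based algorithm, and the function name `count_af_clusters` and the parameters `eps` and `min_samples` are hypothetical choices made for illustration. A site whose per-sample allele frequencies form two or more well-populated groups looks like a shared variant; a site with a single group plus isolated outliers looks like background noise.

```python
def count_af_clusters(allele_freqs, eps=0.1, min_samples=2):
    """Count groups of per-sample allele frequencies at one site.

    Simplified stand-in for density-based clustering (assumption, not
    MetaQuad's algorithm): sorted values closer than `eps` are chained
    into one group, and groups smaller than `min_samples` are treated
    as noise and not counted.
    """
    values = sorted(allele_freqs)
    groups = []
    current = []
    for v in values:
        # Start a new group when the gap to the previous value exceeds eps.
        if current and v - current[-1] > eps:
            groups.append(current)
            current = []
        current.append(v)
    if current:
        groups.append(current)
    return sum(1 for g in groups if len(g) >= min_samples)


# Hypothetical allele frequencies across six samples.
shared_site = [0.02, 0.05, 0.03, 0.91, 0.88, 0.95]  # two consistent groups
noise_site = [0.0, 0.0, 0.0, 0.0, 0.0, 0.37]        # one group plus an outlier

print(count_af_clusters(shared_site))  # 2
print(count_af_clusters(noise_site))   # 1
```

Under this toy rule, a site would be flagged as informative when the count is at least 2; the actual thresholds and clustering method used by MetaQuad should be taken from the package itself.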

Availability and implementation: MetaQuad is an open-source Python package available via https://github.com/holab-hku/MetaQuad.

Conflict of interest statement

None declared.

Figures

Figure 1.
Overview of analysis pipeline and example of shared variants. (a) Recommended analysis pipeline for MetaQuad. Shotgun metagenomic sequencing files are collected after removing contamination (e.g. host contaminants) and low-quality and duplicated reads. Filtered reads are aligned to a reference database using alignment tools, and reads with low mapping quality are filtered out. Cellsnp-lite is used to count the alleles in each sample, and the output data are processed by MetaQuad. All variants are listed in a CSV file together with their number of clusters, which distinguishes informative variants. Informative variants can be used to study the nucleotide diversity of each gene or genome. (b) Example of informative variants. Informative variants are found in multiple samples with consistent changes in allele frequencies across populations. The allele frequencies of informative variants are similar within each population. (c) Example of random variants (background noise). Random variants can be found in one or more samples, but their allele frequencies do not change consistently and can vary greatly between samples. In the figure, different colors represent different allele frequencies.
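
The last step of the pipeline, using informative variants to study nucleotide diversity, can be sketched as follows. The abstract and caption do not state which estimator is used, so this example assumes a common one for biallelic sites: per-site diversity 2p(1-p) with a d/(d-1) sample-size correction, summed over sites and divided by gene length. The function names and the input layout (pairs of alternative-allele count and total depth for one sample) are hypothetical.

```python
def site_diversity(alt_count, depth):
    """Per-site nucleotide diversity for a biallelic site in one sample.

    Assumed estimator: 2 * p * (1 - p) * d / (d - 1), where p is the
    alternative-allele frequency and d the read depth at the site.
    """
    if depth < 2:
        return 0.0  # too few reads to estimate diversity
    p = alt_count / depth
    return 2.0 * p * (1.0 - p) * depth / (depth - 1)


def gene_diversity(sites, gene_length):
    """Average per-site diversity over a gene.

    `sites` holds (alt_count, depth) pairs for the variant sites only;
    all other positions in the gene are assumed to contribute zero.
    """
    return sum(site_diversity(a, d) for a, d in sites) / gene_length


# Hypothetical counts for three informative variants in a 900 bp gene.
sites = [(5, 10), (0, 12), (20, 40)]
print(gene_diversity(sites, gene_length=900))
```

Comparing this value for the same gene before and after treatment is one simple way to look for the kind of change in diversity that the abstract describes.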
Figure 2.
Shared informative variants in real and simulated datasets. (a) Allele frequencies of informative variants detected by MetaQuad within each individual. Colors indicate varying allele frequencies. (b) Allele frequencies of uninformative variants within each individual. Colors denote distinct allele frequencies. (c) PCoA plot of informative variants detected by MetaQuad for each individual. (d) Schematic representation of the strain simulation pipeline for the simulated dataset. (e) Allele frequencies of simulated informative variants.
Figure 3.
Comparison of variant calling tools. (a) F1 score and true positive rate of all variant calling tools, with a minimum sample threshold (min_sample) of 2. (b) False positive variants reported by all tools, with a minimum sample threshold (min_sample) of 2. (c) F1 score and true positive rate of all variant calling tools, with a minimum sample threshold (min_sample) of 5. (d) False positive variants reported by all tools, with a minimum sample threshold (min_sample) of 5.
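
The metrics in this comparison follow standard definitions, which the sketch below makes concrete. It assumes variants are identified by hypothetical (contig, position) pairs and that a ground-truth set is available, as in a simulated dataset; `score_calls` is an illustrative helper, not part of any of the compared tools.

```python
def score_calls(called, truth):
    """Compare called variants with a ground-truth set.

    Returns false positives, true positive rate (recall), precision,
    and F1 score (the harmonic mean of precision and recall).
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and real
    fp = len(called - truth)   # called but not real
    fn = len(truth - called)   # real but missed
    tpr = tp / (tp + fn) if truth else 0.0
    precision = tp / (tp + fp) if called else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr > 0 else 0.0
    return {"fp": fp, "tpr": tpr, "precision": precision, "f1": f1}


# Hypothetical variant sets keyed by (contig, position).
truth = {("contig1", 10), ("contig1", 55), ("contig2", 7), ("contig2", 90)}
called = {("contig1", 10), ("contig1", 55), ("contig2", 7), ("contig3", 3)}

print(score_calls(called, truth))
# {'fp': 1, 'tpr': 0.75, 'precision': 0.75, 'f1': 0.75}
```

A tool that reports many extra variants keeps a high true positive rate but loses precision, which lowers its F1 score; this is why the figure reports false positives alongside F1 and true positive rate.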
Figure 4.
The impact of antibiotics on the human gut microbiome through antibiotic-associated variants. (a) The pipeline used for detecting shared informative variants in the ARG dataset. (b) Average mean depths of each ARG. (c) Runtime of all variant calling tools on the ARG dataset, shown on a log10 scale. MetaQuad2: total runtime of MetaQuad and cellsnp. (d) PCoA plot of shared informative variants 6 weeks after antibiotic treatment.