Commun Biol. 2025 Nov 20;8(1):1624.
doi: 10.1038/s42003-025-09007-6.

Calculating fast differential genome coverages among metagenomic sources using micov

Yuhan Weng et al. Commun Biol. 2025.

Abstract

Breadth of coverage, the proportion of a reference genome covered by at least one sequencing read, is critical for interpreting metagenomic data, informing analyses from genome assembly to taxonomic profiling. However, existing tools typically summarize coverage breadth at the whole-genome or aggregate-sample level, missing informative variation along genomes and between sample groups. Here we introduce MIcrobiome COVerage (micov), a tool that computes and compares per-sample breadth of coverage across many genomes and samples. micov offers two key advances: (1) rapid cumulative coverage breadth calculations specific to each sample type, and (2) detection of differential coverage breadth along genomes. Applying micov to three metagenomic datasets, we show that it identifies a genomic region in Prevotella copri that explains variation in community composition independent of host country of origin; uncovers a dietary association with a partially annotated region in an uncharacterized Lachnospiraceae genome, enabling hypothesis generation for genes of unknown function; and improves sensitivity in low-biomass settings by detecting a single genomic copy of enteropathogenic Escherichia coli (EPEC) in wastewater and distinguishing Mediterraneibacter gnavus across specimen types.
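
The definition of breadth of coverage given in the abstract can be illustrated with a minimal Python sketch. It assumes that read alignments are already available as half-open, 0-based (start, stop) intervals on a single reference genome; the function name `breadth_of_coverage` is hypothetical and is not part of micov's interface.

```python
def breadth_of_coverage(intervals, genome_length):
    """Fraction of genome positions covered by at least one interval.

    Assumption: intervals are half-open (start, stop) pairs, 0-based,
    as would be derived from read alignments. Not micov's own API.
    """
    covered = 0
    cur_start, cur_stop = None, None
    for start, stop in sorted(intervals):
        # Clip each interval to the genome bounds.
        start, stop = max(start, 0), min(stop, genome_length)
        if start >= stop:
            continue
        if cur_stop is None or start > cur_stop:
            # A gap: close the previous merged interval and start a new one.
            if cur_stop is not None:
                covered += cur_stop - cur_start
            cur_start, cur_stop = start, stop
        else:
            # Overlapping or touching: extend the current merged interval.
            cur_stop = max(cur_stop, stop)
    if cur_stop is not None:
        covered += cur_stop - cur_start
    return covered / genome_length


# Two overlapping reads plus one distant read on a 1,000 bp genome.
example = breadth_of_coverage([(0, 100), (50, 150), (300, 400)], 1000)
```

Overlapping reads are merged before counting, so a position covered many times still contributes only once. This is what distinguishes breadth from depth of coverage.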

Conflict of interest statement

Competing interests: The authors declare the following competing interests: D.M. is a consultant for BiomeSense, Inc., has equity, and receives income. E.K. is the managing director of Clarity Genomics. M.O.D. has equity in GenCirq. K.C. has research grant support from Phathom Pharmaceuticals. A.B. is a founder of Guilden Corporation and is an equity owner. R.K. is a scientific advisory board member and consultant for BiomeSense, Inc., has equity, and receives income. He is a scientific advisory board member and has equity in GenCirq. He is a consultant for DayTwo, and receives income. He has equity in and acts as a consultant for Cybele. He is a co-founder of Biota, Inc., and has equity. J.H. is a co-founder of GenCirq Inc., which focuses on cancer therapeutics. He is on the Board of Directors and has equity in GenCirq. His spouse is employed part-time for the bookkeeping and to support employees with Human Resources. The terms of these arrangements have been reviewed and approved by the University of California, San Diego, in accordance with its conflict of interest policies. The remaining authors declare no competing interests.

Figures

Fig. 1. Schematic of the micov workflow.
Cumulative coverage represents the total coverage breadth achieved when samples are ordered by increasing individual coverage breadth and added sequentially. This display helps assess whether coverage continues to accumulate across a sample group or instead plateaus early, suggesting that only a small part of the genome is represented in the sample. Position plots display coverage patterns across a reference genome. After cumulative coverage plots are generated, a Kolmogorov-Smirnov (KS) test is conducted to quantify differences between chosen sample groups, especially when visual inspection is challenging due to overlapping curves. Key columns of the output include the sample groups being compared, the KS statistic, and the p-value. For genomic region variation, the genome is divided into N bins, and regions are ranked based on the standard deviation (std) of sample hits across groups. Key columns of the output include the ranking, genome id, start and end position of the genomic region, and standard deviation of sample hits (Methods). The figure was created in BioRender. Knight Lab (2024). https://BioRender.com/.
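
The three steps described in this caption can be sketched in plain Python. This is an illustration under stated assumptions, not micov's implementation. Coverage is represented as half-open (start, stop) intervals per sample. The KS statistic is computed on whatever per-sample values the caller supplies (the caption does not specify the exact input, and no p-value is computed here). A "hit" is assumed to mean that a sample has any covered base in a bin, and the per-group hit fraction is used so that group size does not dominate. All function names are hypothetical.

```python
from statistics import pstdev


def merge(intervals):
    """Merge half-open (start, stop) intervals into a sorted, disjoint list."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged


def breadth(intervals, genome_length):
    """Fraction of the genome covered by the given intervals."""
    return sum(stop - start for start, stop in merge(intervals)) / genome_length


def cumulative_coverage(samples, genome_length):
    """Order samples by increasing breadth, then add them one at a time.

    samples: dict of sample id -> list of intervals.
    Returns the cumulative breadth after each sample is added.
    """
    order = sorted(samples, key=lambda s: breadth(samples[s], genome_length))
    seen, curve = [], []
    for sample_id in order:
        seen = merge(seen + list(samples[sample_id]))
        curve.append(sum(b - a for a, b in seen) / genome_length)
    return curve


def ks_statistic(xs, ys):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    def ecdf(values, t):
        return sum(v <= t for v in values) / len(values)
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in set(xs) | set(ys))


def rank_bins(groups, genome_length, n_bins):
    """Rank bins by the std of the per-group fraction of samples with a hit.

    groups: dict of group name -> dict of sample id -> list of intervals.
    Returns (std, bin_start, bin_end) tuples, highest std first.
    """
    width = genome_length / n_bins
    rows = []
    for i in range(n_bins):
        lo, hi = i * width, (i + 1) * width
        fractions = []
        for samples in groups.values():
            hits = sum(any(a < hi and b > lo for a, b in ivs)
                       for ivs in samples.values())
            fractions.append(hits / len(samples))
        rows.append((pstdev(fractions), int(lo), int(hi)))
    return sorted(rows, reverse=True)


# Toy data on a 1,000 bp genome (entirely made up for illustration).
curve = cumulative_coverage(
    {"s1": [(0, 100)], "s2": [(50, 300)], "s3": [(0, 50)]}, 1000)
ranked = rank_bins(
    {"A": {"a1": [(0, 100)], "a2": [(0, 100)]},
     "B": {"b1": [(600, 700)], "b2": [(800, 900)]}}, 1000, 4)
```

In the toy data the cumulative curve rises as each sample contributes new positions, and the first bin ranks highest because every sample in group A covers it while no sample in group B does. In practice a p-value for the KS statistic would come from a statistics library rather than from this sketch.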
Fig. 2. micov detects phenotypically relevant strain variation, captures changes in genome abundance at the level of a single genomic copy in wastewater, and exhibits sensitive detection in low-biomass specimens.
a A scaled position plot of P. copri in human gut microbiome samples collected from subjects in the US/UK/Mexico stratified by presence/absence of region PC351. Sample groups are ordered by increasing sample size. Grey dotted gridlines are added as a visual aid to relate the data to the genome coordinates on the y-axis; b Coverage presence in this region is associated with greater overall genome coverage, with supporting Kolmogorov-Smirnov statistics. Notably, overall coverage was not significantly different between the US and UK for individuals containing the region (KS test, stat = 0.17, p = 0.0711), nor was it if they both lacked the region (KS test, stat = 0.12, p = 0.0434; n.s. if corrected); c Common high effect size variables, and per-sample characterization of region presence/absence, were tested with PERMANOVA against weighted UniFrac; PCoAs of the weighted UniFrac distances colored by the region (d) and colored by country (e); f A receiver operating characteristic curve for a nested cross-validated Random Forest classifier predicting presence/absence of PC351; g Coverage for region L682 in the Lachnospiraceae genome exhibiting differential coverage related to the diversity of plant consumption; h Detection of enteropathogenic E. coli (EPEC) at increasing levels of genome copies spiked into untreated wastewater (Methods). All spike-in levels show statistically significant elevated cumulative coverage levels compared to the background. A low background amount of E. coli is expected in wastewater. Only samples with non-zero EPEC coverage (n = 562) are shown; i The cumulative coverage of M. gnavus from different tissue types surgically collected from Crohn’s Disease patients; j with supporting Kolmogorov-Smirnov statistics. The statistics shown are those with an uncorrected p-value below 0.05, with correction by the Bonferroni procedure; asterisks denote corrected p-values below 0.05.
All Kolmogorov-Smirnov and PERMANOVA tests are in Supplementary Table 2.
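
The "n.s. if corrected" note in panel b follows from how the Bonferroni procedure works: each p-value is multiplied by the number of tests and capped at 1. The sketch below is only an illustration. It assumes, hypothetically, that just the two p-values quoted in panel b form the test family; the number of tests actually corrected for is given in Supplementary Table 2 and is not reproduced here.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# The two uncorrected p-values quoted in panel b.
# Assumption: a family of only two tests, chosen purely for illustration.
adjusted = bonferroni([0.0711, 0.0434])
still_significant = [p < 0.05 for p in adjusted]
```

Even with a family of only two tests, the uncorrected p = 0.0434 rises above 0.05 after adjustment, which is consistent with the caption's note. A larger family would push it higher still.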
