Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 May;641(8061):182-191.
doi: 10.1038/s41586-025-08781-x. Epub 2025 Apr 23.

Brief antibiotic use drives human gut bacteria towards low-cost resistance

Affiliations

Brief antibiotic use drives human gut bacteria towards low-cost resistance

Eitan Yaffe et al. Nature. 2025 May.

Abstract

Understanding the relationship between antibiotic use and the evolution of antimicrobial resistance is vital for effective antibiotic stewardship. Yet, animal models and in vitro experiments poorly replicate real-world conditions1. To explain how resistance evolves in vivo, we exposed 60 human participants to ciprofloxacin and used longitudinal stool samples and a new computational method to assemble the genomes of 5,665 populations of commensal bacterial species within participants. Analysis of 2.3 million polymorphic sequence variants revealed 513 populations that underwent selective sweeps. We found convergent evolution focused on DNA gyrase and evidence of dispersed selective pressure at other genomic loci. Roughly 10% of susceptible bacterial populations evolved towards resistance through sweeps that involved substitutions at a specific amino acid in gyrase. The evolution of gyrase was associated with large populations that decreased in relative abundance during exposure. Sweeps persisted for more than 10 weeks in most cases and were not projected to revert within a year. Targeted amplification showed that gyrase mutations arose de novo within the participants and exhibited no measurable fitness cost. These findings revealed that brief ciprofloxacin exposure drives the evolution of resistance in gut commensals, with mutations persisting long after exposure. This study underscores the capacity of the human gut to promote the evolution of resistance and identifies key genomic and ecological factors that shape bacterial adaptation in vivo.

PubMed Disclaimer

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Extended Data Figure 1.
Extended Data Figure 1.. Sampling days.
The planned and actual sampling day for the 960 samples presented in this study. Shown is the actual sampling day (x-axis, relative to start of antibiotics) and subject (y-axis), colored by planned sampling day. Antibiotics were taken on days 0-4. When a planned sample was not available (not collected by subject or had technical issues in the extraction and sequencing process) all subject samples were shifted, while minimizing temporal changes. 76% of the actual samples were exactly as planned and 94.6% were up to 3 days away from the planned sampling day.
Extended Data Figure 2.
Extended Data Figure 2.. Overview of the PolyPanner workflow.
Each community (i.e., subject) is processed independently. Assembler) Sequenced DNA libraries from temporal samples are pooled and assembled. Mapper) Each DNA library is mapped to the assembly, generating 1-nt coverage profiles. Breaker) Contigs are refined into segments with consistent coverage profiles. Binner) Refined contigs are binned into genome bins. Trimmer) Genome bins are trimmed based on coverage profiles. Caller) Alignment mismatches of reads are counted for each genomic position, sequencing errors are modeled and removed, and variants are called in the remaining segregating positions. Inspector) Variants are classified according to coverage profiles, outputting dynamic variants. Final output is a set of genomes, each associated with set of dynamic variants; i.e., genuine polymorphic sites.
Extended Data Figure 3.
Extended Data Figure 3.. Association between response trends and microbial taxonomy.
Populations were clustered into 20 response trends (shown on top) and were associated with genera using GTDB-tk (shown on right). Only families associated with at least 10 populations are shown. The matrix (middle) is colored according to the enrichment of the number of populations, relative to an expected value assuming taxonomy and response trends are independent. Numbers in squares indicate the number of populations. Marginal number of populations colored in shades of red. Families differ in their response trends. For example, most Akkermansiaceae populations are blooming while most Enterobacteriaceae populations are dropping during the disturbance.
Extended Data Figure 4.
Extended Data Figure 4.. Identification of dynamic variants.
We classified each variant based on 4 temporal read-coverage vectors: the variant vector rvar (reads supporting the variant), the local vector rlocal (reads supporting any variant at the coordinate), the complement vector rcomp (rlocalrvar), and the regional vector rregion (calculated over a 2kb region). a) Examples of genome configurations for 3 scenarios: a mutation in a duplicated gene (green segments) results in paralog variants (left); a mutation in a sequence that is shared by two species (green segments) results in ortholog variants (middle); a mutation that separate two conspecific strains results in polymorphic variants (right). The focal variant is colored black and an alternative variant is colored white. b) Read coverage (y-axis) over a genomic region surrounding the variant site (x-axis) for 6 sampling days. The genomic source of the reads is depicted by different shades of gray (originating from strains of species 1) and blue (species 2), and the variants are colored black and white as in panel A. Regional coverage is depicted with red dashed lines. c) The variant is ruled-out as a putative paralog variant if rvar and rlocal are linked, and is ruled-out as a putative ortholog variant if rcomp and rregional are linked. It is classified as dynamic if it passes both spurious tests and if rlocal and rregional are linked. Regional vectors (red) are transformed from raw read counts to read densities (reads per bp) for visualization purposes (actual tests use raw read counts). Linkage is assessed in all cases using Pearson’s chi-square test.
Extended Data Figure 5.
Extended Data Figure 5.. Variant detection limits.
We calculated the detection limit of polymorphic variants as a function of coverage trajectories. In each of 5 scenarios (A-E) a variant starts with a frequency of 0 and goes up to f in the last m samples out of a total of n samples (‘m/n’ is specified on left). Plotted on in middle are PolyPanner p-values (paralog variant test in green, ortholog variant test in orange, P=0.01 in a dashed line), as a function of the total x-coverage (number of reads aligned to variant across all samples). The variant detection limit (P<0.01 for both tests) is denoted by ‘D’. Shown on the right are matching variant coverages, for the case of a total x-coverage of D. Shown are the variant vector (red), complement vector (blue) and local vector (gray), as a function of the sample (x-axis). A lower variant frequency or less samples with the variant result in a higher detection limits (B and C). Less samples lower the detection limit (D and E). For example, in scenario E a variant that reaches a frequency of 0.2 in 2 out of 4 samples can be detected if the genome x-coverage is above 1072x. To put this in a practical context, if a community is sequenced with 100 million read pairs, and a population has an abundance of 20%, that would result in an x-coverage of 1000x (assuming read pairs are 200bp long and all genomes are 4Mbp).
Extended Data Figure 6.
Extended Data Figure 6.. Performance on simulated communities.
Temporal sequencing data were simulated for 40 communities, divided into 4 groups according to the density of introduced mutations (1,10,100 or 1000 per genome), with 10 random communities per group. For each community, 30 genomes from the Bacteroidetes and Firmicutes phylum were randomly selected from the proGenomes database. For each genome, mutations were introduced into a simulated minor strain resulting in a ground truth of known genuine polymorphic variants. The mutation type was randomly selected among single nucleotide substitutions, short insertions and deletions (1-12bp), inversions with breakpoints that were 1000bp apart, and long insertions and deletions that involved 1000bp of DNA. Each community had 16 temporal samples, with the minor strain absent in samples 1-8 and at a frequency of 80% in samples 9-16. Genomes had a random x-coverage ranging 0-2000 reads per bp. In all panels, genomes are stratified by their x-coverage (x-axis) and the number of introduced mutations (colors). a) From left to right: Median density of spurious variants (i.e., variants not associated with introduced mutations), percent of variants misclassified as dynamic out of all variants. b) Percent of introduced mutations that were correctly detected for substitutions, short insertions and deletions. c) Percent of introduced mutations that were correctly detected for inversion breakpoints, long insertions and long deletions.
Extended Data Figure 7.
Extended Data Figure 7.. Variant classification.
Over 65 million variants detected across 60 subjects were classified as ‘near margin’ (up to 200bp from breakpoint or contig edge), ‘regional deviation’ (regional and local coverage in disagreement), ‘paralog’ (high-copy number or non-significant change in variant frequency over time), ‘ortholog’ (due to homology between non-assembled species), ‘paralog or ortholog’, and ‘dynamic’ (polymorphic with a non-constant frequency over time). For each variant, we test two null hypotheses, i.e., that it is a paralog or an ortholog (both considered spurious variants). Note that variants with low coverage may be misclassified as spurious, even if they are true polymorphisms. a) A pie chart of variant classes for all detected variants. b) Example of the classification of a dynamic variant. Top-left shows the regional coverage vector (gray), the local coverage (black), the variant coverage vector (red), and the complement coverage vector (blue) across the 16 temporal sample days (x-axis), with coverages shown in log-scale (y-axis). Top-right shows the variant frequency over the 16 days. Bottom 3 panels show x-coverage profiles for a 2kb genomic region surrounding the variant for 3 timeframes: baseline, treated, post-antibiotics (day intervals defined as in Figure 1a). Focal variant marked with a vertical black tick. The region contains 5 additional dynamic variants. c) Example of a densely packed stretch of spurious variants arising from putative sequence homology of within and between genomes. Spurious variants colored by their class. Variants in panels B-C colored according to their class, as specified in panel A.
Extended Data Figure 8.
Extended Data Figure 8.. KEGG orthology and taxonomic identity.
Variants associated with a significantly enriched KO were stratified by KO (y-axis) and by taxonomic family (x-axis). Shown for each KO (left to right) is the number of associated variants, number of populations, number of subjects, a log-transformed transformed P-value, and the enrichment ratio. Shown for each family (top to bottom) is the number of associated variants, the class and the phylum (color legends in top left). The matrix (middle) shows the observed number of variants for each KO-family combination, colored according to fold enrichment of the observed number of variants over the number expected if KOs and families were independent.
Extended Data Figure 9.
Extended Data Figure 9.. gyrA:83 baseline amino acids and taxonomic identity.
The amino acid at gyrA:83 of gyrA was resolved for 3896 genomes. Genomes were stratified by amino acid (top) and genera (right). The matrix (middle) shows the observed number of genomes with gyrA:83/genera combination, colored according to fold enrichment of the observed number of genomes over the number expected if taxa and amino acid identity were independent. Clostridia families are split between the use of serine (e.g., Lachnospiraceae) and adenine (e.g., Oscillospiraceae). Actinobacteria and Bacteroidota combine serine and other amino acids.
Extended Data Figure 10.
Extended Data Figure 10.. Substitutions and taxonomic identity.
Shown are 59 substitutions at gyrA:83, stratified by source and target codons (left) and species (top). For each species, shown are the number of genomes with specific pre-exposure amino acids at gyrA:83, total number of genomes and number of genomes with a substitution. The matrix (bottom) shows the observed number of substitutions for gyrA:83/genera combinations, colored by enrichment (as in Extended Data Fig. 9). There were 26 species in which the substituting amino acid (highlighted with a red rectangle) was present prior to the exposure (highlighted with blue rectangle). For example, Bifidobacterium bifidum had a single substitution to valine during exposure (A83V) and valine was also present before the exposure (1/9 genomes). The association between substituting and pre-existing amino acids indicate baseline amino acid distributions may reflect prior exposure and resistance. Collinsella sp. represents unclassified species of the Collinsella genus.
Extended Data Figure 11.
Extended Data Figure 11.. Pre-exposure associated with elevated abundance during exposure.
Shown are temporal abundance trajectories for species that had at least 10 representatives across subjects, colored by the identity of the amino acid at position gyrA:83 at baseline, with substitutions during the study overlaid with dotted lines. Note how for most species susceptible populations (with S, A, G or T) are relatively lower in abundance relative to resistant populations (all other amino acids). See for example Phocaeicola vulgatus, where resistant populations (with F) increase during the exposure, while susceptible populations (with S) keep low.
Extended Data Figure 12.
Extended Data Figure 12.. The gyrA amplification design.
Amplification was done in 3 rounds. The first round (2 PCR cycles) used gene-specific primer pairs (19-24nt on each side) to add unique molecular identifiers (UMI, 6nt on each side), Illumina adapter sequences and an exogenous ‘stub’ sequence. The second round (15 PCR cycles) amplified molecules using stub primers to reduce complexity and mitigate off-targets on the next round. The third round (12 PCR cycles) sheared stub sequences and added indexed Illumina adapters to generate Illumina-compatible molecules. Random UMI sequences are depicted using Ns. Gene-specific primer sequences are depicted using Xs and were 19-24nt long.
Figure 1.
Figure 1.. Population-level responses to ciprofloxacin.
a) Abundance trajectories of all 5665 metagenome-assembled genomes (MAGs) were each normalized relative to their respective mean and clustered using k-means into 100 response clusters or trends. Trends are organized vertically using hierarchical clustering, with the number of populations per cluster shown by a color bar on the left and the number of populations per cluster group written in parentheses. The mean-centered abundance of each cluster across 16 temporal samples is shown horizontally, with ciprofloxacin course days indicated in red along the axis (shifted by a day to reflect transit time). b) Scatter plots comparing the baseline pre-exposure population abundances to three later timeframes. Each scatter plot depicts for each population the baseline abundance (x-axis, average of days −2 to 0) vs. one of three timeframes (during the height of disturbance at days 3-8, post exposure at days 10-28, and on the last day of sampling, day 77). A pseudo count of 0.0001% was added to each count for visualization purposes. Vertical/horizontal red lines represent a detection threshold of 0.005%. The percentages of populations within quadrants are shown, as are Spearman correlation coefficients between populations detected in both timeframes.
Figure 2.
Figure 2.. Variants, strains, linkage groups and convergent evolution analysis.
a) Number of population genomes (y-axis) as a function of genome x-coverage (x-axis). Populations with sufficient x-coverage to detect sweeps were colored red if dynamic variants were detected and blue otherwise. Similarly, if x-coverage was not sufficient populations were colored pink if variants were detected and light blue otherwise. b) Dynamic variants clustered using k-means into four clusters according to the relative abundance of the major and minor variant alleles. Shown is minor allele frequency (top) and relative allele abundance (bottom) for cluster centroids with the percent of variants in cluster in parentheses. c) The number of genomes (y-axis) stratified by the number of sweeping variants (x-axis) for 647 populations that had one or more sweeps. d) Number of genomes containing dynamic variants (y-axis) as a function of the number of inferred strains (x-axis). e) dN/dS ratios (y-axis) as a function of the linkage group (LG) size (x-axis), stratified by variant type. f) Number of genomes (y-axis) stratified by sweeps per genome (x-axis), defined as the number of unique linkage groups associated with sweeping variants. g) Standard boxplots (rectangles depict the interquartile range (IQR), the median is a line, and whiskers extend to 1.5×IQR; gray circles depict actual data points) of the number of evolving populations per subject, comparing 4 control subjects that did not take antibiotics and 60 subjects taking antibiotics. h) Temporal dynamics of variants and strains of a population of Collinsella (uncharacterized species) in subject BAA. Below is the variant-strain matrix, with 18 bi-allelic polymorphic variant loci (rows) and 4 strains (columns). Variants are named according to genes and are grouped and colored according to LGs (letters a-e). The strain phylogeny (maximum parsimony) is shown on top of the matrix, with linkage groups specified next to edges. Shown at the top is variant read depth (top, colored by LG, gyrA variants in black), frequencies (middle) and strain abundance (bottom, stacked), as a function of days (x-axis). i) Variant weights were aggregated within KEGG Orthology annotations (KOs) of associated genes and compared to a permutation-based null model. A volcano plot depicts weight enrichment ratios over the null model (x-axis) vs. P-values (y-axis). Significant KOs (P<0.05) are colored red. K02469 is gyrA (highlighted with a black circle) and K02621 is parC; other KO descriptions are given in Supplementary Figure 7. Statistical significance was assessed using a permutation test with 1 million permutations, comparing the observed mean difference to a null distribution generated by randomly shuffling sweeping and non-sweeping variants within genomes. The False Discovery Rate (FDR) was estimated to be 0.12 using the Benjamini-Hochberg (BH) method. j) For each significant KO, the dN/dS ratio (x-axis) of the supporting variants is plotted vs. the median linkage group size of the supporting variants (y-axis).
Figure 3.
Figure 3.. Focused selective pressure on gyrA.
a) The number of variants (y-axis) for each amino acid (AA) position within the gyrA gene (x-axis, E. coli numbering). Sweeping and non-sweeping variant counts are colored red and gray respectively. b) Substitutions associated with sweeping variants at gyrA:83 and gyrA:87; letters represent AAs (e.g., ‘S-F’ represents a substitution of serine by phenylalanine). c) Number of genomes containing gyrA according to the baseline AA identity at gyrA:83 before the exposure. d) Exposure-associated substitution probability, stratified by the baseline AA at gyrA:83. Data are presented as mean values +/− SD, and the number of supporting data points is presented in (c). e) Species-specific resistance alleles at gyrA:83 were determined for 31 species based on observed exposure-associated substitutions. The percentage of resistance alleles prior to exposure is shown for 18 species represented by at least 10 populations. Resistance alleles are shown next to the species name. For example, Alistipes putredinis had two resistance alleles at gyrA:83, namely isoleucine (‘I’) and arginine (‘R’), and 12% (5/42) of A. putredinis populations had one of these alleles prior to the exposure. Data are presented as mean values +/− SD. f) Temporal abundance trajectories for 3 species; lines represent subjects and are colored by the identity of the amino acid at position gyrA:83 at baseline. For P. vulgatus, substitutions observed during sampling are overlaid with dotted lines.
Figure 4.
Figure 4.. Abundance-dependent evolution of antibiotic resistance.
a) GyrA evolvability as predicted by a fitted logistic regression model based on 4 variables: the baseline population abundance (average of days −2 to 0, x-axis), the fold-change decline during antibiotics (average of days 1-5, y-axis), and 2 variables for phylum relative to Firmicutes (separate plot for each of three phyla). b) Model-predicted and observed gyrA evolvability (y-axis) as a function of baseline abundances. c) Model-predicted and observed gyrA evolvability (y-axis) as a function of the fold-change decline in population abundances during antibiotics (days 1-5). d) Shown on left are ROC curves (receiver operating characteristic curves) of 9 inferred logistic regression models that predict the probability of a population having a sweeping gyrA variant. Shown on right are the areas under the curve (AUC) of all models. e) Observed (x-axis) vs. model-predicted (y-axis) genus-level gyrA evolvability, for the 21 genera associated with at least 6 populations. Genera are colored red if differences between observed and predicted evolvability were significant (one-sided binomial test with Benjamini-Hochberg correction, p-values shown on figure), and colored gray otherwise. Dashed line depicts the line x=y.
Figure 5.
Figure 5.. Long-term persistence of antibiotic-associated genetic changes.
a) Boxplots (rectangles show the IQR, the median is a vertical line, and whiskers extend to 1.5×IQR; outliers shown as open circles) depicting the distribution of the frequencies of sweeping gyrA variants (left) and non-gyrA variants (right), stratified by time windows (day ranges defined in panel 1A). The number of supporting variants that had a total x-coverage of at least 1 during each of the 4 timeframes are specified in the x-axis labels. b) The percentage of sweeping variants fixated at 0 at baseline (blue) or fixated at a frequency of 1 within the 3 other time windows (red), shown separately for gyrA variants (left) and non-gyrA variants (right). c) Example of a sweeping gyrA variant in subject EAA that was projected to revert below a frequency of 0.01 within 135 days. Circles depict the variant frequency (y-axis) as a function of the day relative to the start of the antibiotics (x-axis). Gray line shows an exponential decay, based on a constant selection coefficient of 0.006 that was inferred from the temporal frequency data, assuming 10 generations per day. d) The percentage of sweeping gyrA variants (left) and sweeping non-gyrA variants (right) that were projected to require more than a year to revert below a frequency of 0.01. Data are presented as mean values +/− SD. e) For sweeping variants that were projected to fall below a frequency of 0.01 within a year, a histogram of the number of months until reversion.
Figure 6.
Figure 6.. Gyrase sweep dynamics.
Each panel shows gyrA allele frequencies (y-axis, logit scale) as a function of day (x-axis) for a single population. Alleles are depicted using different colors, with dashed lines connecting non-sequential samples. Solid circles depict the average frequency of two independent primer pairs (vertical lines connect replicate values), which are colored if the allele was supported by >1 molecule in both primer pairs and white otherwise. Open black circles denote detection limits, defined as the frequency corresponding to a single molecule that supports or does not support an allele, for the lower and upper limits respectively. Allele identities are denoted in E. coli numbering; ‘baseline’ denotes the most common gyrA sequence in days −2 to 0. For Alistipes putredinis, B1 and B2 are two sequences present on days −2 to 0 that have 10 nucleotide differences in the region encoding amino acids 74 through 79.

References

MAIN TEXT REFERENCES

    1. Murray CJ et al. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. The Lancet 399, 629–655 (2022). - PMC - PubMed
    1. Lipsitch M & Samore MH Antimicrobial Use and Antimicrobial Resistance: A Population Perspective. Emerg. Infect. Dis 8, 347–354 (2002). - PMC - PubMed
    1. Goossens H, Ferech M, Vander Stichele R & Elseviers M Outpatient antibiotic use in Europe and association with resistance: a cross-national database study. The Lancet 365, 579–587 (2005). - PubMed
    1. Sun DS et al. Analysis of multiple bacterial species and antibiotic classes reveals large variation in the association between seasonal antibiotic use and resistance. PLoS Biol. 20, e3001579 (2022). - PMC - PubMed
    1. Marvig RL, Sommer LM, Jelsbak L, Molin S & Johansen HK Evolutionary insight from whole-genome sequencing of Pseudomonas aeruginosa from cystic fibrosis patients. Future Microbiol. 10, 599–611 (2015). - PubMed

ADDITIONAL REFERENCES

    1. Bolger AM, Lohse M & Usadel B Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014). - PMC - PubMed
    1. Schmieder R & Edwards R Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets. PLoS ONE 6, e17288 (2011). - PMC - PubMed
    1. Li D, Liu CM, Luo R, Sadakane K & Lam TW MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2014). - PubMed
    1. Li H & Durbin R Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinforma. Oxf. Engl 25, 1754–1760 (2009). - PMC - PubMed
    1. Kang DD et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019). - PMC - PubMed

MeSH terms

LinkOut - more resources