PLoS Pathog. 2021 Aug 23;17(8):e1009849.
doi: 10.1371/journal.ppat.1009849. eCollection 2021 Aug.

Acute SARS-CoV-2 infections harbor limited within-host diversity and transmit via tight transmission bottlenecks

Katarina M Braun et al. PLoS Pathog. 2021.

Abstract

The emergence of divergent SARS-CoV-2 lineages has raised concern that novel variants eliciting immune escape or the ability to displace circulating lineages could emerge within individual hosts. Though growing evidence suggests that novel variants arise during prolonged infections, most infections are acute. Understanding how efficiently variants emerge and transmit among acutely-infected hosts is therefore critical for predicting the pace of long-term SARS-CoV-2 evolution. To characterize how within-host diversity is generated and propagated, we apply extensive laboratory and bioinformatic controls, together with metrics of within- and between-host diversity, to 133 SARS-CoV-2 genomes from acutely-infected individuals. We find that within-host diversity is low and transmission bottlenecks are narrow, with very few viruses founding most infections. Within-host variants are rarely transmitted, even among individuals within the same household, and are rarely detected along phylogenetically linked infections in the broader community. These findings suggest that most variation generated within-host is lost during transmission.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Within-host variation is limited after data quality control.
a. iSNV frequencies in replicate 1 are shown on the x-axis and frequencies in replicate 2 are shown on the y-axis. The yellow box highlights low-frequency iSNVs (3–15%), which is expanded to the right. b. The Ct value is compared to the percent of iSNVs shared between technical replicates. The blue line is a line of best fit to highlight the observed negative trend. c. Distribution of the number of total iSNVs detected per sample. 22 out of 133 samples harbor no iSNVs at all, and the maximum number of iSNVs in a single sample was 11. d. The proportion of iSNVs that were detected at various within-host frequency bins is shown. Error bars represent the variance in the proportion of total within-host iSNVs within that frequency bin across samples in the dataset as calculated by bootstrapping. There was a single stop variant in the entire dataset, so no error bar is shown for the stop category. The solid grey line indicates the expected proportion of variants in each frequency bin under a neutral model.
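The replicate comparison in panels a and b can be illustrated with a minimal sketch. It assumes each replicate is a mapping from a variant label to its within-host frequency, keeps only iSNVs called at or above 3% in both replicates, and reports the percent shared relative to the union of calls. The function names, the choice of the union as denominator, the averaging of the two frequencies, and the example variants are all illustrative assumptions, not the authors' pipeline.

```python
def intersect_replicates(rep1, rep2, min_freq=0.03):
    """Keep iSNVs called at >= min_freq in BOTH replicates.

    rep1, rep2: dicts mapping a variant label to its frequency (0..1).
    Returns a dict of shared variants with the mean of the two
    frequencies (averaging is an assumption made for illustration).
    """
    shared = {}
    for variant, f1 in rep1.items():
        f2 = rep2.get(variant)
        if f2 is not None and f1 >= min_freq and f2 >= min_freq:
            shared[variant] = (f1 + f2) / 2
    return shared


def percent_shared(rep1, rep2, min_freq=0.03):
    """Percent of iSNVs called in either replicate that are called in both."""
    called1 = {v for v, f in rep1.items() if f >= min_freq}
    called2 = {v for v, f in rep2.items() if f >= min_freq}
    union = called1 | called2
    if not union:
        return 0.0
    return 100.0 * len(called1 & called2) / len(union)


# Hypothetical variants and frequencies, for illustration only.
rep1 = {"A100G": 0.10, "C200T": 0.05, "G300A": 0.04}
rep2 = {"A100G": 0.12, "C200T": 0.02}

print(intersect_replicates(rep1, rep2))  # only A100G passes in both replicates
print(percent_shared(rep1, rep2))        # 1 of 3 called variants is shared
```

A sample with a high Ct value would be expected to yield a lower `percent_shared`, which is the negative trend described for panel b.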
Fig 2. Shared iSNVs represent homopolymers and common polymorphic sites.
a. The number of iSNVs (y-axis) present within n individuals (x-axis) is shown. 143/184 (77%) of iSNVs are found in only a single sample. 6 iSNVs are shared by at least 10 samples. b. Each iSNV detected in at least 2 samples is shown. Variants that occur within, or 1 nucleotide outside of, a homopolymer region (classified as a span of the same base that is at least 3 nucleotides long) are colored in yellow. Variants that represent the minor allele for variants that were nearly fixed at consensus (annotated here as “Wuhan1 reversions”) are shown in blue, and variants that were both Wuhan1 reversions and occurred in homopolymer regions are colored in purple. c. For each unique iSNV detected within a host, the x-axis represents the number of samples in which that iSNV was detected, and the y-axis represents the number of times it is present on the global SARS-CoV-2 phylogenetic tree. The counts on the phylogenetic tree represent the number of times the mutation arose along internal and external branches. The variants labeled with text are those that are detected at least 5 times within-host and at least 5 times on the phylogeny. Two of the most commonly detected iSNVs, T3037C and T241C (shown as the furthest to the left in panel b), are also frequently detected on the phylogenetic tree.
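The homopolymer rule in panel b (a run of at least 3 identical bases, with variants counted if they fall inside the run or 1 nucleotide outside it) can be sketched as follows. This is an illustrative implementation under stated assumptions: positions are 0-based, the sequence is a plain string, and the function names are hypothetical rather than taken from the study's code.

```python
def homopolymer_spans(seq, min_len=3):
    """Return half-open, 0-based (start, end) spans of runs of one base
    that are at least min_len nucleotides long."""
    spans = []
    start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the end of the sequence or when the base changes.
        if i == len(seq) or seq[i] != seq[start]:
            if i - start >= min_len:
                spans.append((start, i))
            start = i
    return spans


def near_homopolymer(seq, pos, min_len=3, flank=1):
    """True if 0-based position pos lies within a homopolymer run or
    within `flank` nucleotides of either end of one."""
    return any(
        start - flank <= pos < end + flank
        for start, end in homopolymer_spans(seq, min_len)
    )


# Toy sequence (not SARS-CoV-2): the run TTTT occupies positions 3-6.
toy = "ACGTTTTACG"
print(homopolymer_spans(toy))    # [(3, 7)]
print(near_homopolymer(toy, 2))  # True: 1 nucleotide before the run
print(near_homopolymer(toy, 8))  # False: 2 nucleotides after the run
```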
Fig 3. Variants are not common in consensus sequences or in downstream branches.
a. We traversed the Wisconsin-focused full-genome SARS-CoV-2 phylogeny from root to tip. For each Wisconsin tip for which we had within-host data, we queried whether any of the iSNVs detected in that sample were ever detected in downstream branches at consensus. In this example, the purple tip represents a Wisconsin sample for which we have within-host data. This sample harbors 2 iSNVs, A and B. iSNV A arises on a tip that falls downstream from the starting, purple tip. iSNV B is present on a downstream branch leading to an internal node. Both A and B would be counted as instances in which an iSNV was detected at consensus in a downstream branch. b. In the Wisconsin-specific phylogenetic tree, we applied the metric described in a. Among 110 Wisconsin samples that harbored within-host variation, 93 occurred on internal nodes. Of those, we detect one instance in which a mutation detected as an iSNV in one sequence was detected in a downstream consensus sequence. (C1912T, an iSNV in USA/WI-UW-214/2020, was detected downstream in USA/WI-WSLH-200068/2020.) c. For each iSNV identified in the study (in at least 1 sample), we enumerated the number of times that variant occurred on the global SARS-CoV-2 phylogeny on an internal node (yellow) or on a tip (blue). The results for every variant are shown in S6 Fig. Here, we show only the variants that were detected at least 10 times on the global phylogeny. Each such iSNV is found at internal nodes and tips at a ratio comparable to overall mutations on the tree, except for C28887T, which is enriched on internal nodes (p = 0.028, Fisher’s exact test). * indicates p-value < 0.05.
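The internal-node versus tip comparison in panel c rests on a 2×2 Fisher's exact test. The sketch below implements a two-sided version from the hypergeometric distribution using only the standard library. The table layout (one variant versus all other mutations, internal nodes versus tips) is an assumption about how such a test could be set up, and the counts in the example are a textbook table, not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table

                    internal   tip
        variant        a        b
        all others     c        d

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Textbook example table, not study data: p = 34/70.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

A variant would be flagged as enriched on internal nodes when its internal-to-tip ratio exceeds that of the background mutations and the returned p-value falls below 0.05.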
Fig 4. A quarter of household pairs share more iSNVs than expected by chance.
a. We modeled the probability that 2 consensus genomes will share x mutations as Poisson-distributed with lambda equal to the number of mutations expected to accumulate in the SARS-CoV-2 genome over 5.8 days [37] given a substitution rate of 1.10 × 10⁻³ substitutions per site per year [1]. Exploration of how these probabilities change using a range of plausible serial intervals and substitution rates is shown in S8 Fig. The vast majority of genomes that are separated by one serial interval are expected to differ by ≤2 consensus mutations. b. The proportion of random pairs (grey) and putative household transmission pairs (purple) is shown on the y-axis vs. the proportion of iSNVs shared. The dotted line indicates the 95th percentile among the random pairs. Household pairs that share a greater proportion of iSNVs than 95% of random pairs (i.e., are plotted to the right of the dotted line) are considered statistically significant at p = 0.05. iSNVs had to be present at a frequency of ≥3% to be considered in this analysis. c. We assessed the impact of household membership, clade membership, phylogenetic divergence, and geographic distance on the proportion of iSNVs shared between each pair of samples in our dataset. The mean of each estimated coefficient in the combined linear regression model including all predictors is shown on the x-axis, with lines of spread indicating the range of the estimated 95% highest posterior density interval (HPDI).
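The Poisson calculation in panel a can be reproduced approximately with a short sketch. The substitution rate and the 5.8-day serial interval come from the caption; the genome length of 29,903 nucleotides (the Wuhan-Hu-1 reference length) and the 365-day year are assumptions made here, since the caption does not state which values were used.

```python
from math import exp, factorial

SUBS_PER_SITE_PER_YEAR = 1.10e-3   # from the caption
SERIAL_INTERVAL_DAYS = 5.8         # from the caption
GENOME_LENGTH = 29_903             # assumption: Wuhan-Hu-1 reference length


def expected_mutations(rate=SUBS_PER_SITE_PER_YEAR,
                       days=SERIAL_INTERVAL_DAYS,
                       length=GENOME_LENGTH):
    """Poisson lambda: mutations expected genome-wide over `days`."""
    return rate * length * days / 365.0


def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson(lam) model."""
    return lam ** k * exp(-lam) / factorial(k)


lam = expected_mutations()
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))
print(f"lambda = {lam:.3f}")             # about 0.52 under these assumptions
print(f"P(x <= 2) = {p_at_most_2:.3f}")  # about 0.98 under these assumptions
```

Under these assumptions roughly 98% of genome pairs separated by one serial interval differ by at most 2 consensus mutations, in line with the caption's statement.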
Fig 5. SARS-CoV-2 transmission bottlenecks in household transmission pairs.
a. “TV plots” showing intersection iSNV frequencies in all 44 donor-recipient pairs using a 3% frequency threshold. The yellow box highlights low-frequency iSNVs (3–10%) and the mauve box highlights high-frequency iSNVs (90–100%). b. Maximum likelihood estimates for mean transmission bottleneck size in individual donor-recipient pairs. Bottleneck sizes could not be estimated for a few pairs (e.g., pairs 5, 10a, 11a, etc.) because there were no polymorphic sites detected in the donor. Bidirectional comparisons are denoted with an “a” and “b” following the pair number. c. Combined maximum likelihood estimates across all 44 donor-recipient pairs plotted against variant calling thresholds ranging from 1–20%. The purple line shows combined estimates at each variant calling threshold shown and the mauve band displays the 95% confidence interval for this estimate. The dashed grey line indicates a bottleneck size equal to 1. The vertical yellow band highlights the combined transmission bottleneck size using a 3% variant calling threshold.
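To make the idea of a maximum likelihood bottleneck estimate concrete, the sketch below uses a simplified presence/absence model: a donor iSNV at frequency p is assumed to reach the recipient with probability 1 − (1 − p)^Nb when Nb virions found the infection. The caption does not specify the likelihood used in the study, so this is a deliberately reduced stand-in that ignores sequencing noise and variant-calling thresholds. All function names and example values are hypothetical.

```python
from math import expm1, log, log1p


def log_likelihood(nb, donor_freqs, transmitted):
    """Log-likelihood of bottleneck size nb under a presence/absence model.

    donor_freqs: donor iSNV frequencies, each strictly between 0 and 1.
    transmitted: booleans, True if the iSNV was also seen in the recipient.
    """
    total = 0.0
    for p, seen in zip(donor_freqs, transmitted):
        log_p_lost = nb * log1p(-p)          # log((1 - p) ** nb)
        if seen:
            total += log(-expm1(log_p_lost))  # log(1 - (1 - p) ** nb)
        else:
            total += log_p_lost
    return total


def mle_bottleneck(donor_freqs, transmitted, max_nb=200):
    """Grid search over integer bottleneck sizes 1..max_nb."""
    return max(
        range(1, max_nb + 1),
        key=lambda nb: log_likelihood(nb, donor_freqs, transmitted),
    )


# Hypothetical pair: a near-fixed donor variant is transmitted,
# a low-frequency one is lost.
print(mle_bottleneck([0.9, 0.05], [True, False]))         # 2

# Hypothetical pair: no donor iSNV is found in the recipient.
print(mle_bottleneck([0.05, 0.10, 0.40], [False] * 3))    # 1
```

When the donor has no polymorphic sites the likelihood is flat in Nb, which matches the caption's note that no estimate is possible for such pairs.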

References

    1. Duchene S, Featherstone L, Haritopoulou-Sinanidou M, Rambaut A, Lemey P, Baele G. Temporal signal and the phylodynamic threshold of SARS-CoV-2. Virus Evol. 2020;6: veaa061. doi: 10.1093/ve/veaa061
    2. Public Health England. Investigation of novel SARS-CoV-2 variant: Variant of Concern 202012/01. GOV.UK; 21 Dec 2020 [cited 2 Feb 2021]. Available: https://www.gov.uk/government/publications/investigation-of-novel-sars-c...
    3. arambaut, garmstrong, isabel. Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations. 18 Dec 2020 [cited 2 Feb 2021]. Available: https://virological.org/t/preliminary-genomic-characterisation-of-an-eme...
    4. Kemp SA, Collier DA, Datir RP, Ferreira IATM, Gayed S, Jahun A, et al. SARS-CoV-2 evolution during treatment of chronic infection. Nature. 2021. doi: 10.1038/s41586-021-03291-y
    5. Choi B, Choudhary MC, Regan J, Sparks JA, Padera RF, Qiu X, et al. Persistence and Evolution of SARS-CoV-2 in an Immunocompromised Host. N Engl J Med. 2020;383: 2291–2293. doi: 10.1056/NEJMc2031364