Wellcome Open Res. 2024 Jul 24:9:85.
doi: 10.12688/wellcomeopenres.20704.2. eCollection 2024.

Phylogenetic signatures reveal multilevel selection and fitness costs in SARS-CoV-2

Vinicius Bonetti Franceschi et al. Wellcome Open Res.

Abstract

Background: Large-scale sequencing of SARS-CoV-2 has enabled the study of viral evolution during the COVID-19 pandemic. Some viral mutations may be advantageous to viral replication within hosts but detrimental to transmission, thus carrying a transient fitness advantage. By affecting the number of descendants, persistence times and growth rates of associated clades, these mutations generate localised imbalance in phylogenies. Quantifying these features in closely-related clades with and without recurring mutations can elucidate the tradeoffs between within-host replication and between-host transmission.

Methods: We implemented a novel phylogenetic clustering algorithm (mlscluster, https://github.com/mrc-ide/mlscluster) to systematically explore time-scaled phylogenies for mutations under transient/multilevel selection. We applied this method to a SARS-CoV-2 time-calibrated phylogeny with >1.2 million sequences from England, and characterised the recurrent mutations that may influence transmission fitness across PANGO-lineages and genomic regions using Poisson regressions and summary statistics.

Results: We found no major differences across two epidemic stages (before and after Omicron), PANGO-lineages, and genomic regions. However, spike, nucleocapsid, and ORF3a were proportionally more enriched for transmission fitness polymorphisms (TFP)-homoplasies than other proteins. We provide a catalog of SARS-CoV-2 sites under multilevel selection, which can guide experimental investigations within and beyond the spike protein.
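To illustrate what "proportionally more enriched" means, the sketch below normalises a raw homoplasy count by the length of each protein. It is a minimal sketch and not the mlscluster implementation: the function name is hypothetical, the choice of amino-acid length as the denominator is an assumption, and the example counts are made-up placeholders rather than results from the study. The protein lengths are those of the Wuhan-Hu-1 reference annotation.

```python
# Protein lengths in amino acids (Wuhan-Hu-1 reference annotation).
PROTEIN_LENGTH_AA = {"S": 1273, "N": 419, "ORF3a": 275}


def homoplasies_per_site(counts, lengths):
    """Divide each region's homoplasy count by its length (hypothetical helper).

    Assumption: region size is measured in amino acids; the paper may
    normalise by a different measure of genomic size.
    """
    return {region: counts[region] / lengths[region] for region in counts}


# Made-up placeholder counts, for illustration only (not study results).
example_counts = {"S": 40, "N": 20, "ORF3a": 10}
for region, rate in homoplasies_per_site(example_counts, PROTEIN_LENGTH_AA).items():
    print(f"{region}: {rate:.4f} homoplasies per site")
```

Normalising in this way keeps a long protein such as spike from appearing enriched simply because it has more sites at which a mutation can recur.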

Conclusions: This study provides empirical evidence for the existence of important tradeoffs between within-host replication and between-host transmission shaping the fitness landscape of SARS-CoV-2. This method may be used as a fast and scalable means to screen large sequence databases and shortlist sites under putative multilevel selection, which may warrant subsequent confirmatory analyses and experimental validation.

Keywords: Molecular evolution; SARS-CoV-2; genetic clustering; mutation; natural selection; phylogenetic analysis; transmission fitness; within-host evolution.

Plain language summary

Viral mutations can potentially carry a transient advantage, being simultaneously favourable for replication within hosts (e.g. by evading host immune responses) and deleterious to transmission (e.g. by having reduced cell binding). To identify such mutations, called transmission fitness polymorphisms (TFPs), we developed a clustering algorithm called mlscluster. It computes clade-level statistics based on the number of descendants, persistence times, and growth rates of clades carrying a specific mutation in comparison with their immediate sisters without the mutation; these statistics usually differ from expectation in the presence of such TFPs. We then applied it to a representative SARS-CoV-2 time-scaled tree with >1 million whole-genome sequences from England. Our statistical analysis suggested approximately constant levels of transient selection across waves driven by very distinct variants. It also showed that genomic regions of known functional significance such as spike, nucleocapsid, and ORF3a were enriched for TFPs. This is one of the first studies to characterise SARS-CoV-2 recurrent mutations potentially under multilevel selection, providing empirical evidence for the existence of important tradeoffs in selection between intrahost replication and inter-host transmission. Therefore, it provides target mutations for realistic coalescent-based modelling and laboratory-based investigations of their impacts and mechanisms of interaction with human cells.
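The sister-clade comparison can be pictured with a small sketch. This is a minimal illustration under stated assumptions, not the statistics actually computed by mlscluster: the `Clade` container and `sister_ratios` function are hypothetical, persistence is taken to be the span between the earliest and latest sampled tip, and the growth-rate statistic is omitted because its definition is not given here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Clade:
    """Sample collection dates of the tips in one clade (hypothetical container)."""
    tip_dates: list

    @property
    def size(self):
        # Number of sampled descendants.
        return len(self.tip_dates)

    @property
    def persistence_days(self):
        # Assumption: persistence = days between first and last sampled tip.
        return (max(self.tip_dates) - min(self.tip_dates)).days


def sister_ratios(mutant, sister):
    """Compare a clade carrying a mutation with its immediate sister clade."""
    if mutant.size == 0 or sister.size == 0:
        raise ValueError("both clades need at least one tip")
    # Floor the denominator at one day so a single-day sister clade
    # does not cause a division by zero.
    sister_persistence = max(sister.persistence_days, 1)
    return {
        "size_ratio": mutant.size / sister.size,
        "persistence_ratio": mutant.persistence_days / sister_persistence,
    }


# Illustrative dates only: a small, short-lived mutant clade next to a
# larger, longer-lived sister clade.
mutant = Clade([date(2021, 1, 1), date(2021, 1, 5), date(2021, 1, 11)])
sister = Clade([date(2021, 1, 1), date(2021, 1, 8), date(2021, 1, 15),
                date(2021, 1, 22), date(2021, 2, 1), date(2021, 2, 10)])
print(sister_ratios(mutant, sister))
```

Ratios well below 1 for a clade defined by a recurrent mutation are the kind of localised imbalance that would flag the mutation as a candidate TFP; in practice a threshold on such statistics would still need to be chosen.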

Conflict of interest statement

No competing interests were disclosed.

Figures

Figure 1. Schematic view of the tree-based clustering algorithm implementation and analytic pipeline.
(A) Main notations, parameters, and respective statistic formulas that are computed by mlscluster (https://github.com/mrc-ide/mlscluster) for sister clades of the time-scaled phylogeny. (B) Analysis workflow with main steps from input data to TFP inference.
Figure 2. Spatiotemporal distribution of the SARS-CoV-2 sequences from England included in this study during the investigated period (June 2020 to April 2022).
Main plot: Monthly-stratified frequency of the sequences stacked by major PANGO-lineage. Inset plot: Proportion of included sequenced cases across adm2 regions in England during the investigated period for 77% of the samples with unambiguous adm2-level assignments.
Figure 3. Frequency of SARS-CoV-2 TFP-homoplasies per genomic region considering all cluster thresholds and the more reliable threshold of 2%.
(A) Count of TFP-homoplasic sites for all SARS-CoV-2 proteins across the 10 different cluster thresholds ranging from the more (0.25%) to the less stringent (25%). (B–E) Count of TFP-homoplasies per genomic region for two different time periods and considering threshold = 2%. (B) Non-normalised counts per lineage for timeframe pre-Omicron (June 2020 to mid-November 2021). (C) Normalised counts per lineage (divided by genomic size) for the same period as (B). (D) Non-normalised counts for the timeframe including Omicron (June 2020 to end of April 2022). (E) Normalised counts for the same period as (D).
Figure 4. TFP-homoplasy identification compared to sites identified as under positive selection.
Sites are compared across different major lineages. (A) Comparison of top 30 identified sites under multilevel selection by our tree-based clustering approach for the whole period (including Omicron) for cluster threshold = 2% against the HyPhy analysis, also presenting concordant results (intersection) between both methods. (B) Bubble plot of TFP-homoplasy frequencies attributed to different major PANGO-lineages. The HyPhy analysis only contained the identified sites and not the underlying amino acid replacements. The actual mutations and further annotations are presented in Table 1.
Figure 5. Frequency of identified TFP-homoplasies alongside genomic regions with major functional significance and normalised counts for cluster threshold = 2% and period including Omicron.
(A) Spike. (B) Nucleocapsid. (C) ORF3a. TFPs are coloured by major PANGO-lineage and annotated if frequency > 2.
