RNA. 2023 Dec 18;30(1):1-15.
doi: 10.1261/rna.079557.122.

Motifs in SARS-CoV-2 evolution

Christopher Barrett et al. RNA.

Abstract

We present a novel framework that enhances the prediction of whether a novel lineage poses the threat of eventually dominating the viral population. The framework is based purely on genomic sequence data and requires no previously established biological analysis. Its building blocks are sets of coevolving sites in the alignment (motifs), identified via coevolutionary signals. The collection of such motifs forms a relational structure over the polymorphic sites. Motifs are constructed using distances that quantify the coevolutionary coupling of pairs of sites, and they manifest as coevolving clusters of sites. We present an approach to genomic surveillance based on this notion of relational structure. Our system issues an alert regarding a lineage based on its contribution to drastic changes in the relational structure. We then conduct a comprehensive retrospective analysis of the COVID-19 pandemic based on SARS-CoV-2 genomic sequence data in GISAID from October 2020 to September 2022, across 21 lineages and 27 countries with weekly resolution. We investigate the performance of this surveillance system in terms of its accuracy, timeliness, and robustness. Lastly, we study how well each lineage is classified by such a system.

Keywords: SARS-CoV-2; coevolution; genomic surveillance; relational structure; site motif.
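
The abstract describes motifs as clusters of polymorphic alignment sites grouped by a pairwise coevolutionary distance. The following is a minimal sketch of that idea, not the authors' method: the abstract does not specify the distance or the clustering rule, so the sketch assumes a normalized mutual-information distance, 1 - I(A;B)/H(A,B), and single-linkage grouping under a threshold. The names `coupling_distance`, `find_motifs`, `threshold`, and `min_size` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(symbols):
    """Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def coupling_distance(col_a, col_b):
    """Assumed distance: 1 - I(A;B) / H(A,B); 0 means perfectly coupled.

    Only call this on polymorphic columns, so the joint entropy is > 0.
    """
    h_joint = entropy(list(zip(col_a, col_b)))
    mi = entropy(col_a) + entropy(col_b) - h_joint
    return 1.0 - mi / h_joint


def find_motifs(msa, threshold=0.5, min_size=2):
    """Group polymorphic columns whose pairwise distance is <= threshold.

    msa is a list of equal-length aligned sequences. Grouping is
    single-linkage (connected components), done with a small union-find.
    """
    columns = list(zip(*msa))
    polymorphic = [i for i, col in enumerate(columns) if len(set(col)) > 1]
    parent = {i: i for i in polymorphic}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(polymorphic, 2):
        if coupling_distance(columns[i], columns[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in polymorphic:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values() if len(c) >= min_size)


# Toy alignment: sites 0 and 1 always mutate together, site 2 is conserved,
# site 3 varies mostly independently of the others.
toy_msa = ["AAGT", "AAGT", "CCGA", "CCGT"]
motifs = find_motifs(toy_msa)
```

On the toy alignment, sites 0 and 1 form the only motif. A surveillance loop in the spirit of the abstract would recompute such motifs week by week and compare successive structures, but how the paper scores a "drastic change" is not stated here and is left out of the sketch.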

Figures

FIGURE 1.
Coevolutionary pattern in an MSA (A) and the induced relational structure (B).
FIGURE 2.
Evolution of the relational structure of the SARS-CoV-2 genome in the UK. (A) The relative frequency of key lineages (Alpha and Delta + AY.*) from October week 3, 2020 to October week 4, 2021 in the UK. (B) The relational structure of SARS-CoV-2 at: (1) November week 3, 2020, when the relative frequency of Alpha is <5%; (2) May week 1, 2021, when Delta + AY.* emerges and competes with Alpha; (3) September week 3, 2021, when the relative frequency of Delta + AY.* is >95%. This analysis only considers relational structures with a minimum motif size of five and we only display the three largest motifs.
FIGURE 3.
Accuracy of alerts across different variants and geographic locations. The x-axis denotes the variants considered, while the y-axis denotes the countries. The color of each dot represents the accuracy of our alert system with respect to the particular variant at the corresponding location. An empty cell indicates that no alert was issued for that country-lineage pair.
FIGURE 4.
Timeliness of alerts across the globe. We illustrate the portion of timely first alerts (blue) versus nontimely first alerts (red).
FIGURE 5.
Timeliness of alerts across lineages. Lineages for which at least one relevant alert was issued over our study period are marked with a gray box. We illustrate the portion of timely first alerts (blue) versus nontimely first alerts (red) for each of these lineages. The associated percentage is the ratio of timely first alerts to the total number of first alerts. The lineages are organized in a hierarchical tree structure according to their phylogenetic relations. Note that the 21 lineages considered in the study are somewhat independent in terms of overlaps of characteristic mutations.
FIGURE 6.
Alerts and robustness with respect to different sample sizes at three different times (t1: November week 4, 2020 to December week 1, 2020; t2: November week 4, 2021 to December week 1, 2021; t3: January week 2, 2022 to January week 3, 2022). In each figure, the z-axis displays the variance of the random variable X (X = 1 if an alert is issued and X = 0 otherwise) associated with a lineage-sample size pair.
FIGURE 7.
ROC curves across the globe. The x-axes and y-axes represent FPRs and TPRs, respectively.
FIGURE 8.
The hierarchical structure of the 28 lineages and sublineages monitored by the BV-BRC SARS-CoV-2 tracking system (Bacterial and Viral Bioinformatics Resource Center 2022). This structure reflects their evolutionary trajectories and relationships (Hadfield et al. 2018; Hodcroft 2021). The lineages selected for our study (green) are independent in this hierarchical structure.
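
The captions of Figures 5 to 7 name three evaluation quantities: the ratio of timely first alerts to all first alerts, the variance of the alert indicator X, and the FPR/TPR pairs that make up an ROC curve. The sketch below shows how each could be computed from toy data. It is illustrative only: the input layout, the use of the population variance p(1 - p) for a 0/1 variable, and the idea of thresholding a per-lineage score are assumptions, and the function names are hypothetical.

```python
def timely_ratio(first_alert_is_timely):
    """Ratio of timely first alerts to all first alerts (list of booleans)."""
    return sum(first_alert_is_timely) / len(first_alert_is_timely)


def indicator_variance(outcomes):
    """Variance of X over repeated subsamples, where X = 1 if an alert was
    issued and X = 0 otherwise. For a 0/1 variable this equals p * (1 - p)."""
    p = sum(outcomes) / len(outcomes)
    return p * (1 - p)


def roc_point(scores, labels, threshold):
    """One (FPR, TPR) point: predict positive when score >= threshold.

    Assumes labels contain at least one positive (1) and one negative (0).
    """
    predictions = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predictions, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p and y == 0)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives


# Toy inputs (made up for illustration, not data from the study).
ratio = timely_ratio([True, True, False, True])
variance = indicator_variance([1, 1, 0, 0])
fpr, tpr = roc_point([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0], threshold=0.5)
```

A variance near 0 means the alert decision is stable across subsamples of that size, while the maximum of 0.25 means issuance is a coin flip. Sweeping `threshold` over all observed scores and plotting the resulting points yields an ROC curve.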

References

    Aguilar D, Oliva B, Buslje CM. 2012. Mapping the mutual information network of enzymatic families in the protein structure to unveil functional features. PLoS One 7: e41430. doi: 10.1371/journal.pone.0041430
    Bacterial and Viral Bioinformatics Resource Center. 2022. Bacterial and viral bioinformatics resource center. https://www.bv-brc.org/. Last accessed Dec. 15, 2022.
    Bandelt H-J, Dress AW. 1992. Split decomposition: a new and useful approach to phylogenetic analysis of distance data. Mol Phylogenet Evol 1: 242–252. doi: 10.1016/1055-7903(92)90021-8
    Barrett C, Bura AC, He Q, Huang FW, Li TJ, Waterman MS, Reidys CM. 2020. Multiscale feedback loops in SARS-CoV-2 viral evolution. J Comput Biol 28: 248–256. doi: 10.1089/cmb.2020.0343
    Barrett CL, Huang FW, Li TJ, Warren AS, Reidys CM. 2022. Rapid threat detection in SARS-CoV-2. medRxiv. doi: 10.1101/2022.08.05.22278480
