[Preprint]. 2022 Aug 16:2022.08.16.504117.
doi: 10.1101/2022.08.16.504117.

Intrahost SARS-CoV-2 k-mer identification method (iSKIM) for rapid detection of mutations of concern reveals emergence of global mutation patterns

Ashley Thommana et al. bioRxiv.

Abstract

Despite unprecedented global sequencing and surveillance of SARS-CoV-2, timely identification of the emergence and spread of novel variants of concern (VoCs) remains a challenge. Several million raw genome sequencing runs are now publicly available. We sought to survey these datasets for intrahost variation to study emerging mutations of concern. We developed iSKIM ("intrahost SARS-CoV-2 k-mer identification method") to screen the many SARS-CoV-2 datasets relatively quickly and efficiently for intrahost mutations belonging to lineages of concern. Certain mutations surged in frequency as intrahost minor variants just prior to, or while, lineages of concern arose. The Spike N501Y change common to several VoCs was found as a minor variant in 834 samples as early as October 2020. This coincides with the timing of the first detected samples with this mutation in the Alpha/B.1.1.7 and Beta/B.1.351 lineages. Using iSKIM, we also found that Spike L452R was detected as an intrahost minor variant as early as September 2020, prior to the observed rise of the Epsilon/B.1.429/B.1.427 lineages in late 2020. iSKIM rapidly screens for mutations of interest in raw data, prior to genome assembly, and can be used to detect increases in intrahost variants, potentially providing an early indication of novel variant spread.
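
The abstract does not describe how iSKIM is implemented, so the following is only a minimal sketch of the general idea of screening raw reads with k-mers before assembly. Everything in it is an assumption made for illustration: the function names, the toy 10-base k-mers (not real SARS-CoV-2 sequence), and the thresholds used to call a mutation "minor" or "fixed" are hypothetical and are not taken from the paper.

```python
from collections import Counter

def revcomp(seq):
    """Reverse complement, so reads from either strand can be matched."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))

def count_kmer_hits(reads, ref_kmer, mut_kmer):
    """Count reads containing the reference or the mutant k-mer (either strand)."""
    targets = {
        "ref": (ref_kmer, revcomp(ref_kmer)),
        "mut": (mut_kmer, revcomp(mut_kmer)),
    }
    counts = Counter()
    for read in reads:
        read = read.upper()
        for label, (fwd, rev) in targets.items():
            if fwd in read or rev in read:
                counts[label] += 1
    return counts["ref"], counts["mut"]

def classify(ref_hits, mut_hits, minor_min=0.05, fixed_min=0.5, min_depth=4):
    """Label a site from its mutant-read fraction.

    The thresholds are illustrative assumptions, not iSKIM's values.
    """
    depth = ref_hits + mut_hits
    if depth < min_depth:
        return "insufficient coverage"
    fraction = mut_hits / depth
    if fraction >= fixed_min:
        return "fixed"
    if fraction >= minor_min:
        return "minor"
    return "absent"

# Toy k-mers that differ at one central base (hypothetical sequences).
REF_KMER = "ACGTTAGCCA"
MUT_KMER = "ACGTTTGCCA"

# Eight reads carry the reference k-mer; two carry the mutant k-mer,
# one of them on the reverse strand.
toy_reads = ["TTACGTTAGCCAGG"] * 8 + ["TTACGTTTGCCAGG", "GGTGGCAAACGTCC"]

ref_hits, mut_hits = count_kmer_hits(toy_reads, REF_KMER, MUT_KMER)
call = classify(ref_hits, mut_hits)
```

In this toy input the mutant k-mer appears in 2 of 10 informative reads, so the site is labelled as a minor variant under the assumed thresholds. A real screen would also need to handle sequencing errors and base qualities, which this sketch ignores.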

Conflict of interest statement

Conflicts of Interest: The authors declare no conflict of interest.

Figures

Figure 1.
Distribution of the 834 samples containing Spike N501Y as a minor variant from October 2020 across the global SARS-CoV-2 phylogeny, indicating independent emergence. Background lineages include Alpha/B.1.1.7 samples highlighted in blue, Gamma/P.1 highlighted in green, Beta/B.1.351 highlighted in purple, Epsilon/B.1.429 highlighted in orange, Iota/B.1.526 highlighted in turquoise, Delta/B.1.617.2 highlighted in grey, and Omicron/BA.1/BA.2 highlighted in dark grey. None of the 834 samples containing Spike N501Y as a minor variant in October 2020 were present in these highlighted lineages. Non-VoC/VoI lineages are not highlighted. The 834 samples identified as having the N501Y change present as a minor variant in October 2020 (Table 1) are colored in red. 3,243 total background genomes were included in this analysis.
Figure 2.
Distribution of the 68 samples containing Spike L452R as a minor variant from September 2020 across the global SARS-CoV-2 phylogeny, indicating independent emergence. Background lineages include Alpha/B.1.1.7 samples highlighted in blue, Gamma/P.1 highlighted in green, Beta/B.1.351 highlighted in purple, Epsilon/B.1.429 highlighted in orange, Iota/B.1.526 highlighted in turquoise, Delta/B.1.617.2 highlighted in grey, and Omicron/BA.1/BA.2 highlighted in dark grey. None of the 68 samples containing Spike L452R as a minor variant in September 2020 were present in these highlighted lineages. Non-VoC/VoI lineages are not highlighted. The 68 samples identified as having the L452R change present as a minor variant in September 2020 (Table 2) are colored in red. 3,243 total background genomes were included in this analysis.
Figure 3.
Frequency over time of the n=15 VoC/VoI mutations that had a substantial increase as a minor variant prior to a rise as a fixed variant across 411,805 NCBI SRA SARS-CoV-2 samples. The Y-axis is scaled by the maximum count for each particular mutation either as a minor variant or fixed mutation (whichever was higher for each). Dotted lines represent minor variant mutations and solid lines represent fixed mutations. The red solid and dotted lines represent the A23063T/N501Y mutation/change and the blue solid and dotted lines represent the T22917G/L452R mutation/change. The grey lines represent the other 13 VoC/VoI mutations that had a substantial increase as a minor variant prior to a rise as a fixed variant (each is also found in Figure S1).
Figure 4.
The n=15 VoC/VoI mutations that appeared as candidate minor variants prior to becoming fixed variants were mostly located in the spike protein, including its NTD and RBD domains. ‘X’ denotes the lineage(s) in which each mutation is predominantly found.
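
The Figure 3 caption says each mutation's curves are scaled by that mutation's maximum count, taken over its minor-variant and fixed series, whichever is higher. A minimal sketch of that scaling follows; the function name and the monthly counts are invented for illustration and are not data from the paper.

```python
def scale_by_peak(minor_counts, fixed_counts):
    """Scale both series of one mutation by the larger of their two peaks."""
    peak = max(max(minor_counts, default=0), max(fixed_counts, default=0))
    if peak == 0:
        # No observations at all: return flat zero curves.
        return [0.0] * len(minor_counts), [0.0] * len(fixed_counts)
    minor_scaled = [count / peak for count in minor_counts]
    fixed_scaled = [count / peak for count in fixed_counts]
    return minor_scaled, fixed_scaled

# Hypothetical counts per time bin for a single mutation.
toy_minor = [0, 5, 10, 5]
toy_fixed = [0, 0, 20, 40]

minor_scaled, fixed_scaled = scale_by_peak(toy_minor, toy_fixed)
```

Because both series share one divisor, the curve with the higher peak reaches 1.0 and the other keeps its size relative to it, so an early minor-variant rise stays visible next to a much larger fixed-variant rise.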

