Virus Evol. 2021 Feb 4;7(1):veab006.
doi: 10.1093/ve/veab006. eCollection 2021 Jan.

Alternate primers for whole-genome SARS-CoV-2 sequencing

Matthew Cotten et al. Virus Evol.

Abstract

As the world struggles to control the novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is an urgent need to develop effective control measures. Essential information is encoded in the virus genome sequence, and accurate and complete SARS-CoV-2 sequences are essential for tracking the movement and evolution of the virus and for guiding efforts to develop vaccines and antiviral drugs. While there are unprecedented SARS-CoV-2 sequencing efforts globally, approximately 19 to 43 per cent of the genomes generated monthly are gapped, which reduces their information content. The current study documents the genome gap frequencies and their positions in the currently available data and provides an alternative primer set and a sequencing scheme to help improve the quality and coverage of the genomes.

Keywords: COVID-19; SARS-CoV-2; next generation sequencing; primers.

Figures

Figure 1.
(updated graphics) Positions of 200 nt gaps across SARS-CoV-2 genomes listed as complete in GISAID. Genomes deposited in September 2020 (n = 38,228) were retrieved from GISAID and sorted by sequencing platform (MinION versus Illumina), and genomes with at least one instance of 200 N were collected. Panel A presents gaps in the first 3,000 MinION-generated genome sequences deposited that contained at least one 200 N motif. Gaps >= 200 nt in each genome are indicated with red bars. The upper panel histogram shows the frequency (in 30 nt bins) of gaps >= 200 nt by start position on the genome. Panel B is the same analysis of the first 3,000 Illumina-generated genome sequences deposited in September 2020 that contained at least one 200 N motif.
Figure 2.
Positions of 200 nt gaps across SARS-CoV-2 genomes stratified by MinION or Illumina, in the region nt 19,000 to 24,000. Genomes deposited in September 2020 as ‘complete’ were retrieved from GISAID and sorted by sequencing platform and by the presence of at least one N200 motif. For clarity, only the first 3,000 genomes in each set were plotted. As in Figure 1, gaps >= 200 nt in each genome are indicated with red bars. The upper panel histogram shows the frequency (in 30 nt bins) of gaps >= 200 nt by start position on the genome; the middle panel plots the positions of ARTIC v.1 primers in the region (pink = forward ‘left’ primers, red = reverse ‘right’ primers). Panel A: MinION-derived genome sequences. Panel B: Illumina-derived genome sequences.
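The gap analysis described in the captions for Figures 1 and 2 (runs of at least 200 N, tallied by start position in 30 nt bins) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function names `find_gaps` and `gap_start_histogram` are hypothetical, sequences are assumed to be plain strings already read from FASTA, and positions are 0-based.

```python
import re
from collections import Counter

GAP_MIN = 200   # minimum run of N treated as a gap (from the captions)
BIN_WIDTH = 30  # histogram bin width in nt (from the captions)


def find_gaps(seq, min_len=GAP_MIN):
    """Return 0-based, half-open (start, end) intervals of N runs >= min_len."""
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


def gap_start_histogram(genomes, bin_width=BIN_WIDTH, min_len=GAP_MIN):
    """Count gap start positions per bin; keys are the bin's first position."""
    counts = Counter()
    for seq in genomes:
        for start, _end in find_gaps(seq, min_len):
            counts[(start // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Toy sequences (made up for illustration, not real genomes).
    toy = [
        "A" * 100 + "N" * 250 + "C" * 50 + "N" * 199 + "G" * 10,
        "N" * 200 + "A" * 10,
    ]
    for seq in toy:
        print(find_gaps(seq))
    print(gap_start_histogram(toy))
```

Note that the 199 N run in the first toy sequence falls below the threshold and is not reported, which mirrors the "at least one 200 N motif" selection criterion.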
Figure 3.
Primer design and amplicon layout. Panel A: The two main steps involved in primer generation and selection are shown. Panel B: The layout of the twenty amplicons across the SARS-CoV-2 genome is shown in the lower panel. The blue markers indicate target positions in the SARS-CoV-2 genome (NC_045512 used here); the grey bars indicate the resulting amplicons.
Figure 4.
Testing the primer performance. Panel A: PCR product size after pooling of reactions A and B. Expected sizes of amplicons range from 1,500 bp to 2,093 bp before primer trimming. Panel B: MinION reads after quality control and primer and adapter trimming. Panel C: Reads mapped to the SARS-CoV-2 reference genome, before primer boosting for amplicons 2 and 16. Panel D: Reads mapped to the SARS-CoV-2 reference genome, after primer boosting for amplicons 2 and 16.
Figure 5.
Validation of Entebbe primers. Panel A plots the genome yield (fraction of complete genome) as a function of sample Ct. The genome fraction was calculated as the number of non-N nucleotides divided by 29,303 (the length, in nt, of the NC_045512 reference genome). Each marker represents a sample: red markers indicate the 19 samples that failed to yield sufficient DNA for library preparation, and dark blue markers indicate the 93 samples that proceeded to library preparation and sequencing. Panel B is a histogram of the distribution of the 118 sample Cts.
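The genome-fraction calculation in the Figure 5 caption (non-N nucleotides divided by the reference length) could be written along these lines. This is a minimal sketch under stated assumptions: `genome_fraction` is a hypothetical helper name, the consensus sequence is assumed to be a plain string, and the reference length is passed in explicitly rather than hard-coded, so the caller supplies the denominator given in the caption.

```python
def genome_fraction(consensus, ref_length):
    """Fraction of the reference covered by called (non-N) bases.

    consensus  : consensus sequence as a string (N or n marks a missing base)
    ref_length : length of the reference genome in nt (the denominator)
    """
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    # Count every position that is not N, in either case.
    non_n = sum(1 for base in consensus if base not in "Nn")
    return non_n / ref_length


if __name__ == "__main__":
    # Toy example: 6 called bases out of a 10 nt "reference".
    print(genome_fraction("ACGTNNNNAC", ref_length=10))
```

Plotting this value against each sample's Ct would give the kind of scatter described for Panel A.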
