PLoS One. 2025 Oct 16;20(10):e0334009.
doi: 10.1371/journal.pone.0334009. eCollection 2025.

SARS-CoV-2 sequencing artifacts associated with targeted PCR enrichment and read mapping

Kirsten Maren Ellegaard et al. PLoS One.

Abstract

Protocols and pipelines for SARS-CoV-2 genome sequencing were rapidly established when the COVID-19 outbreak was declared a pandemic. The most widely used approach for sequencing SARS-CoV-2 includes targeted enrichment by PCR, followed by shotgun sequencing and reference-based genome assembly. As the continued surveillance of SARS-CoV-2 worldwide is transitioning towards a lower level of intensity, it is timely to re-visit the sequencing protocols and pipelines established during the acute phase of the pandemic. In the current study, we have investigated the impact of primer scheme and reference genome choice by sequencing samples with multiple primer schemes (Artic V3, V4.1 and V5.3.2) and re-processing reads with multiple reference genomes. We have also analysed the temporal development in ambiguous base calls during the emergence of the BA.2.86.x variant. We found that the primers used for targeted enrichment can result in recurrent ambiguous base calls, which can accumulate rapidly in response to the emergence of a new variant. We also found examples of consistent base calling errors, associated with PCR artifacts and amplicon drop-out. Similarly, misalignments and partially mapped reads on the reference genome resulted in ambiguous base calls, as well as defining mutations being omitted from the assembly. These findings highlight some key limitations of using targeted enrichment by PCR and reference-based genome assembly for sequencing SARS-CoV-2, and the importance of continuously monitoring and updating primer schemes and bioinformatic pipelines.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Mapped read coverage for samples sequenced with Artic V4.1 in a region with frequent ambiguous base calls.
Per base coverage is shown as individual lines for each sample. Upper panel displays coverage for 17 samples which generated ambiguous base calls at positions 14,960, 15,510 and 15,521, while lower panel displays coverage for 17 samples which did not generate ambiguous base calls at these positions (accession numbers detailed in S2 Table). Positions on the x-axis correspond to the Wuhan-Hu-1 reference genome. The primer binding sites of Artic V4.1 are highlighted in blue for pool 1 and in green for pool 2, with amplicon number and orientation shown at the top of the highlighted region. The region with increased coverage for amplicons 49 and 51 is highlighted in red, with the three positions having frequent ambiguous base calls shown as vertical red lines.
Fig 2. Mapped read coverage for 14 samples sequenced with Artic V5.3.2, when mapped against two different reference genomes.
Per base coverage is shown as individual lines for each sample (accession numbers detailed in S2 Table). Upper and lower panels display coverage when mapping against the BA2 consensus and Wuhan-Hu-1 reference genomes, respectively. Positions on the x-axis correspond to the Wuhan-Hu-1 reference genome. The region with differential mapping is highlighted in red. A deletion (relative to Wuhan-Hu-1) responsible for a second region without coverage is highlighted in green.
Fig 3. Development in prevalence of frequent ambiguous base calls over time, during the take-over of the BA.2.86.x variant.
The x-axis denotes week numbers from end of 2023 to beginning of 2024. Upper panel displays the proportion of samples classified as BA.2.86.x, while lower panel displays the proportion of each of 8 frequent ambiguous base calls for the same time period. The ambiguous base calls are named as per Nextclade, with the letter indicating the IUPAC ambiguity code and the number indicating the position relative to the Wuhan-Hu-1 reference genome.
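
The naming scheme described for Fig 3 (an IUPAC ambiguity letter followed by a reference position) can be decoded mechanically. The sketch below is illustrative only and is not taken from the paper or from Nextclade: it assumes a label is written as a single letter immediately followed by a 1-based position, and the function name `parse_ambiguous_call` and the example label `Y14960` are hypothetical (position 14,960 appears in Fig 1, but the letter is chosen purely for illustration). The ambiguity table itself follows the standard IUPAC nucleotide codes.

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases each one can represent.
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Assumed label layout (hypothetical): one IUPAC letter, then a 1-based
# position relative to the Wuhan-Hu-1 reference genome, e.g. "Y14960".
LABEL_PATTERN = re.compile(r"^([RYSWKMBDHVN])(\d+)$")


def parse_ambiguous_call(label):
    """Split a label into (IUPAC code, reference position, possible bases)."""
    match = LABEL_PATTERN.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not an ambiguous base call label: {label!r}")
    code = match.group(1)
    position = int(match.group(2))
    return code, position, set(IUPAC_AMBIGUITY[code])


if __name__ == "__main__":
    code, position, bases = parse_ambiguous_call("Y14960")
    print(code, position, sorted(bases))
```

Decoding the label this way makes it easier to see which pair of bases is in conflict at a site, for instance when checking whether an ambiguous call sits at a position that defines an emerging variant.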
