Langmuir. 2021 Apr 27;37(16):4763-4771.
doi: 10.1021/acs.langmuir.0c02927. Epub 2021 Apr 13.

Highly Accurate Chip-Based Resequencing of SARS-CoV-2 Clinical Samples

Kendall Hoff et al. Langmuir.

Abstract

SARS-CoV-2 has infected over 128 million people worldwide, and until a vaccine is developed and widely disseminated, vigilant testing and contact tracing are the most effective ways to slow the spread of COVID-19. Typical clinical testing only confirms the presence or absence of the virus; a simple and rapid testing procedure that sequences the entire genome would be more impactful, allowing the spread of the virus and its variants to be traced and the appearance of new variants to be detected. However, traditional short-read sequencing methods are time-consuming and expensive. Herein, we describe a tiled genome array that we developed for rapid and inexpensive full viral genome resequencing. We applied this SARS-CoV-2-specific genome tiling array to rapidly and accurately resequence the viral genome from eight clinical samples acquired from patients in Wyoming who tested positive for SARS-CoV-2. We were ultimately able to sequence over 95% of the genome of each sample with greater than 99.9% average accuracy.


Figures

Figure 1.
A. The ~30,000 base SARS-CoV-2 genome. B. Zoomed-in view of the N gene of the SARS-CoV-2 genome, covering ~2,000 bases. C. Position 28,274-29,533 of the SARS-CoV-2 genome, which is amplified by the CDC N1 primers. D. Three different sense probe sets. Each probe set consists of four features synthesized on the genome tiling array to interrogate the middle base position (highlighted in red font). The feature whose sequence is consistent with the reference (NC_045512.2) is highlighted in bold and denoted with an “*”. E. Extracted regions from the tiling array for genome positions 29,319-29,321 and 28,322-28,325, illustrating how the feature with the highest intensity is used to call the base at each position in the SARS-CoV-2 genome. F. An image of the confocal scan of the genome tiling array after hybridization to a SARS-CoV-2 sample. One alignment marker is highlighted; the markers are used to correctly extract the intensities for each probe set.
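The probe-set idea in panels D and E can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the `flank` length, and the example intensities are all hypothetical, and real arrays would also need image alignment and background handling that are not shown here.

```python
BASES = "ACGT"

def make_probe_set(reference, pos, flank):
    """Build four probe sequences centered on `pos` (0-based) that differ
    only in the middle base. `flank` (bases on each side) is a hypothetical
    parameter; the source does not state the probe length."""
    left = reference[pos - flank:pos]
    right = reference[pos + 1:pos + 1 + flank]
    return {base: left + base + right for base in BASES}

def call_base(intensities):
    """Return the base whose feature has the highest measured intensity.
    `intensities` maps each of A, C, G, T to one feature intensity."""
    return max(BASES, key=lambda base: intensities[base])

# Toy reference and made-up intensities, for illustration only.
reference = "ACGTACGTACGTACGTACGTACGTACGTA"
probes = make_probe_set(reference, pos=14, flank=3)
called = call_base({"A": 120.0, "C": 95.0, "G": 2300.0, "T": 210.0})
is_variant = called != reference[14]
```

Comparing the called base with the reference base at the same position is what distinguishes a reference call from a putative variant call.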
Figure 2.
Development of the maximum likelihood base caller for SARS-CoV-2 genome sequencing using full genome tiling arrays. A. Density plot derived from a 2D histogram of the incorrect calls from all tiling array probe sets, including sense and antisense data, for a single exposure. This image was constructed by ‘calling’ each base in the genome using all probe sets. With this approach, each base is called twice, once from the sense probe sets and once from the antisense probe sets. The difference and differential of a call are included in the histogram if the base call does not match the reference. Contours indicate a likelihood function proportional to the two-dimensional cumulative sum of the density; the sum is normalized to indicate the fraction of wrong calls whose quality parameters are higher than the given point; higher values indicate a higher likelihood that a call is ‘wrong’. B. Same as A, except the 2D histogram is for the correct calls from all tiling array probe sets. Contours indicate a cumulative sum of the density, normalized to indicate the likelihood that a call is correct. The distribution of the difference and differential of ‘correct’ calls is very different from that of the ‘incorrect’ calls. C. Using the observations from panels A and B, we constructed a function to assign the likelihood that a probe set is calling the correct base for a given position. The dotted contours define the (combined) likelihood that a probe set is calling the correct base, based on the difference and differential scores for that probe set. The triangle points on the plot illustrate the difference and differential values for probe sets for all variant sites that have been reported in Wyoming samples in the GISAID database as of August 2020. A green triangle indicates that the base call from this scan suggests a reference call, whereas a red triangle indicates that the call suggests a nonreference base at this position. If the triangle points up, the data are from the sense probe set, whereas a downward-pointing triangle indicates that the data are from the antisense probe set. The triangle outline is filled in if this probe set from this scan resulted in the highest likelihood for the correct call among all scans for this position. This image is the 4 s scan of the WY64 sample. To call all the bases, we constructed a similar likelihood function for each scan, and this information was combined as described in the methods to make the final base call.
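The normalized two-dimensional cumulative sum described for panels A and B can be sketched as follows. This is a minimal illustration, assuming the histogram is a grid of counts whose two axes are the two quality parameters (difference and differential, whose exact definitions are not given in this excerpt). The function name and the toy counts are hypothetical, and the way the paper combines the 'correct' and 'incorrect' surfaces in panel C is not reproduced.

```python
def upper_tail_fraction(hist):
    """For a 2D grid of counts, return a grid of the same shape where
    entry (i, j) is the fraction of all counts falling in bins (i2, j2)
    with i2 >= i and j2 >= j, i.e. calls whose two quality parameters
    are both at least as high as the given bin."""
    rows, cols = len(hist), len(hist[0])
    total = sum(sum(row) for row in hist)
    if total == 0:
        raise ValueError("histogram is empty")
    tail = [[0.0] * cols for _ in range(rows)]
    # Walk from the highest bins down, using inclusion-exclusion.
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            below = tail[i + 1][j] if i + 1 < rows else 0.0
            right = tail[i][j + 1] if j + 1 < cols else 0.0
            diag = tail[i + 1][j + 1] if i + 1 < rows and j + 1 < cols else 0.0
            tail[i][j] = hist[i][j] + below + right - diag
    return [[value / total for value in row] for row in tail]

# Toy 2x2 histogram of incorrect calls (made-up counts).
wrong_calls = [[1, 2],
               [3, 4]]
fraction = upper_tail_fraction(wrong_calls)
```

Building one such surface from the incorrect calls and another from the correct calls gives two grids that can be compared bin by bin, which is the kind of comparison the caption describes.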
Figure 3. Sequencing accuracy across the SARS-CoV-2 genome.
A. The Phred score (left axis) for all bases in the SARS-CoV-2 genome from the tiling array full genome sequencing of WY64. The positions of all variant calls are marked with a black “X,” and a red “X” indicates a correct variant call (confirmed by the Illumina short read sequencing data). The cumulative sums of non-calls (blue line), variant calls with a Phred score greater than 20 (cyan line), and variant calls that have a Phred score greater than 20 and pass the low coverage filter (red line) are shown on the secondary Y-axis. B. Comparison of the tiling array genome sequencing quality scores and variant calls to the amplicon coverage from short read Illumina sequencing data. The light blue lines (right axis) indicate the sequence coverage from the WY64 sample, and the dark blue lines indicate the average sequencing coverage over all Wyoming GISAID samples as of August 2020.
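For readers unfamiliar with the quality scale in panel A, the standard Phred definition is Q = -10 log10(p), where p is the estimated probability that a call is wrong, so Q = 20 corresponds to a 1% error probability. The sketch below applies that definition and builds a running count of variant calls above a threshold, similar in spirit to the cyan line. How the paper converts its likelihoods into Phred scores is not described in this excerpt, so the function names and the example calls are assumptions.

```python
import math
from itertools import accumulate

def phred(p_error):
    """Standard Phred quality score for an error probability in (0, 1]."""
    return -10.0 * math.log10(p_error)

def cumulative_variant_count(calls, min_phred=20.0):
    """`calls` is a list of (is_variant, phred_score) pairs ordered by
    genome position. Returns the running count of variant calls whose
    score is strictly greater than `min_phred`."""
    flags = [1 if is_variant and score > min_phred else 0
             for is_variant, score in calls]
    return list(accumulate(flags))

# Made-up calls for five consecutive positions.
example_calls = [(False, 40.0), (True, 35.0), (True, 12.0),
                 (False, 50.0), (True, 21.0)]
running = cumulative_variant_count(example_calls)
```

A further filter, such as the low coverage filter mentioned in the caption, could be added as one more condition when building `flags`.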
Figure 4.
A. The average number of variant base calls across all eight samples as a function of the genome coordinates. The cumulative sum of the number of variants identified is displayed on the secondary axis. The largest number of putative variants is called in the ~300 base region between 19,300 and 19,600. B. Same as panel A, except the x-axis spans the region between base positions 19,100 and 19,700. Bars indicate variant calls by sample and location. The cumulative sum reflects the number of variant calls across all samples. C. The maximum signal intensity from all exposure scans from the chip as a function of genome coordinate (same range as B). The maximum signal intensity (for all samples) on the genome tiling array is low in the region from base 19,300 to 19,600. This region corresponds to the low coverage region from the short read Illumina sequencing data (see Figure 3B). D. The Phred score of the final base calls from the tiling array as a function of the genome coordinate.

