Sci Rep. 2021 Jul 1;11(1):13669.
doi: 10.1038/s41598-021-93145-4.

High-precision and cost-efficient sequencing for real-time COVID-19 surveillance

Sung Yong Park et al. Sci Rep. 2021.

Abstract

COVID-19 global cases have climbed to more than 33 million, with over a million total deaths, as of September 2020. Real-time massive SARS-CoV-2 whole genome sequencing is key to tracking chains of transmission and estimating the origin of disease outbreaks. Yet no methods have simultaneously achieved high precision, simple workflow, and low cost. We developed a high-precision, cost-efficient SARS-CoV-2 whole genome sequencing platform for COVID-19 genomic surveillance, CorvGenSurv (Coronavirus Genomic Surveillance). CorvGenSurv directly amplified viral RNA from COVID-19 patients' nasopharyngeal/oropharyngeal (NP/OP) swab specimens and sequenced the SARS-CoV-2 whole genome in three segments by long-read, high-throughput sequencing. Sequencing of the whole genome in three segments significantly reduced sequencing data waste, thereby preventing dropouts in genome coverage. We validated the precision of our pipeline by both control genomic RNA sequencing and Sanger sequencing. We produced near full-length whole genome sequences from individuals who tested positive for COVID-19 between April and June 2020 in Los Angeles County, California, USA. These sequences were highly diverse in the G clade with nine novel amino acid mutations including NSP12-M755I and ORF8-V117F. With its readily adaptable design, CorvGenSurv grants wide access to genomic surveillance, permitting immediate public health response to sudden threats.
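The abstract describes sequencing the genome as three overlapping segments that are later joined. The sketch below illustrates one simple way such a join could work, by merging consecutive segment consensus sequences on their longest exact suffix/prefix overlap. It is not the authors' pipeline: the function names, the `min_overlap` parameter, and the exact-match assumption are all hypothetical, and real data would likely need an alignment-based merge that tolerates mismatches.

```python
def merge_overlapping(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two sequences on their longest exact suffix/prefix overlap.

    Hypothetical helper: assumes the overlap region is identical in both
    segments, which real consensus sequences may not guarantee.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first, then shrink.
    for k in range(longest_possible, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("segments do not overlap by at least min_overlap bases")


def assemble(segments: list[str], min_overlap: int = 20) -> str:
    """Merge segments left to right into a single sequence."""
    genome = segments[0]
    for segment in segments[1:]:
        genome = merge_overlapping(genome, segment, min_overlap)
    return genome


if __name__ == "__main__":
    # Toy segments with 4-base overlaps (illustrative only).
    toy_segments = ["AAACCCGG", "CCGGTTTA", "TTTAGGG"]
    print(assemble(toy_segments, min_overlap=4))
```

Trying the longest overlap first keeps the merge from stopping at a short, coincidental match when a longer true overlap exists.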

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
CorvGenSurv’s workflow and precision. (a) Remnant NP/OP specimens from COVID-19 diagnostic testing were subjected to SARS-CoV-2 RNA extraction. Viral RNA was amplified via three overlapping RT-PCRs (~10,000 bases long each), and pooled SARS-CoV-2 amplicons of indexed COVID-19 specimens were then sequenced by long-read, high-throughput single-molecule sequencing. The output fasta file was de-multiplexed and processed to produce the consensus sequence of each segment. Each COVID-19 specimen’s three overlapping segments were assembled into a SARS-CoV-2 whole genome sequence. (b) CorvGenSurv’s precision was tested by comparing the consensus sequence from a given number of reads with the USA-WA1/2020 control strain (GenBank: MN985325.1). When a consensus sequence was built from three reads, only 67% [52.4–78.9%] of the 1000 bootstrap runs’ resulting consensus sequences were consistent with the correct sequence. When the number of reads was greater than or equal to 31, all 1000 bootstrap runs resulted in the correct sequence.
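The bootstrap test in panel (b) can be illustrated with a minimal sketch: draw a fixed number of reads with replacement, build a per-position majority consensus, and count how often it matches the known control sequence. Everything here is an assumption for illustration, not the authors' code: reads are taken to be pre-aligned and of equal length (real long reads contain insertions and deletions), the majority-vote rule is a stand-in for whatever consensus caller was used, and `simulate_reads` with its error rate is a hypothetical data generator.

```python
import random
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Per-position majority vote over equal-length, pre-aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def bootstrap_accuracy(reads, truth, n_reads, n_runs=1000, seed=0):
    """Fraction of bootstrap runs whose consensus equals the true sequence."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        # Resample n_reads reads with replacement from the pool.
        sample = [rng.choice(reads) for _ in range(n_reads)]
        hits += consensus(sample) == truth
    return hits / n_runs


def simulate_reads(truth, n, error_rate, seed=0):
    """Hypothetical read generator: substitution errors only, no indels."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n):
        read = [
            rng.choice("ACGT".replace(base, "")) if rng.random() < error_rate else base
            for base in truth
        ]
        reads.append("".join(read))
    return reads


if __name__ == "__main__":
    truth = "".join(random.Random(1).choice("ACGT") for _ in range(200))
    pool = simulate_reads(truth, n=500, error_rate=0.05)
    for depth in (3, 11, 31):
        print(depth, bootstrap_accuracy(pool, truth, n_reads=depth))
```

Under this toy error model, accuracy should rise with the number of reads per consensus, which is the qualitative behavior the caption reports; the specific percentages in the caption come from the authors' data and are not reproduced here.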
Figure 2
Maximum likelihood tree analysis and amino acid mutations of 25 SARS-CoV-2 whole genome sequences obtained by CorvGenSurv. (a) Maximum likelihood tree of 25 SARS-CoV-2 sequences obtained by CorvGenSurv along with sequences collected in California, USA. A total of 1215 SARS-CoV-2 sequences collected from California, USA were downloaded from GISAID, as of July 27th, 2020. Our sequences were obtained from 25 remnant specimens from COVID-19 testing between April 13th and June 22nd, 2020 in Los Angeles County, California, USA. Specimens collected from April to May 2020 were colored purple and those collected in June were colored blue. Sequences of specimens USA/CA-LAC-USC1 to USA-CA-LAC-USC25 in Table 1 were denoted by 1 to 25 in this tree. All 25 sequences were classified as G clade with mutations P323L in NSP12 (RdRP) and D614G in S protein (grey circle). Different ancestral sequences were presented by circles in different colors, with common mutations of each lineage presented in the box. The unit branch length (one nucleotide base substitution) was denoted as “HD = 1”. (b) Each of our 25 sequences’ amino acid mutations relative to Wuhan-Hu-1 (MN908947) were marked using Highlighter (https://www.hiv.lanl.gov/content/sequence/HIGHLIGHT/highlighter_top.html). The regions of NSP2, NSP12 (RdRP), S, E, M, and N were presented by colored boxes. (c) The prevalence of each amino acid mutation with greater than 2% frequency either globally, in the USA, or in California. A total of 28,176 global sequences were downloaded from GISAID.
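Panels (b) and (c) rest on two simple operations: naming each amino acid substitution relative to a reference (in the style of D614G) and computing how often each substitution occurs across a set of sequences. A minimal sketch follows, assuming protein sequences already aligned to the reference at equal length. The function names are hypothetical, gap and ambiguity characters are simply skipped, and this is not the Highlighter tool's algorithm.

```python
from collections import Counter


def substitutions(reference: str, sample: str) -> list[str]:
    """List substitutions as '<ref><1-based position><sample>', e.g. 'D614G'.

    Assumes both protein sequences are aligned and of equal length.
    Gaps ('-') and unknown residues ('X') in the sample are ignored.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref_aa}{position}{sample_aa}"
        for position, (ref_aa, sample_aa) in enumerate(zip(reference, sample), start=1)
        if ref_aa != sample_aa and sample_aa not in "-X"
    ]


def prevalence(reference: str, samples: list[str]) -> dict[str, float]:
    """Fraction of samples carrying each substitution."""
    counts = Counter(m for sample in samples for m in substitutions(reference, sample))
    return {mutation: count / len(samples) for mutation, count in counts.items()}


if __name__ == "__main__":
    # Toy three-residue "protein" (illustrative only).
    ref = "MDK"
    cohort = ["MGK", "MDK", "MGR", "MDK"]
    common = {m: f for m, f in prevalence(ref, cohort).items() if f > 0.02}
    print(common)
```

The final filter mirrors the 2% frequency threshold mentioned in panel (c).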
Figure 3
SARS-CoV-2 divergence. Our 25 Los Angeles sequences’ number of base substitutions from the reference sequence Wuhan-Hu-1 (MN908947) was plotted against the collection time of each sequence as days from the reference sequence collection time, December 31st, 2019. The SARS-CoV-2 evolution rate was estimated to be 8.62 × 10⁻⁴ substitutions per site per year (95% confidence interval: 7.96 × 10⁻⁴ to 9.24 × 10⁻⁴) by linear regression (solid line).
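The unit conversion behind this rate estimate can be sketched as follows: regress substitution counts on days since the reference collection date, then convert the slope (substitutions per genome per day) into substitutions per site per year. The sketch assumes ordinary least squares with an intercept and uses the Wuhan-Hu-1 genome length of 29,903 nucleotides as the number of sites. Whether the authors fitted an intercept, and which site count they used, is not stated in the caption, and the data points below are invented for illustration.

```python
def substitution_rate(days, substitutions, genome_length=29903):
    """Estimate substitutions per site per year by least-squares regression.

    days: collection time of each sequence, in days from the reference.
    substitutions: base substitutions of each sequence from the reference.
    genome_length: number of sites (assumed: Wuhan-Hu-1 length, 29,903 nt).
    """
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(substitutions) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, substitutions))
    slope_per_day = sxy / sxx  # substitutions per genome per day
    return slope_per_day * 365.0 / genome_length


if __name__ == "__main__":
    # Hypothetical data points, not the paper's measurements.
    example_days = [105, 120, 140, 160, 175]
    example_subs = [7, 9, 10, 11, 13]
    print(f"{substitution_rate(example_days, example_subs):.2e}")
```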
Figure 4
Influenza A (H1N1) evolution and vaccination. (a) Maximum likelihood tree of 255 H1N1 Hemagglutinin (HA) sequences sampled in April 2019 (blue boxes), 1140 H1N1 HA sequences sampled in January 2020 (red boxes), 2019–2020 H1N1 Northern hemisphere vaccine strain (A/Brisbane/02/2018, purple diamond) and 2018–2019 vaccine strain (A/Michigan/45/201, grey diamond). All HA nucleotide sequences were downloaded from GISAID. The H1N1 HA sequences in January 2020 showed greater tree distances from the 2019–2020 H1N1 vaccine strain, compared to those in April 2019. (b) Two-dimensional map of 255 sequences collected in April 2019 along with the 2019–2020 H1N1 vaccine strain’s HA sequence (purple diamond) and 2018–2019 H1N1 vaccine strain’s HA sequence (grey diamond). The nucleotide distance among all pairs of sequences was scaled to the Euclidean distance by multidimensional scaling. (c) Two-dimensional map of 1140 HA sequences collected in January 2020 along with the two vaccine sequences. (d) The HA sequences in January 2020 showed greater nucleotide distances from the 2019–2020 vaccine strain than those in April 2019 (p < 0.001, Wilcoxon rank sum test).
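The comparison in panel (d) can be sketched in two steps: compute each sequence's nucleotide distance from the vaccine strain, then compare the two groups of distances with a Wilcoxon rank sum test. The sketch below assumes aligned, equal-length sequences, uses a plain Hamming distance as the nucleotide distance, and implements the two-sided test with a normal approximation and no tie or continuity correction. All of these are simplifying assumptions, and the caption does not say which distance measure or test variant the authors used.

```python
import math


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_sum_test(xs, ys):
    """Two-sided Wilcoxon rank sum test via the normal approximation.

    Tied values receive average ranks; the variance is not tie-corrected.
    Returns (z, p).
    """
    pooled = sorted((value, group) for group, values in enumerate((xs, ys)) for value in values)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n, m = len(xs), len(ys)
    expected = n * (n + m + 1) / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    z = (rank_sum_x - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    # Toy sequences standing in for HA alignments (illustrative only).
    vaccine = "ACGTACGTAC"
    earlier = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
    later = ["ATGAACGTTT", "ACGAACCTAT", "TCGAACGTTT"]
    d_earlier = [hamming(s, vaccine) for s in earlier]
    d_later = [hamming(s, vaccine) for s in later]
    print(rank_sum_test(d_later, d_earlier))
```

A positive z here means the first group (the later samples) tends to sit farther from the vaccine strain than the second.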
