Genes (Basel). 2020 Aug 17;11(8):949.
doi: 10.3390/genes11080949.

Whole Genome Sequencing of SARS-CoV-2: Adapting Illumina Protocols for Quick and Accurate Outbreak Investigation during a Pandemic

Sureshnee Pillay et al. Genes (Basel). 2020.

Abstract

The COVID-19 pandemic has spread rapidly around the world. A few days after the first case was detected in South Africa, a large hospital outbreak began in Durban, KwaZulu-Natal (KZN). Phylogenetic analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes can be used to trace transmission paths within a hospital, identify the source of an outbreak, and provide lessons for improving infection prevention and control strategies. This manuscript outlines the obstacles we encountered while genotyping SARS-CoV-2 in near-real time during an urgent outbreak investigation, including the length of the original genotyping protocol, the unavailability of reagents, and sample degradation and storage. Despite these obstacles, we set up three different library preparation methods for Illumina sequencing and reduced the hands-on library preparation time from twelve to three hours, which enabled the outbreak investigation to be completed in just a few weeks. The new protocols also increased the success rate of sequencing whole viral genomes. In addition, we fine-tuned a simple bioinformatics workflow for assembling high-quality genomes in near-real time. So that other laboratories can learn from our experience, all of the library preparation and bioinformatics protocols are publicly available at protocols.io and have been distributed to other laboratories of the Network for Genomics Surveillance in South Africa (NGS-SA) consortium.

Keywords: COVID-19; Illumina; SARS-CoV2; bioinformatics; protocols; sequencing.

Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure 1
Processes used to generate severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes and qPCR diagnostics in the KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP) laboratory. The figure also shows the number of days two senior scientists needed to generate 24 whole genomes using an Illumina MiSeq Nano Kit v2. With a MiSeq Reagent Kit v2 (500 cycles), 94 whole genomes can be generated with one extra day of sequencing.
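The trade-off in the caption comes down to batch size: the Nano kit fits 24 genomes per run, while the 500-cycle v2 kit fits 94. A minimal sketch of the run-count arithmetic, using only those two capacities from the caption; the function name `runs_needed` is hypothetical, and run duration and cost are not modelled.

```python
import math

# Samples per run, taken from the Figure 1 caption.
KIT_CAPACITY = {
    "MiSeq Nano Kit v2": 24,
    "MiSeq Reagent Kit v2 (500 cycles)": 94,
}


def runs_needed(n_samples: int, capacity: int) -> int:
    """Number of sequencing runs needed to process n_samples at a fixed batch size."""
    if n_samples < 0 or capacity <= 0:
        raise ValueError("n_samples must be >= 0 and capacity must be > 0")
    return math.ceil(n_samples / capacity)


if __name__ == "__main__":
    for kit, capacity in KIT_CAPACITY.items():
        print(f"{kit}: {runs_needed(94, capacity)} run(s) for 94 samples")
```

For 94 samples this gives four Nano runs versus a single 500-cycle v2 run.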
Figure 2
Three-step workflow for generating high-quality genomes. Step 1: Raw reads from Illumina and Nanopore sequencing were assembled using the web-based Genome Detective 1.126 platform (https://www.genomedetective.com/) and its coronavirus typing tool. Step 2: The initial assembly obtained from Genome Detective was polished by aligning mapped reads to the references and filtering out mutations with low genotype likelihoods, using the bcftools 1.7-2 mpileup method. This calculation determines the probability of each genotype at sites where reads carry different bases (e.g., the probability that position 27,784 is A vs. T in the illustration). Step 3: All mutations were validated visually by viewing BAM files in Geneious software, to ensure that called mutations were real and not artifacts of residual adapter sequences.
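Step 2 relies on genotype likelihoods. The sketch below illustrates the idea for a haploid viral genome under a deliberately simplified model: it assumes independent read errors with probabilities taken from Phred base qualities, errors spread evenly over the other three bases, and a uniform prior over A, C, G, and T. It is not the model implemented by bcftools mpileup, and the function names and the 0.99 posterior cut-off are hypothetical choices for illustration.

```python
import math

BASES = "ACGT"


def error_prob(phred: int) -> float:
    """Convert a Phred base quality to an error probability, clamped to avoid log(0)."""
    e = 10 ** (-phred / 10)
    # Q0 would give e = 1 and a zero match probability; 0.75 makes the read uninformative.
    return min(max(e, 1e-10), 0.75)


def haploid_posteriors(pileup):
    """Posterior probability of each base at one site.

    pileup: list of (observed_base, phred_quality) tuples.
    Assumes independent errors, errors spread evenly over the 3 other bases,
    and a uniform prior over A, C, G, T.
    """
    log_lik = {b: 0.0 for b in BASES}
    for observed, q in pileup:
        e = error_prob(q)
        for b in BASES:
            # P(observed base | true base b)
            p = (1 - e) if observed.upper() == b else e / 3
            log_lik[b] += math.log(p)
    # Normalise in log space (log-sum-exp) for numerical stability.
    top = max(log_lik.values())
    weights = {b: math.exp(v - top) for b, v in log_lik.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def call_site(pileup, min_posterior=0.99):
    """Return (base, posterior), or (None, posterior) if the best base is below the cut-off."""
    post = haploid_posteriors(pileup)
    best = max(post, key=post.get)
    return (best if post[best] >= min_posterior else None), post[best]


if __name__ == "__main__":
    clear_site = [("A", 30)] * 8 + [("T", 20)] * 2
    mixed_site = [("A", 30)] * 5 + [("T", 30)] * 5
    print(call_site(clear_site))
    print(call_site(mixed_site))
```

With eight A reads at Q30 and two T reads at Q20, A is called with a posterior above 0.99; an even A/T mix at equal quality returns no call, and a site like that would be a natural candidate for the visual inspection in Step 3.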
Figure 3
Association between cycle threshold (Ct) value and genome length. (A) Regression plot of mean Ct value of all unique samples against their genome lengths (% coverage against SARS-CoV-2 reference). Samples with missing Ct value information (n = 8) are shown in red. Forty-four assembled genomes with >90% coverage were produced from samples with Ct value <27 (blue); six genomes with >90% coverage from samples with Ct value >27 (green); 12 genomes with <90% coverage from samples with Ct value <27 (purple); and 37 genomes with <90% coverage from samples with Ct value >27 (orange). (B) Box plot and statistical comparison of genome coverage obtained from samples grouped by three mean Ct value thresholds (25, 27, and 30), showing statistically significant (t-tests) differences between lower and higher Ct value samples. ****: level of significance.
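Figure 3 places each sample in one of four groups using a Ct cut-off of 27 and a coverage cut-off of 90%. A minimal sketch of that classification follows. Two points are assumptions not stated in the caption: coverage is computed as the share of reference positions resolved to A, C, G, or T in the consensus, using the 29,903-nt length of the Wuhan-Hu-1 reference (NC_045512.2); and samples falling exactly on a cut-off are assigned to the higher-Ct or lower-coverage group. The function names are hypothetical.

```python
from collections import Counter
from typing import Optional

# Assumed reference length: Wuhan-Hu-1 (NC_045512.2), 29,903 nt.
REFERENCE_LENGTH = 29_903


def percent_coverage(consensus: str, ref_length: int = REFERENCE_LENGTH) -> float:
    """Percentage of reference positions resolved to A, C, G or T in the consensus.

    Ns, gaps and IUPAC ambiguity codes are counted as unresolved.
    """
    resolved = sum(1 for c in consensus.upper() if c in "ACGT")
    return 100.0 * resolved / ref_length


def figure3_group(ct: Optional[float], coverage: float,
                  ct_cut: float = 27.0, cov_cut: float = 90.0) -> str:
    """Colour group used in Figure 3 for one sample."""
    if ct is None:
        return "red"  # missing Ct value
    high_cov = coverage > cov_cut
    low_ct = ct < ct_cut
    if high_cov and low_ct:
        return "blue"
    if high_cov:
        return "green"
    if low_ct:
        return "purple"
    return "orange"


if __name__ == "__main__":
    # Hypothetical (Ct, coverage %) pairs, not study data.
    samples = [(21.4, 99.1), (29.8, 93.0), (24.0, 61.5), (33.2, 12.7), (None, 95.0)]
    print(Counter(figure3_group(ct, cov) for ct, cov in samples))
```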
Figure 4
Association between Ct value and genome length by library preparation method. (A) Regression plot of mean Ct value of all unique samples against their genome lengths (% coverage against SARS-CoV-2 reference). Samples with missing Ct value information (n = 8) are shown in red. A total of 114 assembled genomes with >90% coverage were produced (80 with Ct value <27, 29 with Ct value >27, and five with missing Ct values). (B) Box plot and statistical comparison of genome coverage obtained from samples grouped by three mean Ct value thresholds (25, 27, and 30) by library preparation method, showing statistically significant (t-tests) differences between lower and higher Ct value samples. ****: level of significance.
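The box plots compare coverage between samples below and above each Ct threshold (25, 27, and 30) using t-tests. The caption does not say which t-test variant was used; the sketch below assumes Welch's unequal-variance test and computes only the t statistic and degrees of freedom with the Python standard library. A p-value additionally requires the t distribution, for example through SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`. The helper names and the example values are hypothetical, not study data.

```python
import math
from statistics import mean, variance


def split_by_ct(samples, threshold):
    """Split (ct, coverage) pairs into coverage lists for Ct < threshold and Ct >= threshold.

    Samples with a missing Ct (None) are left out.
    """
    low, high = [], []
    for ct, coverage in samples:
        if ct is None:
            continue
        (low if ct < threshold else high).append(coverage)
    return low, high


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two values")
    se_a = variance(a) / len(a)  # squared standard error of the mean, group a
    se_b = variance(b) / len(b)  # squared standard error of the mean, group b
    if se_a + se_b == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical (Ct, coverage %) pairs, not study data.
    samples = [(18.2, 99.5), (22.7, 98.1), (24.9, 96.4), (26.1, 91.0),
               (28.3, 70.2), (31.5, 35.8), (34.0, 12.4), (None, 97.0)]
    for threshold in (25, 27, 30):
        low, high = split_by_ct(samples, threshold)
        t, df = welch_t(low, high)
        print(f"Ct {threshold}: t = {t:.2f}, df = {df:.1f}")
```

Welch's variant is a common default when group sizes and spreads differ, as they are likely to when splitting samples at a Ct threshold; if the original analysis used Student's pooled-variance test, the statistic and degrees of freedom would differ.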
Figure 5
Phylogenetic tree. Maximum-likelihood (ML) tree of the 54 genomes (orange circles) together with publicly available SARS-CoV-2 genomes as reference. The 54 genomes fall into the B.1 (n = 50), B (n = 3), and B.2 (n = 1) lineages.