Int J Mol Sci. 2020 Aug 4;21(15):5585.
doi: 10.3390/ijms21155585.

Use of Whole Genome Sequencing Data for a First in Silico Specificity Evaluation of the RT-qPCR Assays Used for SARS-CoV-2 Detection

Mathieu Gand et al. Int J Mol Sci. 2020.

Abstract

The current COronaVIrus Disease 2019 (COVID-19) pandemic started in December 2019. COVID-19 cases are confirmed by the detection of SARS-CoV-2 RNA in biological samples by RT-qPCR. However, when the first RT-qPCR methods were developed in January 2020, only a limited number of SARS-CoV-2 genomes were available for an initial in silico specificity evaluation and for verifying that the targeted loci are highly conserved. Now that more whole genome data have become available, we used the bioinformatics tool SCREENED and a total of 4755 publicly available SARS-CoV-2 genomes, downloaded at two different time points, to evaluate the specificity of 12 RT-qPCR tests (comprising a total of 30 primer and probe sets) used for SARS-CoV-2 detection, and the impact of the virus's genetic evolution on four of them. The exclusivity of these methods was also assessed using the human reference genome and 2624 other, closely related respiratory viral genomes. The specificity of the assays was generally good and stable over time. An exception is the first method developed by the China Center for Disease Control and Prevention (CDC), which exhibits three primer mismatches present in 358 SARS-CoV-2 genomes sequenced mainly in Europe from February 2020 onwards. The best results were obtained for the assay of Chan et al. (2020), which targets the gene coding for the spike protein (S). This demonstrates that our user-friendly strategy can be used for a first in silico specificity evaluation of future RT-qPCR tests, as well as for verifying that existing methods are still capable of detecting circulating SARS-CoV-2 variants.
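To make the notion of "primer mismatches" concrete, the following is a minimal Python sketch of how mismatches between a primer and a set of genomes could be counted. It is not the SCREENED implementation: it assumes a simple ungapped comparison at the best-matching position, and the function names (`best_binding_site`, `genomes_with_mismatches`) and the toy sequences are hypothetical, chosen only for illustration.

```python
from typing import Dict, Tuple


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_binding_site(primer: str, genome: str) -> Tuple[int, int]:
    """Return (position, mismatches) of the best ungapped primer placement.

    Assumption: the primer is given in the same orientation as the genome,
    and insertions/deletions are ignored.
    """
    k = len(primer)
    if k == 0 or len(genome) < k:
        raise ValueError("genome must be at least as long as a non-empty primer")
    best_pos, best_mm = 0, k + 1
    for pos in range(len(genome) - k + 1):
        mm = hamming(primer, genome[pos:pos + k])
        if mm < best_mm:  # strict '<' keeps the first of equally good sites
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


def genomes_with_mismatches(
    primer: str, genomes: Dict[str, str], min_mismatches: int
) -> Dict[str, int]:
    """Map genome ID -> mismatch count for genomes at or above the threshold."""
    flagged = {}
    for genome_id, sequence in genomes.items():
        _, mm = best_binding_site(primer, sequence)
        if mm >= min_mismatches:
            flagged[genome_id] = mm
    return flagged


if __name__ == "__main__":
    # Toy sequences, not real primer or SARS-CoV-2 sequences.
    toy_primer = "ACGTTGCA"
    toy_genomes = {
        "genome_A": "TTTACGTTGCATTT",  # perfect match
        "genome_B": "TTTACCAAGCATTT",  # three substitutions in the binding site
    }
    print(genomes_with_mismatches(toy_primer, toy_genomes, min_mismatches=3))
```

Applied to real data, the same idea would flag the genomes whose primer binding site differs from the primer at three or more positions, which is the kind of result reported above for the China CDC assay.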

Keywords: COVID-19; RT-qPCR; SARS-CoV-2; WGS data; bioinformatics tool; detection; diagnosis; in silico specificity evaluation; mismatches; primers and probes.

Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure 1
Location in the SARS-CoV-2 reference genome (NC_045512.2) of the sequence amplified by each evaluated primer set in Table 2. The SARS-CoV-2 genome (~29,000 nt) is composed of genes coding for structural proteins, such as the Spike protein (S), Envelope protein (E), and Nucleocapsid protein (N); and non-structural proteins located in the Open Reading Frame 1ab (ORF1ab), such as RNA-dependent RNA polymerase (RdRp), Helicase (H), and non-structural protein 14 (nsp14). The orange rectangles in the figure show the approximate size and location in these genes of the target sequence that is amplified by each of the evaluated primer sets. The corresponding assay reference number is indicated in black, and its targeted gene in green (Table 1). Two labels connected to the same orange rectangle indicate that the targeted amplified sequences overlap. The exact starting point of each of the forward primers and the length of their corresponding amplicons are available in Table 1.
Figure 2
Sampling time and location for genomes that showed three mismatches in the sequence of Assay_1_N's forward primer. SCREENED identified three mismatches between the forward primer sequence of Assay_1_N, which targets the N gene, and 358 SARS-CoV-2 genomes. Part (A) of the figure shows the occurrence of these genomes over time since the 25th of February 2020, and the arrows mark their first appearance on each continent according to the color legend in Part (B). Ten of the 358 genomes with the described mismatches were not included in Part (A), as their time of collection was not available. Part (B) of the figure shows the location where these genomes were collected. One of the 358 genomes with the described mismatches was not included in Part (B), as its location was not communicated.
Figure 3
Diversity in the SARS-CoV-2 genomes of the target sequences amplified by the evaluated primer and probe sets. Clustering of the targeted genomic sequences amplified by the 30 evaluated primer and probe sets was performed by SCREENED for each of the 12 RT-qPCR assays. The chart shows the distribution of the amplicons from each genome across their sequence identity clusters (i.e., sets of targeted amplicons exhibiting exactly the same sequence), illustrating the overall sequence diversity according to the color key on the right of the figure for all primer and probe sets. For the majority of the assays, more than 97% of the amplicons fell into one large cluster. For Assay_1_N, Assay_11_N-1, and Assay_11_N-2, a second large cluster containing ~14% of the amplicons emerged. A varying number of other clusters is present for the different methods, each containing only a very limited number of amplicons. Note that the y-axis, which shows the percentage of amplicons per cluster, starts at 75% to allow better visual interpretation of amplicon diversity, since the first 75% always belongs to the first large cluster of each primer and probe set.
Figure 4
Comparison of amplicon diversity in the SARS-CoV-2 genomes collected before and after the 7th of April 2020 for Assay_1_N and Assay_8_S. For Assay_1_N and Assay_8_S, the chart shows how the sequences amplified in the genomes downloaded before (2569) and after (968) the 7th of April are distributed across their clusters. After one month, the diversity in the region targeted by Assay_1_N increased, while the region targeted by Assay_8_S stayed highly conserved.
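
The sequence identity clusters described for Figures 3 and 4 (sets of amplicons with exactly the same sequence) can be illustrated with a short Python sketch. This is only an assumption-laden illustration of the concept, not the clustering code used by SCREENED; the function name `identity_clusters` and the toy amplicons are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def identity_clusters(amplicons: Dict[str, str]) -> List[Tuple[str, int, float]]:
    """Group amplicons by exact sequence identity.

    Takes a mapping of genome ID -> amplicon sequence and returns a list of
    (sequence, number of genomes, percentage of genomes), largest cluster first.
    """
    counts = Counter(seq.upper() for seq in amplicons.values())
    total = sum(counts.values())
    return [(seq, n, 100.0 * n / total) for seq, n in counts.most_common()]


if __name__ == "__main__":
    # Toy amplicons, not real SARS-CoV-2 sequences.
    earlier_download = {"g1": "ACGT", "g2": "ACGT", "g3": "ACGT", "g4": "ACGT"}
    later_download = {"g5": "ACGT", "g6": "ACGT", "g7": "ACGT", "g8": "ACTT"}

    for label, batch in (("earlier", earlier_download), ("later", later_download)):
        print(label)
        for seq, n, pct in identity_clusters(batch):
            print(f"  {seq}: {n} genomes ({pct:.1f}%)")
```

Running the function separately on genomes from two download dates, as in the toy example, mirrors the before/after comparison of Figure 4: a conserved target keeps a single dominant cluster, while a diverging target gains additional clusters.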
