Anal Chem. 2024 Nov 19;96(46):18537-18544.
doi: 10.1021/acs.analchem.4c04492. Epub 2024 Nov 4.

Enhancing SARS-CoV-2 Lineage Surveillance through the Integration of a Simple and Direct qPCR-Based Protocol Adaptation with Established Machine Learning Algorithms


Cleber Furtado Aksenen et al. Anal Chem.

Abstract

Emerging and evolving Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) lineages, adapted to changing epidemiological conditions, present unprecedented challenges to global public health systems. Here, we introduce an adapted analytical approach that complements genomic sequencing, applying a cost-effective quantitative polymerase chain reaction (qPCR)-based assay. Viral RNA samples from SARS-CoV-2 positive cases detected by diagnostic laboratories or public health network units in Ceará, Brazil, were tracked for genomic surveillance and analyzed by using paired-end sequencing combined with integrative genomic analysis. A key structural variation, a specific open reading frame 7a (ORF7a) gene deletion within the tracked "BE.9" lineages, was validated by gel electrophoresis. The analytical innovation of our method is the optimization of a simple intercalating dye-based qPCR assay through repositioning primers from the ARTIC v4.1 amplicon panel to detect large molecular patterns. This assay distinguishes between "BE.9" and "non-BE.9" lineages, particularly BQ.1, without the need for expensive probes or sequencing. The protocol was validated against lineage predictions from next-generation sequencing (NGS) using 525 paired samples, achieving 93.3% sensitivity, 95.1% specificity, and 92.4% agreement, as measured by Cohen's Kappa coefficient. Machine learning (ML) models were trained using the melting curves from intercalating dye-based qPCR of 1724 samples, enabling highly accurate lineage assignment. Among them, the support vector machine (SVM) model performed best and, after fine-tuning, reached ∼96.52% (333/345) accuracy on the test data set. Our integrated approach provides an adapted analytical method that is both cost-effective and scalable, suitable for rapid assessment of emerging variants, especially in resource-limited settings.
In this work, the protocol is applied to improve the monitoring of SARS-CoV-2 sublineages but can be extended to track any key molecular signature, including large insertions and deletions (indels) commonly observed in pathogenic agent subtypes. By offering a complement to traditional sequencing methods and utilizing easily trainable machine learning algorithms, our methodology contributes to enhanced molecular surveillance strategies and supports global efforts in pandemic control.
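The validation figures quoted in the abstract (sensitivity, specificity, Cohen's Kappa) all derive from a 2×2 table that compares the qPCR call with the NGS lineage assignment for each paired sample. The following is a minimal sketch of how such metrics are commonly computed. The function name `agreement_metrics` and the counts in the usage lines are hypothetical and chosen only for illustration; they are not the study's data.

```python
def agreement_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, Cohen's kappa) for a 2x2 table.

    tp/fn: reference-positive samples the assay called positive/negative.
    fp/tn: reference-negative samples the assay called positive/negative.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Chance agreement: product of the marginal totals, summed over both classes.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts for illustration only (NOT taken from the paper).
sens, spec, kappa = agreement_metrics(tp=90, fp=5, fn=10, tn=95)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} kappa={kappa:.3f}")
# sensitivity=0.900 specificity=0.950 kappa=0.850
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why it can differ noticeably from plain accuracy when the two classes are unbalanced.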

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Genomic coverage profiles highlighting the ORF7a deletion in SARS-CoV-2 BE.9 lineages. (A, B) Sequencing coverage across the SARS-CoV-2 genome for BE.9 and non-BE.9 lineages, respectively. In BE.9 samples (A), a pronounced drop in coverage is observed in the ORF7a gene region (nucleotides 27,508–27,751), corresponding to the characteristic 244 bp deletion. Non-BE.9 samples (B) show consistent coverage across this region, indicating the absence of deletion.
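A coverage drop of this kind can also be flagged programmatically. The sketch below assumes per-position read depth is already available as a mapping from 1-based genome position to depth; the function names, the flank size, and the 10% ratio threshold are illustrative assumptions, not parameters from the study. Only the region coordinates come from the caption.

```python
from statistics import median

DEL_START, DEL_END = 27508, 27751  # 1-based, inclusive (from the caption)


def deletion_length(start, end):
    """Length of an inclusive 1-based interval."""
    return end - start + 1


def has_coverage_drop(depth, start=DEL_START, end=DEL_END, flank=200, ratio=0.1):
    """Flag a drop when the median depth inside the region is below
    `ratio` times the median depth of the two flanking windows.

    depth: dict mapping 1-based genome position -> read depth.
    `flank` and `ratio` are illustrative assumptions.
    """
    inside = [depth.get(p, 0) for p in range(start, end + 1)]
    flanks = [depth.get(p, 0) for p in range(start - flank, start)]
    flanks += [depth.get(p, 0) for p in range(end + 1, end + 1 + flank)]
    flank_median = median(flanks)
    if flank_median == 0:
        return False  # no usable coverage around the region; cannot decide
    return median(inside) < ratio * flank_median


# Synthetic depth profiles for illustration only.
be9_like = {p: (2 if DEL_START <= p <= DEL_END else 500) for p in range(27000, 28300)}
intact = {p: 500 for p in range(27000, 28300)}
print(deletion_length(DEL_START, DEL_END))  # 244
print(has_coverage_drop(be9_like), has_coverage_drop(intact))  # True False
```

The interval arithmetic confirms that nucleotides 27,508 to 27,751 inclusive span 244 positions, matching the stated deletion size.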
Figure 2
Agarose gel electrophoresis validating the ORF7a deletion in BE.9 samples. Electrophoresis results of amplified DNA fragments from SARS-CoV-2 samples. Lanes S01 to S08 represent BE.9 lineage samples, showing smaller DNA fragments around 170–200 bp due to the 244 bp deletion in the ORF7a gene. Lanes S09 to S16 represent non-BE.9 samples, displaying larger DNA fragments between 400 and 430 bp, corresponding to the intact ORF7a gene. The last lane shows the negative control (NTC). This size difference confirms the presence of the deletion in BE.9 samples and its absence in non-BE.9 samples.
Figure 3
Melting curve analysis for detection of the ORF7a 244 bp deletion via intercalating dye-based qPCR. The figure shows the first derivative melting curves from the qPCR assay targeting the 244 bp deletion of the ORF7a gene. BE.9 samples (red curves) consistently exhibit a lower Tm of 76.78 ± 0.18 °C due to the shorter amplicon size resulting from the deletion. Non-BE.9 samples (blue curves) display higher Tm values of 80.76 ± 0.24 °C, corresponding to the longer intact amplicon. The negative control (purple curve) shows no amplification, confirming the assay’s specificity.
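A first-derivative melting curve plots −dF/dT against temperature, and Tm is read off at its peak. The sketch below estimates Tm by finite differences on synthetic fluorescence data and then applies a simple nearest-mean rule using the two mean Tm values from the caption. This threshold rule is a stand-in for illustration; the study itself trains ML models on the melting curves. The function names, the idealized sigmoid curve, and the 1.0 °C tolerance are all assumptions.

```python
import math

BE9_TM, NON_BE9_TM = 76.78, 80.76  # mean Tm values (°C) from the caption


def melting_temperature(temps, fluorescence):
    """Return the temperature at the peak of -dF/dT (finite differences)."""
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temps)):
        slope = -(fluorescence[i] - fluorescence[i - 1]) / (temps[i] - temps[i - 1])
        if slope > best_slope:
            best_slope = slope
            best_t = (temps[i] + temps[i - 1]) / 2  # midpoint of the interval
    return best_t


def classify_tm(tm, tolerance=1.0):
    """Nearest-mean rule; `tolerance` (°C) is an illustrative assumption."""
    if abs(tm - BE9_TM) <= tolerance:
        return "BE.9"
    if abs(tm - NON_BE9_TM) <= tolerance:
        return "non-BE.9"
    return "Inconclusive"


def synthetic_melt(tm, temps, width=0.5):
    """Idealized sigmoid fluorescence loss centred on `tm` (synthetic data)."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


temps = [65.0 + 0.2 * i for i in range(126)]  # 65.0 to 90.0 °C in 0.2 °C steps
for true_tm in (76.8, 80.8, 72.0):
    tm = melting_temperature(temps, synthetic_melt(true_tm, temps))
    print(round(tm, 1), classify_tm(tm))
```

Because the two reported Tm means are roughly 4 °C apart with standard deviations below 0.3 °C, even a coarse rule separates idealized curves; real curves with noise, shoulders, or multiple peaks are where a trained classifier becomes useful.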
Figure 4
Confusion matrix illustrating the performance of the optimized SVM model on melting curve classification. The figure displays the confusion matrix for the optimized SVM model applied to the qPCR melting curve data. The model demonstrates high classification accuracy for “BE.9”, “non-BE.9”, and “Inconclusive” samples, indicating substantial reliability in automating the classification process using machine learning.
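For readers unfamiliar with the layout, a three-class confusion matrix of this kind can be built as sketched below, with accuracy taken as the diagonal sum over the total. The class labels come from the caption; the helper names and the toy label lists are hypothetical and do not reproduce the study's predictions.

```python
LABELS = ["BE.9", "non-BE.9", "Inconclusive"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy labels for illustration only (not the study's predictions).
y_true = ["BE.9", "BE.9", "non-BE.9", "non-BE.9", "Inconclusive", "BE.9"]
y_pred = ["BE.9", "non-BE.9", "non-BE.9", "non-BE.9", "Inconclusive", "BE.9"]
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[2, 1, 0], [0, 2, 0], [0, 0, 1]]
print(round(accuracy(cm), 4))  # 0.8333

# The accuracy quoted in the abstract is the same kind of ratio:
print(f"{333 / 345:.2%}")  # 96.52%
```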
