Addressing pandemic-wide systematic errors in the SARS-CoV-2 phylogeny
- PMID: 41663577
- DOI: 10.1038/s41592-025-02947-1
Abstract
The majority of SARS-CoV-2 genomes obtained during the pandemic were derived by amplifying overlapping windows of the genome ('tiled amplicons'), reconstructing their sequences and fitting them together. This leads to systematic errors in genomes unless the software is aware of both the amplicon scheme and the error modes of amplicon sequencing. Additionally, over time, amplicon schemes need to be updated as new mutations in the virus interfere with the primer binding sites at the ends of amplicons. Thus, waves of variants swept the world during the pandemic and were followed by waves of systematic errors in the genomes, which had significant impacts on the inferred phylogenetic tree. Here we reconstruct the genomes from all public data as of June 2024 using an assembly tool called Viridian (https://github.com/iqbal-lab-org/viridian), developed to rigorously process amplicon sequence data. With these high-quality consensus sequences we provide a global phylogenetic tree of 4,471,579 samples, viewable at https://viridian.taxonium.org. We provide simulation and empirical validation of the methodology, and quantify the improvement in the phylogeny.
© 2025. The Author(s).
Conflict of interest statement
Competing interests: G. Screaton is on the GSK Vaccines Scientific Advisory Board, consults for AstraZeneca, and is a founding member of RQ Biotechnology. P. Fowler, D. Crook and Z. Iqbal have consulted for the Ellison Institute of Technology. B. Jolly is employed by Karkinos Healthcare Private Limited. The remaining authors declare no competing interests.
Update of
- Addressing pandemic-wide systematic errors in the SARS-CoV-2 phylogeny. bioRxiv [Preprint]. 2024 Nov 5:2024.04.29.591666. doi: 10.1101/2024.04.29.591666. Update in: Nat Methods. 2026 Feb 9. doi: 10.1038/s41592-025-02947-1. PMID: 38746185. Free PMC article.
Grants and funding
- NIHR200915/DH | National Institute for Health Research (NIHR)
- BAA 200-2021-11554/U.S. Department of Health & Human Services | Centers for Disease Control and Prevention (CDC)
- T32HG012344/U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
- 210918/Z/18/Z/Wellcome Trust (Wellcome)
- 222574/Wellcome Trust (Wellcome)
- 5P20GM103443-20/U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- R35GM128932/U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- 336490/Academy of Finland (Suomen Akatemia)
