Gigascience. 2025 Jan 6:14:giaf063.

doi: 10.1093/gigascience/giaf063.

PVGA: a precise viral genome assembler using an iterative alignment graph

Zhi Song¹, Dehan Cai², Yanni Sun², Lusheng Wang¹,³

Affiliations

¹ Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR (HKG), China.
² Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR (HKG), China.
³ City University of Hong Kong Shenzhen Research Institute, Shenzhen, Guangdong Province, China.

PMID: 40552980
PMCID: PMC12206156
DOI: 10.1093/gigascience/giaf063

Zhi Song et al. Gigascience. 2025.

Abstract

Background: Viral genome analysis is crucial for understanding virus evolution and mutation. Investigations into viral evolutionary dynamics and mutation patterns have garnered significant research attention since the outbreak of COVID-19. The basic structure of many virus genomes is highly conserved [1]. RNA viruses have high mutation rates, and single-nucleotide variations may induce substantial phenotypic alterations in terms of viral function and pathogenicity. Thus, special assembly methods are required for viral genome analysis.

Results: PVGA takes a reference genome and a set of sequencing reads as input. It first constructs an alignment graph from the reference genome and the reads. It then determines the optimal genomic path by dynamic programming, maximizing the cumulative edge weights, which reflect read support density across the alignment graph. The obtained path corresponds to a refined genome. Finally, the process is repeated with the refined genome as the new reference until no further improvement is possible. We evaluate PVGA's performance on both assembly and polishing tasks using simulated and real datasets, including both long reads and short reads. The experiments demonstrate that PVGA always outperforms popular existing programs in terms of assembly quality, while its running time is comparable to that of the other programs. In particular, on simulated Nanopore datasets, our method can correctly report the true genomes with 0 mismatches and 0 indels.

Conclusions: PVGA is a novel viral genome assembler that seamlessly integrates assembly and polishing into a unified workflow. Its design prioritizes high accuracy, enabling the detection of subtle genomic variations that can impact viral function and pathogenicity. By addressing the unique challenges of viral genome assembly, PVGA provides a reliable and precise solution for advancing our understanding of viral evolution and behavior.

Keywords: alignment graph; genome assembler; iterative method; maximum total weight path; virus genome.
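The path-selection step described in the Results paragraph (a maximum total weight path found by dynamic programming) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `max_weight_path`, the integer node numbering, and the `(u, v, weight)` edge tuples are assumptions made for illustration, and the sketch assumes nodes are already numbered in topological order so that every edge goes from a smaller to a larger index.

```python
from collections import defaultdict


def max_weight_path(num_nodes, edges, source, sink):
    """Return (total_weight, path) of a maximum-weight source-to-sink path.

    Assumption: the graph is a DAG whose nodes 0..num_nodes-1 are numbered
    in topological order, i.e. every edge (u, v, w) satisfies u < v.
    """
    incoming = defaultdict(list)
    for u, v, w in edges:
        incoming[v].append((u, w))

    neg_inf = float("-inf")
    best = [neg_inf] * num_nodes  # best[v]: max weight of a source->v path
    prev = [None] * num_nodes     # predecessor of v on that best path
    best[source] = 0

    # Visit nodes in topological order; best[u] is final before v > u is seen.
    for v in range(source + 1, num_nodes):
        for u, w in incoming[v]:
            if best[u] != neg_inf and best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u

    if best[sink] == neg_inf:
        return None, []  # sink is not reachable from source

    # Walk the predecessor links backwards to recover the path.
    path = []
    node = sink
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()
    return best[sink], path


# Toy graph: weights stand in for the number of reads supporting each edge.
toy_edges = [(0, 1, 3), (1, 2, 3), (2, 4, 1), (1, 3, 4), (3, 4, 4), (0, 2, 1)]
score, path = max_weight_path(5, toy_edges, source=0, sink=4)
print(score, path)
```

In this toy graph the branch through node 3 carries more read support than the branch through node 2, so the selected path follows it. In an iterative scheme like the one the abstract describes, the sequence spelled by the selected path would serve as the backbone for the next round.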

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1:**
Flowchart of the construction of the alignment graph and the iteration process. (A) PVGA takes a reference genome as the backbone graph. (B) PVGA aligns the first read, Read1, to the backbone. (C) Four reads are aligned with the backbone, awaiting the subsequent merging process. (D) PVGA merges edges that point to the same node; the new edge’s weight is equal to the sum of the weights of the merged edges. This process can be performed either after aligning all reads or during the alignment process, with a final merge conducted after all reads have been aligned. (E) PVGA iteratively constructs the alignment graph using the result from the previous iteration as the backbone.

**Figure 2:**
Results on simulated Nanopore HIV-1 datasets with an average read length of 2 kb and 4 kb, respectively. The 4 subfigures in each row represent mismatch, indels, indel length, and edit distance from left to right, respectively.

**Figure 3:**
Results on simulated Nanopore SARS-CoV-2 datasets with an average read length of 2 kb and 4 kb, respectively. The 4 subfigures in each row represent mismatch, indels, indel length, and edit distance from left to right, respectively.

**Figure 4:**
Pairwise similarity matrix of 5 HIV-1 strains.

**Figure 5:**
Comparison of CPU times for the 5 tools on the 3 datasets of 50×, 100×, and 200× coverage, respectively. (A) HIV-1 virus 89.6 strain. (B) Measles virus. (C) SARS-CoV-2.

**Figure 6:**
Comparison of maximum memory consumption during runtime for the 3 datasets of 50×, 100×, and 200× coverage, respectively. (A) HIV-1 89.6 strain. (B) Measles virus. (C) SARS-CoV-2.
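The edge-merging rule in Figure 1(D), where merged edges are replaced by one edge whose weight is the sum of their weights, can be sketched as follows. This is only an illustration under stated assumptions: the caption does not specify the data structures, so the `(u, v, weight)` tuples and the function name `merge_edges` are hypothetical, and the sketch treats two edges as mergeable when they share both endpoints.

```python
from collections import defaultdict


def merge_edges(edges):
    """Collapse parallel edges, summing their weights.

    Assumption: an edge is a (u, v, weight) tuple, and edges are merged
    when they have the same start node u and the same end node v.
    """
    merged = defaultdict(int)
    for u, v, w in edges:
        merged[(u, v)] += w
    # Sort by (u, v) so the output order is deterministic.
    return [(u, v, w) for (u, v), w in sorted(merged.items())]


# Three reads support the edge 0 -> 1; one read each supports 1 -> 2 and 1 -> 3.
read_edges = [(0, 1, 1), (0, 1, 1), (1, 2, 1), (0, 1, 1), (1, 3, 1)]
merged_edges = merge_edges(read_edges)
print(merged_edges)
```

Because the weights are simply accumulated per edge, the same function could be applied once after all reads are aligned or repeatedly during alignment, matching the two options mentioned in the caption.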

References

1. Hofacker IL, Stadler PF, Stocsits RR. Conserved RNA secondary structures in viral genomes: a survey. Bioinformatics. 2004;20(10):1495–99. doi:10.1093/bioinformatics/bth108.
2. Harvey WT, Carabelli AM, Jackson B, et al. SARS-CoV-2 variants, spike mutations and immune escape. Nat Rev Microbiol. 2021;19(7):409–24. doi:10.1038/s41579-021-00573-0.
3. Jain M, Koren S, Miga KH, et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol. 2018;36(4):338–45. doi:10.1038/nbt.4060.
4. Eid J, Fehr A, Gray J, et al. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323(5910):133–38. doi:10.1126/science.1162986.
5. Wenger AM, Peluso P, Rowell WJ, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019;37(10):1155–62. doi:10.1038/s41587-019-0217-9.
