Sci Rep. 2025 Oct 7;15(1):34892.
doi: 10.1038/s41598-025-18664-w.

Scalable long-read nanopore HPV16 amplicon-based whole-genome sequencing

Maina K Titus et al. Sci Rep.

Abstract

Human papillomavirus 16 (HPV16) drives precursor cervical lesions that often progress to cervical cancer (CC). Variation within the HPV16 genome has been associated with CC risk. Here, we developed an affordable and portable amplicon-based long-read whole genome sequencing (WGS) approach using Oxford Nanopore Technologies to investigate HPV16 genetic diversity among women in sub-Saharan African countries. Applied to a control CaSki cell line and clinical samples (n = 12), our method generated complete HPV16 genomes at high coverage (median read coverage: 5,899-15,279 ×). Benchmarking our HPV16 controls showed high accuracy for two variant calling pipelines (Clair3 and PEPPER-Margin DeepVariant). Phylogenetic analysis identified all four previously defined HPV16 lineages (A-D) and their high-risk sublineages. All lineages exhibited strong concordance across de novo assembly, reference-based phylogenetics, and unsupervised clustering. Our pipeline effectively captured the full extent of genomic variation, including putative lineage-informative SNPs. This method offers a robust amplicon-based WGS and analysis pipeline for HPV16, making it well-suited for integration into surveillance, diagnostics, and epidemiological efforts in low-resource areas.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Ethical statement: All participants provided written informed consent. Ethical approvals were obtained from the Institutional Research and Ethics Committee (IREC) of Moi Teaching and Referral Hospital (MTRH) and Moi University, Eldoret, Kenya (reference number IREC/371/2022) and from the Institutional Review Board (IRB) of Brown University, Providence, Rhode Island, USA (IRB-ID: 2023003553). All study procedures were performed in accordance with relevant guidelines and regulations outlined by the Ethics Review Boards indicated above.

Figures

Fig. 1
A trio primer set for HPV16 whole-genome sequencing enrichment. (a) Schematic of the primer design for viral pre-sequencing enrichment. The top of the panel shows the tiling-path primer sets T1 (dark red), T2 (light green), and T3 (blue), along with the U1 (yellow) primer set, which is essential for amplifying the ligated region at the genome’s terminus. The bottom of the panel shows the near full-length primer set (gray). (b) Representation of HPV16 in its integrated extrachromosomal form as a multi-tandem repeat (upper) and episomal form (lower). (c) HPV16 long-read analysis pipeline. ONT reads were base-called with Guppy, aligned using Minimap2, and processed with SAMtools. Variants were called with Clair3. De novo assembly was performed using Canu, followed by sequence alignment and phylogenetic analysis in MEGA. (d) Representative depth-of-coverage plot showing sequencing read depth across the CaSki HPV16 genome relative to the reference for combined amplicons.
Fig. 2
Performance comparison of Clair3 and PEPPER against GATK. (a) A Venn diagram shows the overlap among the variant callers: PEPPER (brown), Clair3 (green), and the Truth-set GATK. (b) Overall concordance performance of Clair3 (blue) and PEPPER (green) relative to the Truth-set GATK, with F1-Score, Precision, and Recall on the x-axis and percentages on the y-axis. (c) SNP concordance metrics for Clair3 (blue) and PEPPER (brown), displaying F1-Score, Precision, and Recall on the x-axis and percentages on the y-axis. (d) INDEL concordance metrics for Clair3 (purple) and PEPPER (yellow), featuring F1-Score, Precision, and Recall on the x-axis and percentages on the y-axis.
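The concordance metrics named in this caption (Precision, Recall, F1-Score) can be illustrated with a minimal Python sketch. It is not the authors' benchmarking code: it assumes each call set has already been reduced to a set of `(position, ref, alt)` tuples, and the names `concordance` and `is_snp` as well as the example variants are hypothetical.

```python
from typing import NamedTuple, Set, Tuple

# A variant is keyed by (position, reference allele, alternate allele).
Variant = Tuple[int, str, str]


class Concordance(NamedTuple):
    precision: float
    recall: float
    f1: float


def concordance(truth: Set[Variant], called: Set[Variant]) -> Concordance:
    """Compare a caller's variants against a truth set by exact match."""
    tp = len(truth & called)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but missed by the caller
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return Concordance(precision, recall, f1)


def is_snp(variant: Variant) -> bool:
    """True for single-base substitutions; everything else counts as an INDEL here."""
    _, ref, alt = variant
    return len(ref) == 1 and len(alt) == 1


# Made-up example data (not from the paper).
truth_set = {(100, "A", "G"), (200, "C", "T"), (300, "G", "GA")}
caller_set = {(100, "A", "G"), (200, "C", "T"), (400, "T", "C")}

overall = concordance(truth_set, caller_set)
snps_only = concordance({v for v in truth_set if is_snp(v)},
                        {v for v in caller_set if is_snp(v)})
```

Splitting the sets with `is_snp` before calling `concordance` gives separate SNP and INDEL metrics, analogous to panels c and d.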
Fig. 3
Depth of coverage analysis of HPV16 genomes in clinical cervical isolates. Read depth and coverage profiles for 12 HPV16-positive clinical samples, categorized by lineage, were generated from bedgraph files. (a) Lineage A (blue), (b) Lineage B (vermilion), (c) Lineage C (green), and (d) Lineage D (magenta), with each panel representing three clinical samples per lineage. The x-axis indicates the genomic position (kb) of HPV16, and the y-axis shows the sequencing depth. PMTRHP3 showed >1,000× coverage, while PMTRHP10 exhibited a drop near 3 kb, consistent with a 540 bp deletion (positions 2851–3390). High-intensity peaks reflect overlapping amplicons, indicating strong and consistent coverage across samples.
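As a rough illustration of how a coverage drop like the one described for PMTRHP10 could be located in a bedgraph file, the Python sketch below parses bedGraph records (0-based, half-open intervals), computes the per-base median depth, and merges adjacent intervals that fall below a threshold. The function names, the contig name `HPV16`, the depth values, and the threshold of 100 are all assumptions made for this example; they are not taken from the paper.

```python
from statistics import median
from typing import Iterable, List, Tuple

Interval = Tuple[int, int, float]  # (start, end, depth); 0-based, half-open


def parse_bedgraph(lines: Iterable[str]) -> List[Interval]:
    """Parse 'chrom start end value' records, skipping header lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        _chrom, start, end, value = line.split()[:4]
        intervals.append((int(start), int(end), float(value)))
    return intervals


def median_depth(intervals: List[Interval]) -> float:
    """Per-base median depth; expanding is cheap for a genome of a few kb."""
    per_base = []
    for start, end, depth in intervals:
        per_base.extend([depth] * (end - start))
    return median(per_base)


def low_coverage_regions(intervals: List[Interval],
                         threshold: float) -> List[Tuple[int, int]]:
    """Merge adjacent intervals whose depth is below the threshold."""
    regions: List[Tuple[int, int]] = []
    for start, end, depth in sorted(intervals):
        if depth >= threshold:
            continue
        if regions and regions[-1][1] == start:
            regions[-1] = (regions[-1][0], end)  # extend the previous region
        else:
            regions.append((start, end))
    return regions


# Made-up depths for illustration only.
example = [
    "HPV16\t0\t2850\t6000",
    "HPV16\t2850\t3000\t40",
    "HPV16\t3000\t3390\t10",
    "HPV16\t3390\t7906\t8000",
]
depth_intervals = parse_bedgraph(example)
drops = low_coverage_regions(depth_intervals, threshold=100)
```

In this toy input the single low-coverage region spans 0-based `[2850, 3390)`, which corresponds to 1-based positions 2851–3390 and a length of 540 bp.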
Fig. 4
Phylogenetic tree and UMAP clustering of HPV16 genomes. (a) A maximum likelihood phylogenetic tree constructed from reference HPV16 genomes (GenBank; marked with an asterisk) and assembled clinical isolates from cervical swabs (black triangles). Branch colors indicate the four major HPV16 lineages. The scale bar represents genetic distance. Node support is visualized with gradient circles, where brighter shades indicate higher bootstrap values. (b) A phylogenetic tree generated from variant call format (VCF) files, with tip colors corresponding to HPV16 lineages as in panel a. (c) A UMAP plot based on processed VCF data, illustrating unsupervised clustering of HPV16 genomes. In all panels, HPV16 lineages are color-coded as follows: A (blue), B (red), C (green), and D (purple).
Fig. 5
Analysis of fixed mutations in HPV16 clinical isolates. (a) Distribution of variant types: single-nucleotide polymorphisms (SNPs; blue), insertions (vermilion), and deletions (magenta). (b) Ratio of missense (green) to silent (magenta) mutations. (c) Normalized variant frequency by gene across HPV16 lineages, color-coded as A (blue), B (vermilion), C (green), and D (magenta). (d) Mutation counts across HPV16 genes: synonymous (green), missense (blue), and other protein-altering mutations (magenta), including disruptive in-frame, frameshift, and stop-gained variants.
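The per-gene summaries in this caption can be sketched in Python as follows. The caption does not state how variant frequency was normalized, so this example assumes normalization by gene length (variants per kb); that choice, the function names, the effect labels, and the gene lengths below are placeholders for illustration rather than the authors' method or real HPV16 annotation.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each annotated variant is reduced to (gene, effect), e.g. ("E6", "missense").
AnnotatedVariant = Tuple[str, str]


def variants_per_kb(records: List[AnnotatedVariant],
                    gene_lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Variant count per gene divided by gene length in kb (assumed normalization)."""
    counts = Counter(gene for gene, _ in records)
    return {gene: counts[gene] / (length / 1000)
            for gene, length in gene_lengths_bp.items()}


def missense_to_silent_ratio(records: List[AnnotatedVariant]) -> float:
    """Ratio of missense to synonymous (silent) variants."""
    effects = Counter(effect for _, effect in records)
    if effects["synonymous"] == 0:
        raise ValueError("no synonymous variants; ratio is undefined")
    return effects["missense"] / effects["synonymous"]


# Placeholder inputs: the lengths are round numbers, not real gene coordinates.
annotated = [
    ("E6", "missense"), ("E6", "synonymous"),
    ("L1", "missense"), ("L1", "missense"), ("L1", "synonymous"),
    ("E7", "frameshift"),
]
placeholder_lengths = {"E6": 500, "E7": 250, "L1": 1500}

per_kb = variants_per_kb(annotated, placeholder_lengths)
ratio = missense_to_silent_ratio(annotated)
```

Normalizing by length keeps short genes comparable with long ones; grouping `annotated` by lineage before calling `variants_per_kb` would give a per-lineage breakdown like the one in panel c.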
