Viruses. 2022 Dec 22;15(1):40.
doi: 10.3390/v15010040.

A Customized Monkeypox Virus Genomic Database (MPXV DB v1.0) for Rapid Sequence Analysis and Phylogenomic Discoveries in CLC Microbial Genomics

Jane Shen-Gunther et al. Viruses.

Abstract

Monkeypox has been a neglected, zoonotic tropical disease for over 50 years. Since the 2022 global outbreak, hundreds of human clinical samples have been subjected to next-generation sequencing (NGS) worldwide, with raw data deposited in public repositories. However, sequence analysis for in-depth investigation of viral evolution remains hindered by the lack of a curated, whole-genome Monkeypox virus (MPXV) database (DB) and efficient bioinformatics pipelines. To address this, we developed a customized MPXV DB for integration with "ready-to-use" workflows in the CLC Microbial Genomics Module for whole genomic and metagenomic analysis. After database construction (218 MPXV genomes), whole genome alignment, pairwise comparison, and evolutionary analysis of all genomes were performed to autogenerate tabular outputs and visual displays (collective runtime: 16 min). The clinical utility of the MPXV DB was demonstrated by using a publicly available Chimpanzee fecal, hybrid-capture NGS dataset for metagenomic, phylogenomic, and viral/host integration analysis. The clinically relevant MPXV DB embedded in CLC workflows proved to be a rapid method of sequence analysis useful for phylogenomic exploration and a wide range of applications in translational science.

Keywords: bioinformatics; disease outbreaks; monkeypox; monkeypox virus; next generation sequencing; phylogeny; poxvirus; taxonomic classification; virus database.

Conflict of interest statement

The Defense Health Agency (DHA) of the U.S. Department of Defense has licensed the customized MPXV v1.0 and MOCVA v1.0 databases described herein to QIAGEN Digital Insights. The inventor of the customized taxonomy is J.SG. No potential conflicts of interest were disclosed by the other authors. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. This paper has undergone PAO review at Brooke Army Medical Center and was cleared for publication. The view(s) expressed herein are those of the authors and do not reflect the official policy or position of Brooke Army Medical Center, the United States Army Medical Department, the United States Army Office of the Surgeon General, the Department of the Army, the Defense Health Agency, the Department of Defense or the United States Government.

Figures

Figure 1
Representative Monkeypox virus genome. Linear, double-stranded, reference genome (NC_003310), ZAI-96-I-16 (MPV-ZAI) isolate from a patient during the 1996 outbreak in Zaire. The 196,858-bp sequence encodes 190 open reading frames (ORFs) with a highly conserved central region and terminal variable regions sealed in hairpin loops at the ends. The three-part, alphanumeric nomenclature of the genes represents the Hind III restriction endonuclease fragment (letter), the nth gene (number) counted from the left of the fragment, and the direction of transcription (R: rightward, L: leftward). Notably, three genes are deleted (D14L) or mutated (B10R and B14R) in the less virulent, prototypic Western African MPXV isolate (SL-V70) (red arrowheads) [8,9,10]. Electron micrograph of two monkeypox virions revealing the dumbbell-shaped inner core [18].
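The gene naming scheme described in the Figure 1 caption can be illustrated with a short parser. This is a minimal sketch and not part of the published workflow: the function name `parse_gene_name` and the `GeneName` type are hypothetical, and the pattern assumes simple names of the form letter, number, L/R (such as D14L or B10R), so variant spellings used elsewhere in the poxvirus literature would be rejected.

```python
import re
from typing import NamedTuple


class GeneName(NamedTuple):
    """Parts of a Hind III-based gene name (hypothetical helper type)."""
    fragment: str   # Hind III restriction fragment letter, e.g. "D"
    index: int      # nth gene counted from the left of the fragment
    direction: str  # "leftward" or "rightward" transcription


# Assumption: names look like one letter, one or more digits, then L or R.
_PATTERN = re.compile(r"^([A-Z])(\d+)([LR])$")
_DIRECTIONS = {"L": "leftward", "R": "rightward"}


def parse_gene_name(name: str) -> GeneName:
    """Split a gene name such as 'D14L' into fragment, index, and direction."""
    match = _PATTERN.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a simple fragment/number/direction name: {name!r}")
    fragment, index, direction = match.groups()
    return GeneName(fragment, int(index), _DIRECTIONS[direction])
```

For example, `parse_gene_name("D14L")` yields fragment `D`, gene number 14, transcribed leftward, matching the reading of D14L given in the caption.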
Figure 2
Bioinformatics methods (A) CLC Microbial Genomics Module, databases and dataset used for Monkeypox Virus (MPXV) Whole Genome Alignment (WGA) and comparative genomics. Primary workflows and tools used for this study are designated by the blue virus icon; (B) WGA workflow steps (1–4) with user-defined parameter settings for WGA (bold) and annotation, i.e., MPXV REF genome; (C) Create Average Nucleotide Identity Comparison workflow inputs the MPXV WGA file to quantify the similarity between genomes. The output is an autogenerated pairwise comparison matrix.
Figure 3
Representative views of the MPXV Whole Genome Alignment (WGA) and Pairwise Comparison Table (A) MPXV WGA view shows the alignments between three MPXV genomic sequences. The view of the entire alignment (218 whole MPXV genomes) is shown in Supplementary Video S1. The synteny blocks (conserved genomic regions among strains) and connecting lines correspond to aligned regions of the genomes (orange rectangle); (B) Pairwise comparison table with quantitative measures of similarity between MPXV genomes. The lower comparison matrix tabulates the Alignment Percentage (AP), or average alignment percentage between two MPXV genomes (range, 88–100%). The upper comparison matrix tabulates the Average Nucleotide Identity (ANI), or the percentage of exact nucleotide matches within the aligned regions (range, 99–100%). The lightest shades of green or pink correspond to the highest sequence similarity (%).
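To make the two measures in the Figure 3 caption concrete, the sketch below computes AP and ANI for a single pair of gapped, pre-aligned sequences. It is a simplified illustration under stated assumptions, not the CLC implementation: the function name `pairwise_ap_ani` is hypothetical, gaps are assumed to be written as `-`, a column counts as aligned only when both sequences carry a base, and AP is taken as the mean of the aligned fraction of each genome.

```python
from typing import Tuple


def pairwise_ap_ani(seq_a: str, seq_b: str, gap: str = "-") -> Tuple[float, float]:
    """Return (AP, ANI) in percent for two gapped sequences of equal length.

    AP  = mean fraction of each genome that falls in aligned columns.
    ANI = exact matches divided by aligned columns.
    Simplified definitions for illustration only.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    aligned = 0   # columns where both sequences have a base
    matches = 0   # aligned columns with identical bases
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            matches += 1

    len_a = len(seq_a) - seq_a.count(gap)  # ungapped genome lengths
    len_b = len(seq_b) - seq_b.count(gap)
    if len_a == 0 or len_b == 0:
        raise ValueError("sequences must contain at least one base")

    ap = 100.0 * (aligned / len_a + aligned / len_b) / 2
    ani = 100.0 * matches / aligned if aligned else 0.0
    return ap, ani
```

Applying the function to every pair of genomes and storing AP below the diagonal and ANI above it would give a table with the same layout as the one described in panel B.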
Figure 4
Circular phylograms of MPXV genomes by metadata for temporal, host, and spatial visualization (A) Phylogram of complete MPXV genomes (n = 218) by clades. Clades 1 and 2 originated from Central and West Africa, respectively, with divergent branches. In contrast, Clade 3 descends from Clade 2 and consists of publicly available MPXV genomes from the 2022 global outbreak. The sublineages (group) of the clades reveal the relatedness of their member samples. Three attributes of each genome (sequence accession number, 3-letter country code, and year) are displayed as the outermost ring (label) of the phylogram; (B) Phylograms of the MPXV genomes by metadata, i.e., year (color-coded by decades), host, continent, and country. The temporal, host, and spatial relationships reveal how MPXV, first discovered in 1958, spilled over from mammals to humans and spread quickly in 2022 out of Africa into Europe and North America. * The MPXV genomes of a Dormouse and Rope squirrel (USA, 2003), identical to that of a Gambian pouched rat, are not shown. (The unrooted ANI neighbor-joining (NJ) trees were constructed from the pairwise comparison table).
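The caption states that the NJ trees were built from the pairwise comparison table. As a rough illustration of how a similarity table can feed neighbor joining, the sketch below converts ANI values to distances and finds the first pair of genomes that the standard NJ criterion would join. The conversion `distance = 100 - ANI` is an assumption made here for illustration and is not confirmed as the conversion used by CLC; the function names are hypothetical, and only the first joining step is shown rather than a full tree construction.

```python
from itertools import combinations
from typing import List, Tuple

Matrix = List[List[float]]


def ani_to_distance(ani: Matrix) -> Matrix:
    """Convert percent identities to distances (assumed: 100 - ANI)."""
    return [[100.0 - value for value in row] for row in ani]


def first_nj_pair(dist: Matrix) -> Tuple[int, int, float]:
    """Return (i, j, Q) for the pair minimizing the neighbor-joining criterion.

    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(dist)
    if n < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    row_sums = [sum(row) for row in dist]

    best = None
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if best is None or q < best[0]:
            best = (q, i, j)
    q, i, j = best
    return i, j, q
```

A full NJ run would merge the selected pair into a new node, recompute distances to the remaining genomes, and repeat until the tree is resolved.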
Figure 5
Comparative analysis of representative Monkeypox, Molluscipox, and other Orthopox viral genomes (A) Phylogenetic tree showing the genetic distances between MPXV and the Outgroup (MOCV, CPXV, VACV, VARV). The three clades of MPXV are distinct with intra-clade genetic evolution identified within Clade 3. See text for details regarding the representative Clade 3 samples. The NJ tree is rooted at MOCV; (B) Dot-plot comparison of nucleotide similarity between MPXV Clades (1 versus 2) and (2 versus 3). The unbroken diagonal represents identity between two sequences. Breaks (red arrowheads) indicate significant differences. Specifically, MPXV REF (ZAI-96) Clade 1 possesses 3 genes which are deleted (D14L) or mutated (B10R and B14R) in Clade 2 located at the breaks near positions 20k and 168k, respectively. Clades 2 and 3 are genetically similar without notable breaks; (C) Pairwise comparisons show 90% to 92% alignment similarity (=) between MPXV Clades, and > 99% nucleotide identity (↕) within aligned regions. In contrast, the MPXV and outgroup genomes differed by >12% (<88.6% AP) with smallpox (VARV) highlighted (●). AP, alignment percentage; ANI, average nucleotide identity; MOCV, Molluscum contagiosum virus; CPXV, Cowpox virus; VARV, Variola virus; VACV, Vaccinia virus.
Figure 6
Utility of the MPXV database for sequence and phylogenetic analysis using a Chimpanzee dataset. (A) A deep-sequenced MPXV sample (ERR3485797) extracted from Chimpanzee feces is analyzed with the Viral Hybrid Capture (VHC) automated workflow. The resultant VHC track list displays (top to bottom): read mapping against the closest MPXV reference genome, annotated variant track, MPXV amino acid track, and low coverage areas; (B) Phylogenetic analysis of the consensus sequences derived from the Chimpanzee dataset (n = 14) reveals significant divergence from the human MPXV. Among the Chimpanzee MPXV sequences, two clades are revealed; (C) Viral-host Integration Site (VIS) analysis for sample (ERR3485797) visualized as circular plots. The bipartite MPXV/Chimpanzee genomes in chromosome view (left) and gene view (1,000,000 × zoom, right) did not reveal any sites of viral integration. (Virus–host integration linkages are designated by bi-directional curvilinear lines between the virus and host genomes).

References

    1. Magnus P.V., Andersen E.K., Petersen K.B., Birch-Andersen A. A pox-like disease in cynomolgus monkeys. Acta Pathol. Microbiol. Scand. 1959;46:156–176. doi: 10.1111/j.1699-0463.1959.tb00328.x.
    2. Arita I., Henderson D.A. Smallpox and monkeypox in non-human primates. Bull. World Health Organ. 1968;39:277–283.
    3. Ladnyj J.D., Ziegler P., Kima E. A human infection caused by monkeypox virus in Basankusu Territory, Democratic Republic of the Congo. Bull. World Health Organ. 1972;46:593.
    4. Antunes F., Cordeiro R., Virgolino A. Monkeypox: From A Neglected Tropical Disease to a Public Health Threat. Infect. Dis. Rep. 2022;14:772–783. doi: 10.3390/idr14050079.
    5. World Health Organization (WHO) [(accessed on 10 November 2022)]. Available online: https://www.who.int/director-general/speeches/detail/who-director-genera....
