The Mpox contextual data specification package: a data curation toolkit to support collaborative pathogen genomic surveillance
- PMID: 41609690
- DOI: 10.1099/mgen.0.001614
Abstract
A sudden increase in the number of Mpox virus (MPXV) cases worldwide prompted the WHO to declare a Public Health Emergency of International Concern in 2022 and again in 2024. Public health genomic surveillance of MPXV in impacted areas is ongoing to inform national and international situational awareness, with a growing number of sequences available in public sequence repositories. Critical to genomic surveillance is well-curated and harmonized contextual data: the sample metadata, epidemiological and clinical data, lab results and method information that enable the interpretation of sequence data for public health responses and decision-making. Contextual data, however, is often unstructured or highly variable in format, granularity and terminology. Because of this variability, the data usually requires a great deal of manual clean-up before it can be integrated and used for analysis, and that clean-up is laborious, time-consuming and error-prone. To facilitate harmonization of contextual data for genomic surveillance during the 2022 and 2024 epidemics, an MPXV contextual data specification was developed by the Centre for Infectious Disease Genomics and One Health (Simon Fraser University, Canada) in collaboration with several teams at Canada's National Microbiology Lab [Public Health Agency of Canada (PHAC)] as well as provincial public health laboratories. The MPXV specification provides standardized ontology-based fields and terms for capturing information about MPXV samples and infections and prioritizes geo-temporal, data provenance and sampling strategy information for surveillance. The specification uses the same semantic framework used to develop other public health pathogen genomics data standards, thus demonstrating that framework's adaptability to additional infectious diseases. The specification has been implemented as a template within an open-source application known as the DataHarmonizer, which provides curation, validation and data transformation features.
The MPXV specification is already being utilized in Canada and is freely available for international use. The MPXV specification adds to a growing library of interoperable, harmonized community consensus contextual data standards for public health pathogen genomics.
Keywords: Mpox; Mpox virus (MPXV); contextual data; data management; genomic surveillance; harmonization.
