Epidemics. 2023 Jun:43:100676.
doi: 10.1016/j.epidem.2023.100676. Epub 2023 Mar 8.

Data pipelines in a public health emergency: The human in the machine

Katy A M Gaythorpe et al. Epidemics. 2023 Jun.

Abstract

In an emergency epidemic response, data providers supply data in good faith and on a best-effort basis to modellers and analysts, who are typically end users of data collected for other primary purposes, such as informing patient care. Thus, modellers who analyse secondary data have limited ability to influence what is captured. During an emergency response, models themselves are often under constant development and require both stability in their data inputs and flexibility to incorporate new inputs as novel data sources become available. This dynamic landscape is challenging to work with. Here we outline a data pipeline used in the ongoing COVID-19 response in the UK that aims to address these issues. A data pipeline is a sequence of steps that carries the raw data through to a processed and useable model input, along with the appropriate metadata and context. In ours, each data type had an individual processing report, designed to produce outputs that could be easily combined and used downstream. Automated checks were built in and added as new pathologies emerged. These cleaned outputs were collated at different geographic levels to provide standardised datasets. Finally, a human validation step was an essential component of the analysis pathway and permitted more nuanced issues to be captured. This framework allowed the pipeline to grow in complexity and volume and facilitated the diverse range of modelling approaches employed by researchers. Additionally, every report or modelling output could be traced back to the specific data version that informed it, ensuring reproducibility of results. Our approach has been used to facilitate fast-paced analysis and has evolved over time. Our framework and its aspirations are applicable to many settings beyond COVID-19 data, for example to other outbreaks such as Ebola, or to settings where routine and regular analyses are required.
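
The pipeline stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record shape, the `REGION_OF` lookup, the specific checks, and every function name below are hypothetical, chosen only to show how per-data-type processing, automated checks, geographic collation, and data versioning might fit together. Problems are returned rather than raised, so that a person can review them before the outputs are used.

```python
import hashlib
import json
from collections import defaultdict

# Hypothetical lookup from a lower-tier area to its parent region.
REGION_OF = {"Leeds": "Yorkshire", "York": "Yorkshire", "Camden": "London"}


def data_version(raw_rows):
    """Fingerprint the raw input so any output can be traced back to it."""
    payload = json.dumps(raw_rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def process_cases(raw_rows):
    """Per-data-type step: map raw case rows to a shared record shape."""
    return [
        {"date": r["date"], "area": r["area"], "metric": "cases", "value": int(r["n"])}
        for r in raw_rows
    ]


def run_checks(records):
    """Automated checks; new ones can be appended as new pathologies emerge."""
    problems = []
    seen = set()
    for rec in records:
        key = (rec["date"], rec["area"], rec["metric"])
        if key in seen:
            problems.append(f"duplicate record: {key}")
        seen.add(key)
        if rec["value"] < 0:
            problems.append(f"negative value: {key}")
        if rec["area"] not in REGION_OF:
            problems.append(f"unknown area: {key}")
    return problems


def collate_by_region(records):
    """Sum cleaned records up to the regional level."""
    totals = defaultdict(int)
    for rec in records:
        region = REGION_OF[rec["area"]]
        totals[(rec["date"], region, rec["metric"])] += rec["value"]
    return dict(totals)


def run_pipeline(raw_rows):
    """Run all stages; collate only when the automated checks pass."""
    records = process_cases(raw_rows)
    problems = run_checks(records)
    regional = collate_by_region(records) if not problems else {}
    return {
        "version": data_version(raw_rows),
        "problems": problems,  # handed to a human reviewer
        "regional": regional,
    }
```

In this sketch the version fingerprint travels with every output, so a downstream report can record which raw data it was built from, and a non-empty `problems` list blocks collation until someone has looked at it.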

Keywords: COVID-19; Data; Infectious disease modelling.

Conflict of interest statement

Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
