Wellcome Open Res. 2025 May 27:10:279.
doi: 10.12688/wellcomeopenres.23824.1. eCollection 2025.

GRAPEVNE - Graphical Analytical Pipeline Development Environment for Infectious Diseases

John-Stuart Brittain et al. Wellcome Open Res. 2025.

Abstract

The increase in volume and diversity of relevant data on infectious diseases and their drivers provides opportunities to generate new scientific insights that can support 'real-time' decision-making in public health across outbreak contexts and enhance pandemic preparedness. However, utilising the wide array of clinical, genomic, epidemiological, and spatial data collected globally is difficult due to differences in data preprocessing, data science capacity, and access to hardware and cloud resources. To facilitate large-scale and routine analyses of infectious disease data at the local level (i.e. without sharing data across borders), we developed GRAPEVNE (Graphical Analytical Pipeline Development Environment), a platform that enables the construction of modular pipelines for complex and repetitive data analysis workflows through an intuitive graphical interface. Built on the Snakemake workflow management system, GRAPEVNE streamlines the creation, execution, and sharing of analytical pipelines. Its modular approach already supports a diverse range of scientific applications, including genomic analysis, epidemiological modeling, and large-scale data processing. Each module in GRAPEVNE is a self-contained Snakemake workflow, complete with configurations, scripts, and metadata, enabling interoperability. The platform's open-source nature supports ongoing community-driven development and scalability. GRAPEVNE empowers researchers and public health institutions by simplifying complex analytical workflows, fostering data-driven discovery, and enhancing reproducibility in computational research. Its user-driven ecosystem encourages continuous innovation in biomedical and epidemiological research, and the platform is applicable beyond these fields. Key use-cases include automated phylogenetic analysis of viral sequences, real-time outbreak monitoring, forecasting, and epidemiological data processing. For instance, our dengue virus pipeline demonstrates end-to-end automation from sequence retrieval to phylogeographic inference, leveraging established bioinformatics tools that can be deployed in any geographical context. For more details, see the documentation at: https://grapevne.readthedocs.io.
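The idea of chaining self-contained modules into a pipeline can be illustrated with a minimal Python sketch. This is not GRAPEVNE's implementation or API: the `Module` class, the `execution_order` function, and the module names are all hypothetical, chosen only to show how modules that declare their upstream inputs can be ordered for execution.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Module:
    """Hypothetical stand-in for a self-contained workflow module."""
    name: str
    inputs: list = field(default_factory=list)   # names of upstream modules
    config: dict = field(default_factory=dict)   # module-local parameters


def execution_order(modules):
    """Return module names in an order that respects declared inputs."""
    graph = {m.name: set(m.inputs) for m in modules}
    for name, deps in graph.items():
        missing = deps - graph.keys()
        if missing:
            raise ValueError(f"{name} depends on unknown modules: {sorted(missing)}")
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


# Illustrative module names only (assumed, not taken from the GRAPEVNE catalogue).
pipeline = [
    Module("phylogeography", inputs=["tree"]),
    Module("tree", inputs=["align"]),
    Module("align", inputs=["download"]),
    Module("download", config={"query": "dengue virus"}),
]

print(execution_order(pipeline))
```

Because each module only names its upstream inputs, modules can be listed in any order and swapped independently, which is the property the modular design relies on.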

Keywords: automated workflows; data science; epidemiology; genomics; graphical interface; open-source; outbreaks; snakemake.

Plain language summary

With the growing amount of data on infectious diseases, researchers have new opportunities to improve public health decisions and pandemic preparedness. However, analyzing these vast and diverse data (spanning clinical records, genomic sequences, epidemiological trends, and geographic information) can be challenging due to differences in data processing methods, technical expertise, and access to computing resources. To address these challenges, we developed GRAPEVNE, a user-friendly platform that helps researchers build and manage complex data analysis workflows using a visual interface. Built on the Snakemake workflow management system, GRAPEVNE simplifies the process of organizing and running large-scale studies, making it easier to track outbreaks, analyze disease patterns, and process health data efficiently. Its modular approach allows users to customize workflows to their specific needs. As an open-source platform, GRAPEVNE fosters collaboration and ongoing development, supporting a wide range of applications, including genomic analysis, epidemiological modeling, and outbreak monitoring. Researchers can use it for tasks such as studying viral evolution, predicting disease spread, and processing epidemiological data across different geographical contexts. By streamlining data analysis, GRAPEVNE empowers public health institutions and researchers to make data-driven decisions more effectively. For more details, visit: https://grapevne.readthedocs.io.

Conflict of interest statement

No competing interests were disclosed.

Figures

Figure 1. GRAPEVNE interface.
(Left) The main canvas shows the workflow under construction, with modules and module groups. Selecting a module (‘Download Genbank’ in this example) opens the documentation and configuration panel. Parameter presets can be saved and loaded for convenience. Modules can be browsed from configured repositories via the module library, which includes ‘Repository’, ‘Project’ and free-text search facilities (repositories are configured in the Settings panel, accessible from the Navigation bar). An online catalogue of modules is available via our in-built ‘vneyard’ module browser. Finally, workflows can be tested and packaged for distribution. (Right) Example Snakemake rule making use of grapevne wrappers, which manage namespace redirection and compatibility checks. Wrapper functions are highlighted in blue. Several wrappers (such as input, output and params) are represented in, and can be configured via, the GUI. Others, such as script and resource, provide support services (in this case, the provision of payloads).
Figure 2. Design and implementation of analytical pipelines for reconstructing the spread of SARS-CoV-2 Variants of Concern (VOCs) and Dengue virus.
(A) The red panel illustrates the high-level structure of the SARS-CoV-2 VOC pipeline, integrating genomic data from GISAID (red arrow) and epidemiological data from other sources (e.g., case data from OWID (https://ourworldindata.org/); green arrow) to infer the historical dispersal patterns of the virus at a global scale. This pipeline serves as a template (grey box) for the Dengue pipeline in the blue panel, with three key modifications: (i) the time-calibration module based on TreeTime is replaced by an equivalent module based on BEAST, (ii) an additional module is added to perform evolutionary hypothesis testing using HyPhy, and (iii) an additional module is added to visualize output from the discrete trait analysis using auspice. (B) An expanded view of lower-level modules nested within the time-calibration module using BEAST. A FASTA file containing pathogen genomes is used as input to generate an XML file, following the configuration specified in an XML template that the user generates with the graphical user interface application BEAUti. The XML file is then used as input by BEAST to perform Markov chain Monte Carlo (MCMC) sampling. Intermediate output is visualized and assessed for convergence using Tracer. The user then has the option either to continue running the analysis and proceed with further downstream analyses (e.g., generating the maximum clade credibility (MCC) tree using LogCombiner), or to modify the XML (e.g., tuning parameters associated with prior distributions within BEAUti) and rerun the BEAST analysis in an iterative fashion.
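The template reuse described in panel A (replace one module, add two more) can be sketched in a few lines of Python. This is an illustrative sketch only, not how GRAPEVNE stores or edits workflows: the `derive_pipeline` function and all step names are hypothetical, and the template's actual module list is not given in the caption beyond the TreeTime time-calibration step and the discrete trait analysis.

```python
def derive_pipeline(template, replacements, additions):
    """Build a new pipeline from a template by swapping and appending modules.

    template:     ordered list of module names (hypothetical names here)
    replacements: mapping from a template module to its substitute
    additions:    modules appended after the template steps
    """
    unknown = set(replacements) - set(template)
    if unknown:
        raise KeyError(f"cannot replace modules absent from template: {sorted(unknown)}")
    derived = [replacements.get(step, step) for step in template]
    return derived + list(additions)


# Assumed, simplified stand-in for the SARS-CoV-2 VOC template pipeline.
sars_cov_2_template = [
    "ingest_sequences",
    "time_calibration_treetime",
    "discrete_trait_analysis",
]

dengue_pipeline = derive_pipeline(
    sars_cov_2_template,
    replacements={"time_calibration_treetime": "time_calibration_beast"},  # modification (i)
    additions=["hypothesis_testing_hyphy", "visualise_auspice"],          # modifications (ii), (iii)
)

print(dengue_pipeline)
```

The template list itself is left unmodified, so the same starting point can be reused for further pathogen-specific pipelines.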
