2019 Feb 28;14(2):e0213039.
doi: 10.1371/journal.pone.0213039. eCollection 2019.

GenomeGraphR: A user-friendly open-source web application for foodborne pathogen whole genome sequencing data integration, analysis, and visualization

Moez Sanaa et al. PLoS One.

Abstract

Food safety risk assessments and large-scale epidemiological investigations can provide better and new types of information when whole genome sequence (WGS) data are effectively integrated. The WGS collections of the NCBI Pathogen Detection database have grown significantly through improvements in technology, coordination, and collaboration, such as the GenomeTrakr and PulseNet networks. However, high-quality genomic data are not often coupled with high-quality epidemiological or food chain metadata. We have created a set of tools for cleaning, curation, integration, analysis, and visualization of microbial genome sequencing data. The tools have been tested using Salmonella enterica and Listeria monocytogenes data sets provided by NCBI Pathogen Detection (160,000 sequenced isolates in 2018). GenomeGraphR presents foodborne pathogen WGS data and associated curated metadata in a user-friendly interface that allows a user to explore a variety of research questions, such as transmission sources and dynamics, global reach, and persistence of genotypes associated with contamination in the food supply and foodborne illness across time or space. The application is freely available (https://fda-riskmodels.foodrisk.org/genomegraphr/).

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Simplified example of the search strategy.
A. The complete network includes all nodes and links every pair of nodes that are closer, in SNP distance, than a user-specified threshold. B. When a subset of strains is selected (e.g. by isolation source), the connected components are limited to those strains and the clinical strains closer to them than the SNP threshold. This graph shows the strains from the isolation source that are potentially linked to clinical strains; it includes only clinical strains and strains from the selected isolation source. C. To verify whether these links are meaningful, all additional strains, from any source, that are closer than the SNP threshold to the clinical strains are recalled, forming a sub-network.
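The three steps of Fig 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GenomeGraphR implementation: the function names, the strain names, the `source` labels, and the toy SNP distances are all hypothetical, and "closer than the threshold" is read here as distance less than or equal to the threshold (the convention stated in the Fig 6 caption).

```python
def threshold_links(snp, threshold):
    """Step A: keep every pair of strains whose SNP distance is <= threshold."""
    return {pair for pair, d in snp.items() if d <= threshold}

def neighbors(links, strain):
    """Strains directly linked to `strain`."""
    return {b if a == strain else a for a, b in links if strain in (a, b)}

def search(snp, source, selected_source, threshold):
    links = threshold_links(snp, threshold)
    selected = {s for s, src in source.items() if src == selected_source}
    # Step B: clinical strains linked to a strain from the selected source.
    clinical = {n for s in selected for n in neighbors(links, s)
                if source[n] == "clinical"}
    # Step C: recall every strain, from any source, linked to those clinical
    # strains; together with the clinical strains they form the sub-network.
    sub_network = set(clinical)
    for c in clinical:
        sub_network |= neighbors(links, c)
    return clinical, sub_network

# Hypothetical toy data: isolation source per strain and pairwise SNP distances.
source = {"egg1": "shell egg", "egg2": "shell egg", "chick1": "chicken",
          "clin1": "clinical", "clin2": "clinical"}
snp = {("egg1", "clin1"): 5, ("egg1", "clin2"): 40, ("egg2", "clin1"): 30,
       ("chick1", "clin1"): 8, ("chick1", "clin2"): 3, ("egg1", "egg2"): 2}

clinical, sub_network = search(snp, source, "shell egg", threshold=12)
```

In this toy case the sub-network recalls `chick1` alongside `egg1`, which is the situation step C is meant to expose: the clinical strain linked to the selected source is equally close to a strain from another source.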
Fig 2. Categorization scheme of isolation sources for non-clinical strains, with the relative number of isolates shown by the width of each band in this Sankey plot.
Top: L. monocytogenes strains (root: 10,912 isolates). Bottom: S. enterica strains (root: 49,525 strains).
Fig 3.
Left: Number of isolates (scale on the right axis) and number of SNP clusters (scale on the left axis) as a function of time (creation date of the target in the NCBI database), for L. monocytogenes (top) and S. enterica (bottom). Right: Probability that a newly created clinical strain is genetically matched with a previously isolated non-clinical strain, as a function of time and SNP threshold, for L. monocytogenes (top) and S. enterica (bottom). Note: the 2013 artifact for S. enterica is linked to the massive inclusion of new strains that year.
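One way to read the right-hand panels is as a per-year fraction: among clinical strains created in a given year, the share that lies within the SNP threshold of at least one non-clinical strain created earlier. The sketch below computes that fraction under assumptions of our own: the record layout, the strain names, the toy distances, and the use of calendar year as the time unit are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def match_probability(strains, snp, threshold):
    """strains: list of (name, year, is_clinical) records.
    snp: dict mapping frozenset({a, b}) to the SNP distance between a and b.
    Returns {year: fraction of clinical strains matched to an earlier
    non-clinical strain at distance <= threshold}."""
    matched = defaultdict(int)
    total = defaultdict(int)
    for name, year, is_clinical in strains:
        if not is_clinical:
            continue
        total[year] += 1
        # Assumption: "previously isolated" means created in a strictly earlier year.
        earlier = [n for n, y, c in strains if not c and y < year]
        if any(snp.get(frozenset((name, n)), float("inf")) <= threshold
               for n in earlier):
            matched[year] += 1
    return {year: matched[year] / total[year] for year in total}

# Hypothetical toy records and distances.
strains = [("f1", 2015, False), ("c1", 2016, True), ("c2", 2016, True),
           ("f2", 2017, False), ("c3", 2017, True)]
snp = {frozenset(("c1", "f1")): 4, frozenset(("c2", "f1")): 20,
       frozenset(("c3", "f2")): 1, frozenset(("c3", "f1")): 50}

by_year_12 = match_probability(strains, snp, threshold=12)
by_year_25 = match_probability(strains, snp, threshold=25)
```

Raising the threshold can only add matches, so the fraction for a given year never decreases as the threshold grows.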
Fig 4. Box plot of the number of connections per strain (degree, k) by year of creation of the target (left: L. monocytogenes; right: S. enterica; SNP threshold = 12).
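The degree k of a strain is simply the number of links it has in the thresholded network. A minimal sketch, with hypothetical strain names and links and assuming each link is listed once:

```python
from collections import Counter

def degrees(strains, links):
    """Return {strain: k}, where k counts the links touching that strain.
    Strains with no link are kept with k = 0."""
    k = Counter({s: 0 for s in strains})
    for a, b in links:
        k[a] += 1
        k[b] += 1
    return dict(k)

# Hypothetical toy network: s1 is linked to s2 and s3; s4 is isolated.
k = degrees(["s1", "s2", "s3", "s4"], [("s1", "s2"), ("s1", "s3")])
```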
Fig 5. Connected component characteristics at an SNP threshold of 12 (left: Listeria monocytogenes; right: Salmonella).
Each point represents a connected component, placed on the x-axis at its number of nodes (n) and on the y-axis at its number of links, both on a log10 scale. The upper line represents n × (n − 1)/2, the maximum possible number of links; the lower, dashed line represents n − 1, the minimum number of links.
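The two bounds follow from graph basics: a connected component on n nodes needs at least n − 1 links (a tree) and has at most n × (n − 1)/2 links (every pair linked). The sketch below finds the components of a small, hypothetical link list with a breadth-first search and reports each component's (n, number of links) pair; the function names are ours, and each link is assumed to appear once.

```python
from collections import defaultdict, deque

def component_sizes(links):
    """Return a list of (n_nodes, n_links), one entry per connected component."""
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, nodes = deque([start]), set()
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            nodes.add(node)
            for nb in adj[node] - seen:
                seen.add(nb)
                queue.append(nb)
        # Each link is counted once from each of its two endpoints.
        n_links = sum(len(adj[n]) for n in nodes) // 2
        sizes.append((len(nodes), n_links))
    return sizes

def link_bounds(n):
    """(minimum, maximum) number of links in a connected component of n nodes."""
    return n - 1, n * (n - 1) // 2

# Hypothetical toy network: a triangle (a, b, c) and a chain (d, e, f).
sizes = component_sizes([("a", "b"), ("b", "c"), ("a", "c"),
                         ("d", "e"), ("e", "f")])
```

The triangle sits on the upper line (3 links for n = 3) and the chain on the lower, dashed line (2 links for n = 3).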
Fig 6.
Left: The isolation source tree. Hovering the mouse over a node shows the number of strains from this isolation source in the database. Clicking on the node produces the graph on the right. Right: Clinical strains connected to shell egg strains. A connection exists when the number of SNP differences between a clinical strain and a non-clinical strain is less than or equal to 12; connected strains form connected components (CCs). The framed CC was selected to show an example of the in-depth analysis of clinical case sources.
Fig 7. Examples of connectivity between food category and clinical cases (SNP threshold = 12).
CCs: connected components, (): number of strains, → connected with SNPs ≤ 12.
Fig 8. Examples of connectivity between a sub-set of strains (Food/environmental strains isolated in Canada) and clinical cases (SNP threshold = 12).
CCs: connected components, (): number of strains, → connected with SNPs ≤ 12.
Fig 9. Sub-network menu.
Fig 10. Example of visualization and analysis of a sub-network.
Fig 11. Map illustrating the origin of the strains.
Note that the dots are placed at random within the limits of the state (United States) or the country; the position of each dot does not represent the actual sampling location. Strains from the US not assigned to a specific state are placed in the blue square. Clinical strains are in black.
