Review
One Health Outlook. 2020;2(1):20.
doi: 10.1186/s42522-020-00026-3. Epub 2020 Oct 19.

Optimizing open data to support one health: best practices to ensure interoperability of genomic data from bacterial pathogens


Ruth E Timme et al. One Health Outlook. 2020.

Abstract

The holistic approach of One Health, which sees human, animal, plant, and environmental health as a unit rather than as discrete parts, requires not only interdisciplinary cooperation but also standardized methods for communicating and archiving data, enabling participants to easily share what they have learned and allowing others to build upon their findings. Ongoing work by NCBI and the GenomeTrakr project illustrates how open data platforms can help meet the needs of federal and state regulators, public health laboratories, departments of agriculture, and universities. Here we describe how microbial pathogen surveillance can be transformed by an open access database combined with Best Practices for contributors to follow. First, we describe the open pathogen surveillance framework hosted on the NCBI platform. We cover the current community standards for whole genome sequencing (WGS) quality, provide an SOP for assessing your own sequence quality, and recommend QC thresholds for all submitters to follow. We then provide an overview of NCBI data submission along with step-by-step details. Finally, we provide curation guidance and an SOP for keeping your public data current within the database. These Best Practices can serve as models for other open data projects, thereby advancing the One Health goals of Findable, Accessible, Interoperable, and Re-usable (FAIR) data.

Keywords: GenomeTrakr; Genomic epidemiology; Microbial pathogen surveillance; NCBI submission; One health; QA/QC; Whole genome sequencing.


Conflict of interest statement

Competing interests: The authors declare that they have no competing interests.

Figures

Fig. 1
INSDC (International Nucleotide Sequence Database Collaboration) hub showing how genomic data in public databases are analyzed by many different software platforms for different purposes. The figure includes most of the open-source genomic epidemiology analysis platforms available in March 2020, plus one proprietary software tool, BioNumerics, which is also the only platform with submission capability.
Fig. 2
Screenshot of a cluster in the NCBI Pathogen Detection (NCBI-PD) browser showing harmonized metadata submissions from five different submitting laboratories (PulseNet, GenomeTrakr, Public Health England, Israel Ministry of Health, and Canadian Food Inspection Agency). URL: https://www.ncbi.nlm.nih.gov/Structure/tree/#!/tree/Salmonella/PDG000000002.1922/PDS000025876.12?treelabel=sra_center,strain,epi_type,collection_date,geo_loc_name,isolation_source
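The `treelabel` parameter in the Fig. 2 URL appears to list the metadata fields used as tree labels in the screenshot. The Python sketch below extracts those field names and checks whether a metadata record fills each of them. The helper names and the sample record are hypothetical, and this is not an NCBI tool or API. Because the viewer uses a `#!` route, the query string sits inside the URL fragment, so the code parses the fragment rather than the usual query component.

```python
from urllib.parse import parse_qs, urlsplit

# URL from the Fig. 2 caption, copied as-is (split across lines for width).
URL = (
    "https://www.ncbi.nlm.nih.gov/Structure/tree/#!/tree/Salmonella/"
    "PDG000000002.1922/PDS000025876.12?treelabel=sra_center,strain,"
    "epi_type,collection_date,geo_loc_name,isolation_source"
)


def displayed_fields(url: str) -> list[str]:
    """Return the comma-separated field names in the fragment's treelabel parameter."""
    fragment = urlsplit(url).fragment       # everything after "#"
    _, _, query = fragment.partition("?")  # query string nested in the "#!" route
    labels = parse_qs(query).get("treelabel", [""])[0]
    return [name for name in labels.split(",") if name]


def missing_fields(record: dict, fields: list[str]) -> list[str]:
    """Hypothetical helper: fields the record leaves absent or empty."""
    return [name for name in fields if not record.get(name)]


if __name__ == "__main__":
    fields = displayed_fields(URL)
    # Hypothetical metadata record; the values are placeholders, not real data.
    record = {"sra_center": "LabA", "strain": "S1",
              "collection_date": "2019", "geo_loc_name": "USA"}
    print(fields)
    print(missing_fields(record, fields))  # ['epi_type', 'isolation_source']
```

A shared field list like this is one simple way to see at a glance which submissions leave harmonized metadata columns blank.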
Fig. 3
Density plot showing the distribution of genome lengths for a random sample of isolates with Illumina sequence data available from the NCBI Pathogen Detection portal (n = 10,000 for all species except V. parahaemolyticus, where n = 1414 because fewer samples were available). Sequences were assembled using SKESA 2.2, and the bars indicate ±3 standard deviations from the mean. Mbp = megabase pairs.
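One way the ±3 standard deviation bars in Fig. 3 could serve as a QC screen is to compute the bounds from a reference set of genome lengths and flag new assemblies that fall outside them. The Python sketch below assumes that use; the function names and the five reference lengths are illustrative, not NCBI data and not the review's SOP.

```python
from statistics import mean, stdev


def length_bounds(reference_mbp: list[float], k: float = 3.0) -> tuple[float, float]:
    """Return (mean - k*SD, mean + k*SD) using the sample standard deviation."""
    mu = mean(reference_mbp)
    sd = stdev(reference_mbp)  # needs at least two values
    return mu - k * sd, mu + k * sd


def within_bounds(length_mbp: float, bounds: tuple[float, float]) -> bool:
    """True if an assembly length lies inside the inclusive bounds."""
    low, high = bounds
    return low <= length_mbp <= high


if __name__ == "__main__":
    # Illustrative reference lengths in Mbp, not drawn from NCBI Pathogen Detection.
    reference = [4.7, 4.8, 4.9, 5.0, 5.1]
    bounds = length_bounds(reference)
    for assembly_mbp in (4.95, 6.2):
        print(assembly_mbp, within_bounds(assembly_mbp, bounds))
```

With these illustrative values the bounds are roughly 4.43 to 5.37 Mbp, so a 6.2 Mbp assembly (for example, one possibly inflated by contamination) would be flagged, while a 4.95 Mbp assembly would pass.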
Fig. 4
Plot of mean coverage (as reported by SKESA v. 2.2) versus number of contigs for a random sample of isolates with Illumina sequence data available from the NCBI Pathogen Detection portal (n = 10,000 for all species except V. parahaemolyticus, where n = 1414 because fewer samples were available). The smoothed line was generated using generalized additive model smoothing in R. Assembly quality, as measured by a decrease in the number of contigs, generally increases with increasing coverage.
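The coverage values in Fig. 4 come from SKESA's own report; a common back-of-the-envelope approximation is total sequenced bases divided by assembly length. The Python sketch below pairs that approximation with a simple pass/fail screen on coverage and contig count. The `MIN_COVERAGE` and `MAX_CONTIGS` values are hypothetical placeholders, not the QC thresholds the review recommends.

```python
def mean_coverage(total_read_bases: int, assembly_length_bp: int) -> float:
    """Approximate mean depth as sequenced bases / assembly length (not SKESA's own method)."""
    if assembly_length_bp <= 0:
        raise ValueError("assembly_length_bp must be positive")
    return total_read_bases / assembly_length_bp


# HYPOTHETICAL thresholds for illustration only; not the review's recommended values.
MIN_COVERAGE = 30.0
MAX_CONTIGS = 300


def passes_qc(coverage: float, n_contigs: int,
              min_coverage: float = MIN_COVERAGE,
              max_contigs: int = MAX_CONTIGS) -> bool:
    """Pass only if depth is high enough and the assembly is not too fragmented."""
    return coverage >= min_coverage and n_contigs <= max_contigs


if __name__ == "__main__":
    # 1,000,000 read pairs x 2 x 150 bp over a 5 Mbp assembly.
    cov = mean_coverage(1_000_000 * 2 * 150, 5_000_000)
    print(cov)                   # 60.0
    print(passes_qc(cov, 85))    # True
    print(passes_qc(12.0, 900))  # False
```

Checking coverage and contig count together reflects the trend in Fig. 4: low-coverage runs tend to yield more fragmented assemblies, so the two measures usually move together.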
Fig. 5
Overview of the database structure at NCBI, showing an example Salmonella umbrella BioProject with three linked laboratory-data BioProjects, each with its own BioSamples and associated sequence data.
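The hierarchy in Fig. 5 can be modeled as a small tree: an umbrella BioProject links laboratory BioProjects, and each of those holds BioSamples with their sequence data. The Python dataclasses below are a minimal sketch of that relationship only; the class layout, field names, and placeholder accessions are hypothetical and do not mirror NCBI's actual schema or any NCBI API.

```python
from dataclasses import dataclass, field


@dataclass
class BioSample:
    accession: str                                 # hypothetical placeholder ID
    runs: list[str] = field(default_factory=list)  # linked sequence data (placeholder IDs)


@dataclass
class BioProject:
    accession: str
    title: str
    biosamples: list[BioSample] = field(default_factory=list)
    children: list["BioProject"] = field(default_factory=list)  # linked lab BioProjects

    def all_biosamples(self) -> list[BioSample]:
        """Collect BioSamples from this project and every linked child project."""
        samples = list(self.biosamples)
        for child in self.children:
            samples.extend(child.all_biosamples())
        return samples


if __name__ == "__main__":
    # Three hypothetical lab projects, each with two BioSamples.
    labs = [
        BioProject(f"LAB-{i}", f"Lab {i} Salmonella data",
                   biosamples=[BioSample(f"SAMPLE-{i}-{j}", runs=[f"RUN-{i}-{j}"])
                               for j in range(2)])
        for i in range(1, 4)
    ]
    umbrella = BioProject("UMBRELLA-1", "Salmonella umbrella", children=labs)
    print(len(umbrella.all_biosamples()))  # 6
```

Walking the tree recursively from the umbrella collects every BioSample across the linked laboratory projects, which loosely mirrors how the umbrella groups submissions from several labs under one organism.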

