PLoS One. 2014 Jun 17;9(6):e99979.
doi: 10.1371/journal.pone.0099979. eCollection 2014.

Standardized metadata for human pathogen/vector genomic sequences

Vivien G Dugan et al. PLoS One. 2014.

Abstract

High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.

Conflict of interest statement

Competing Interests: The authors have the following interest: Julia Puzak is employed by Kelly Government Solutions. There are no patents, products in development or marketed products to declare. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.

Figures

Figure 1. NIAID GSCID/BRC Project and Sample Application Standard Overview.
Coverage of the twelve major data categories in the five data field collections is shown.
Figure 2. Semantic Network of the Core Project Data Fields.
A semantic representation of the entities relevant to describing infectious disease projects, based on the OBI and other OBO Foundry ontologies, is shown. Distinctions are made between material entities (blue outlines), information entities and qualities (black outlines), and processes (red outlines). Entities are connected by standard semantic relations, shown in italics. The subset of entities selected as Core Project fields is marked with ovals containing the respective Field ID. For example, both the “Project Title” (CP1) and “Project ID” (CP2) denote an OBI:Investigation; the “Project Description” (CP3) is_about the same OBI:Investigation.
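The Core Project example in this caption can be sketched as a small list of subject/relation/object triples. This is only an illustrative sketch: the `Triple` type, the `denotes` relation spelling, and the `fields_linked_to` helper are assumptions made here, not part of the published standard.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one semantic statement."""
    subject: str
    relation: str
    obj: str


# Statements taken from the Figure 2 caption; the relation name "denotes"
# is an assumed spelling of the caption's wording.
CORE_PROJECT_TRIPLES = [
    Triple("Project Title (CP1)", "denotes", "OBI:Investigation"),
    Triple("Project ID (CP2)", "denotes", "OBI:Investigation"),
    Triple("Project Description (CP3)", "is_about", "OBI:Investigation"),
]


def fields_linked_to(entity, triples):
    """Return the subjects of all triples whose object is `entity`."""
    return [t.subject for t in triples if t.obj == entity]


linked = fields_linked_to("OBI:Investigation", CORE_PROJECT_TRIPLES)
```

Here `linked` lists the three Core Project fields that point at the same investigation, which is the sense in which the caption says they describe one entity.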
Figure 3. Semantic Network of the Core Sample Data Fields.
A semantic representation of the entities relevant to describing infectious disease samples, based on the OBI and other OBO Foundry ontologies, is shown. Distinctions are made between material entities (blue outlines), information entities and qualities (black outlines), and processes (red outlines). Entities are connected by standard semantic relations, shown in italics. The subset of entities selected as Core Sample fields is marked with ovals containing the respective Field ID. For example, the OBI:organism has_quality “Specimen Source Gender” (CS5), which is equivalent to PATO:biological sex; has_quality PATO:age; and has_quality “Specimen Source Health Status” (CS8), which is equivalent to PATO:organismal status. PATO:age is_quality_measured_as OBI:age since birth measurement datum, which has_measurement_value “Specimen Source Age – Value” (CS6) and has_measurement_unit_label “Specimen Source Age – Unit” (CS7).
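The chain of relations in this caption can be followed programmatically. The sketch below encodes the caption's statements as plain tuples and walks them breadth-first; the tuple layout, the `equivalent_to` relation spelling, the use of bare Field IDs as node names, and the `reachable` helper are all assumptions for illustration, not the standard's own serialization.

```python
from collections import deque

# (subject, relation, object) statements from the Figure 3 caption.
# Field IDs stand in for the field names; labels are kept separately.
CORE_SAMPLE_TRIPLES = [
    ("OBI:organism", "has_quality", "CS5"),
    ("CS5", "equivalent_to", "PATO:biological sex"),
    ("OBI:organism", "has_quality", "PATO:age"),
    ("OBI:organism", "has_quality", "CS8"),
    ("CS8", "equivalent_to", "PATO:organismal status"),
    ("PATO:age", "is_quality_measured_as",
     "OBI:age since birth measurement datum"),
    ("OBI:age since birth measurement datum", "has_measurement_value", "CS6"),
    ("OBI:age since birth measurement datum",
     "has_measurement_unit_label", "CS7"),
]

FIELD_LABELS = {
    "CS5": "Specimen Source Gender",
    "CS6": "Specimen Source Age – Value",
    "CS7": "Specimen Source Age – Unit",
    "CS8": "Specimen Source Health Status",
}


def reachable(start, triples):
    """Return every node reachable from `start` by following triples forward."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subj, _relation, obj in triples:
            if subj == node and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


age_nodes = reachable("PATO:age", CORE_SAMPLE_TRIPLES)
age_fields = sorted(n for n in age_nodes if n in FIELD_LABELS)
```

Starting from PATO:age, the walk reaches the measurement datum and then the two age fields, CS6 and CS7, mirroring how the caption ties a quality to its value and unit.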
