PLoS One. 2023 May 17;18(5):e0285433.
doi: 10.1371/journal.pone.0285433. eCollection 2023.

Phenopacket-tools: Building and validating GA4GH Phenopackets

Daniel Danis et al. PLoS One.

Abstract

The Global Alliance for Genomics and Health (GA4GH) is a standards-setting organization that is developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information that characterizes an individual person or biosample. The Phenopacket Schema is flexible and can represent clinical data for any kind of human disease, including rare disease, complex disease, and cancer. It also allows consortia or databases to apply additional constraints to ensure uniform data collection for specific goals. We present phenopacket-tools, an open-source Java library and command-line application for construction, conversion, and validation of phenopackets. Phenopacket-tools simplifies construction of phenopackets by providing concise builders, programmatic shortcuts, and predefined building blocks (ontology classes) for concepts such as anatomical organs, age of onset, biospecimen type, and clinical modifiers. Phenopacket-tools can be used to validate the syntax and semantics of phenopackets as well as to assess adherence to additional user-defined requirements. We demonstrate how to create, convert, and validate phenopackets using the library or the command-line application, and the documentation includes further examples for both. Source code, API documentation, a comprehensive user guide, and a tutorial can be found at https://github.com/phenopackets/phenopacket-tools. The library can be installed from the public Maven Central artifact repository, and the application is available as a standalone archive. The phenopacket-tools library helps developers implement and standardize the collection and exchange of phenotypic and other clinical data for use in phenotype-driven genomic diagnostics, translational research, and precision medicine applications.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Methods for creating OntologyClass messages with phenopacket-tools.
A. Builder pattern provided by the protobuf Java framework. B. Predefined constant (static function) that returns an ontology term singleton instance. Here, BiospecimenType.bloodDNA() generates the ontology term that represents the NCI Thesaurus (NCIt) concept C158416 for the biospecimen type “blood DNA”. C. Convenience function provided by phenopacket-tools. D. The JSON representation of the OntologyClass message that is generated by the code in panels A, B, and C.
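
The three construction routes in this caption can be illustrated without the library itself. The following is a minimal sketch that uses only the Java standard library: OntologyClassSketch and BiospecimenTypeSketch are hypothetical stand-ins, not the protobuf-generated OntologyClass or the phenopacket-tools BiospecimenType class, and the CURIE form "NCIT:C158416" and the label "blood DNA" are assumptions based on the caption text.

```java
import java.util.Objects;

// Hypothetical stand-in for the protobuf-generated OntologyClass message.
// The real class ships with the phenopacket-schema Java bindings.
final class OntologyClassSketch {
    private final String id;
    private final String label;

    private OntologyClassSketch(String id, String label) {
        this.id = Objects.requireNonNull(id, "id");
        this.label = Objects.requireNonNull(label, "label");
    }

    String getId() { return id; }
    String getLabel() { return label; }

    // Panel A style: a fluent builder, similar in shape to protobuf builders.
    static Builder newBuilder() { return new Builder(); }

    static final class Builder {
        private String id = "";
        private String label = "";
        Builder setId(String id) { this.id = id; return this; }
        Builder setLabel(String label) { this.label = label; return this; }
        OntologyClassSketch build() { return new OntologyClassSketch(id, label); }
    }

    // Panel C style: a convenience function that replaces the builder chain.
    static OntologyClassSketch ontologyClass(String id, String label) {
        return new OntologyClassSketch(id, label);
    }

    // Panel D style: JSON with "id" and "label" (no string escaping; sketch only).
    String toJson() {
        return "{\"id\": \"" + id + "\", \"label\": \"" + label + "\"}";
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof OntologyClassSketch)) return false;
        OntologyClassSketch other = (OntologyClassSketch) o;
        return id.equals(other.id) && label.equals(other.label);
    }

    @Override public int hashCode() { return Objects.hash(id, label); }

    public static void main(String[] args) {
        OntologyClassSketch viaBuilder = newBuilder()
                .setId("NCIT:C158416").setLabel("blood DNA").build();
        OntologyClassSketch viaConstant = BiospecimenTypeSketch.bloodDNA();
        OntologyClassSketch viaShortcut = ontologyClass("NCIT:C158416", "blood DNA");
        // All three routes describe the same term.
        System.out.println(viaBuilder.equals(viaConstant) && viaConstant.equals(viaShortcut));
        System.out.println(viaConstant.toJson());
    }
}

// Panel B style: a static function that always returns the same instance.
final class BiospecimenTypeSketch {
    private static final OntologyClassSketch BLOOD_DNA =
            OntologyClassSketch.ontologyClass("NCIT:C158416", "blood DNA");
    private BiospecimenTypeSketch() {}
    static OntologyClassSketch bloodDNA() { return BLOOD_DNA; }
}
```

In this sketch the predefined constant removes the risk of mistyping an identifier or label, which is the main practical argument for shipping such building blocks with a library.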
Fig 2. Building phenopackets.
Phenopacket-tools offers convenience functions that streamline the construction of GA4GH phenopackets. (A) The protobuf framework automatically generates Java bindings for messages that are defined in proto files. This panel shows an example of how the bindings can be used to create a PhenotypicFeature element that represents severe weakness of the left triceps muscle with age of onset at eleven years and four months (“P11Y4M”). (B) Phenopacket-tools provides builder classes whose convenience functions hide the relative verbosity of the protobuf bindings. (C) JSON representation of the PhenotypicFeature element generated by the code in panel A or B.
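
The onset string “P11Y4M” in this caption is an ISO 8601 period. As a small stand-alone illustration, the sketch below reads it with java.time.Period from the Java standard library. It does not use the phenopacket-tools builders, and the class name OnsetAgeSketch and the helper parseAge are hypothetical.

```java
import java.time.Period;

// Stand-alone sketch: uses only java.time, not the phenopacket-tools API.
final class OnsetAgeSketch {
    private OnsetAgeSketch() {}

    // Parses an ISO 8601 period such as "P11Y4M" (years, months, weeks, days).
    static Period parseAge(String iso8601) {
        return Period.parse(iso8601);
    }

    public static void main(String[] args) {
        Period onset = parseAge("P11Y4M");
        // prints "11 years, 4 months"
        System.out.println(onset.getYears() + " years, " + onset.getMonths() + " months");
    }
}
```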
Fig 3. Creating a customized validator and applying it to a phenopacket.
The ValidationResult object contains fields representing validation metadata, the level of the validation (error or warning), the category, and a message (see Table 3).
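
To make the shape of such a result concrete, here is a minimal sketch of a user-defined check that reports its findings with the four kinds of fields named in the caption. Everything in it is a hypothetical stand-in written against the Java standard library: the class names, the field names, the flat Map used in place of a phenopacket, and the "subject.id" requirement are illustrative assumptions, not the phenopacket-tools validation API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical custom check: a named field must be present and non-empty.
final class RequiredFieldValidatorSketch {
    private final String field;
    private final LevelSketch level;

    RequiredFieldValidatorSketch(String field, LevelSketch level) {
        this.field = field;
        this.level = level;
    }

    // The flat map is a simplified stand-in for a phenopacket message.
    List<ValidationResultSketch> validate(Map<String, String> phenopacket) {
        List<ValidationResultSketch> results = new ArrayList<>();
        String value = phenopacket.get(field);
        if (value == null || value.trim().isEmpty()) {
            results.add(new ValidationResultSketch(
                    "RequiredFieldValidatorSketch", level, "missing field",
                    "'" + field + "' is required but absent or empty"));
        }
        return results;
    }

    public static void main(String[] args) {
        RequiredFieldValidatorSketch validator =
                new RequiredFieldValidatorSketch("subject.id", LevelSketch.ERROR);
        for (ValidationResultSketch r : validator.validate(Map.of("id", "example-1"))) {
            System.out.println(r.level + " [" + r.category + "] " + r.message);
        }
    }
}

// Level of a finding, as described in the caption (error or warning).
enum LevelSketch { WARNING, ERROR }

// Hypothetical stand-in mirroring the fields named in the caption.
final class ValidationResultSketch {
    final String validatorName; // validation metadata: which check produced this
    final LevelSketch level;
    final String category;
    final String message;

    ValidationResultSketch(String validatorName, LevelSketch level,
                           String category, String message) {
        this.validatorName = validatorName;
        this.level = level;
        this.category = category;
        this.message = message;
    }
}
```

Returning a list of results rather than throwing on the first problem lets a caller collect every finding for a phenopacket in one pass and decide afterwards whether warnings should block downstream use.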
