BMC Bioinformatics. 2021 Mar 2;22(1):104.
doi: 10.1186/s12859-021-04039-1.

vcf2fhir: a utility to convert VCF files into HL7 FHIR format for genomics-EHR integration

Robert H Dolin et al. BMC Bioinformatics.

Abstract

Background: VCF-formatted files are the lingua franca of next-generation sequencing, whereas HL7 FHIR is emerging as a standard language for electronic health record (EHR) interoperability, and a growing number of FHIR-based clinical genomics applications are being developed. Here, we describe an open-source utility for converting variants from VCF format into HL7 FHIR format.

Results: vcf2fhir converts VCF variants into a FHIR Genomics Diagnostic Report, translating each VCF row into a corresponding FHIR-formatted variant in the generated report. In scope are simple variants (SNVs, MNVs, indels), along with zygosity and phase relationships, for autosomes, sex chromosomes, and mitochondrial DNA. Input parameters include the VCF file and the genome build ('GRCh37' or 'GRCh38'). Optional parameters are a conversion region, which limits conversion to a subset of the VCF by giving the genomic coordinates of the region(s) to convert; a studied region, which lists the genomic regions studied by the lab; and a non-callable region, which lists the studied regions deemed uncallable by the lab. If studied and non-callable regions are also supplied, the output FHIR report includes 'region-studied' observations that detail which portions of the conversion region were studied and, of those studied regions, which portions were deemed uncallable. We illustrate the vcf2fhir utility with two case studies. The first, 'SMART Cancer Navigator', is a web application that offers clinical decision support by linking patient EHR information to cancer-related gene variants. The second, 'Precision Genomics Integration Platform', intersects a patient's FHIR-formatted clinical and genomic data with knowledge bases to provide on-demand delivery of contextually relevant genomic findings and recommendations to the EHR.
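The region-studied logic described in the Results amounts to interval intersection: first the studied part of the conversion region, then the uncallable part of that studied part. The following Python sketch shows one way to compute it. It is not taken from vcf2fhir; it assumes each region is a list of BED-style half-open intervals [start, end) on a single chromosome, and the function names (`merge`, `intersect`, `region_studied`) are hypothetical.

```python
from typing import List, Tuple

# Assumption: BED-style half-open intervals [start, end) on one chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the stretches covered by both interval lists (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance past whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def region_studied(
    conversion: List[Interval],
    studied: List[Interval],
    uncallable: List[Interval],
) -> Tuple[List[Interval], List[Interval]]:
    """Hypothetical helper: studied part of C, and the uncallable part of that."""
    studied_in_conversion = intersect(conversion, studied)
    uncallable_in_studied = intersect(studied_in_conversion, uncallable)
    return studied_in_conversion, uncallable_in_studied


if __name__ == "__main__":
    s, u = region_studied([(100, 500)], [(50, 200), (300, 600)], [(150, 180), (450, 700)])
    print("studied:", s)      # studied: [(100, 200), (300, 500)]
    print("uncallable:", u)   # uncallable: [(150, 180), (450, 500)]
```

Computing the uncallable set against the already-clipped studied set (rather than against the raw non-callable input) is what keeps every reported uncallable stretch inside both the studied and the conversion regions.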

Conclusions: Experience to date shows that the vcf2fhir utility can be effectively woven into clinically useful genomics-EHR integration pipelines. Additional testing will be a critical step toward the clinical validation of this utility, enabling it to be integrated into a variety of real-world data flow scenarios. For now, we propose using this utility primarily to accelerate understanding of FHIR Genomics and to facilitate experimentation with further integration of genomics data into the EHR.

Keywords: Clinical genomics; EHR integration; FHIR; Next-generation sequencing; SMART-on-FHIR.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
vcf2fhir conversion. The vcf2fhir utility takes a VCF file as input and outputs a FHIR Genomics report in JSON format. A simplified conceptual representation of the FHIR report is shown here to illustrate the main components. (VCF is a text file. It contains meta-information lines (prefixed with ‘##’), a header line (prefixed with ‘#’), and tab-delimited data lines, each containing information about a variant. The CHROM and POS fields indicate the location of the variant. REF indicates the reference allele, while ALT indicates the alternate allele. FILTER indicates whether the variant call has passed the applied filters. INFO is a semicolon-separated series of fields that further characterize a variant. FORMAT is a colon-separated list of fields that characterize the genotype. Fields defined in FORMAT are valued for each tested sample, such as the NA12878 sample shown. A FHIR Genomics report is represented as a FHIR Diagnostic Report that contains information about the patient and a set of observations. Each observation in the FHIR Genomics report conforms to a defined FHIR Observation ‘profile’ that constrains the information conveyed. In particular, the report includes zero or more ‘region-studied’ observations, ‘variant’ observations, and ‘sequence-phase-relationship’ observations, which are described in the text)
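To make the VCF-to-FHIR mapping in Fig. 1 concrete, the following Python sketch parses one tab-delimited VCF data line, splits INFO on semicolons and FORMAT/sample values on colons, derives zygosity and phasing from the GT field, and emits a simplified FHIR Observation. It is an illustration, not vcf2fhir's code: the component names are plain-text placeholders rather than the coded elements required by the FHIR Genomics Reporting profiles, the zygosity labels are assumptions (mitochondrial calls, which would need labels such as homoplasmic or heteroplasmic, are ignored), and the example VCF line is hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Split INFO into key=value pairs; flag keys without a value become "true"."""
    if info == ".":
        return {}
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value if value else "true"
    return fields


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse one data line; only the first sample column is kept."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, flt, info = cols[:8]
    record: Dict[str, Any] = {
        "chrom": chrom,
        "pos": int(pos),  # VCF POS is 1-based
        "ref": ref,
        "alt": alt.split(","),
        "filter": flt,
        "info": parse_info(info),
    }
    if len(cols) >= 10:
        # FORMAT keys (colon-separated) pair up with the sample's values.
        record["sample"] = dict(zip(cols[8].split(":"), cols[9].split(":")))
    return record


def is_phased(gt: str) -> bool:
    """A '|' separator marks a phased genotype."""
    return "|" in gt


def zygosity(gt: str) -> str:
    """Assumed labels for diploid and haploid (e.g. male chrX) GT values."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "unknown"
    if all(a == "0" for a in alleles):
        return "reference"
    if len(alleles) == 1:
        return "hemizygous"
    return "homozygous" if len(set(alleles)) == 1 else "heterozygous"


def to_observation(record: Dict[str, Any], build: str) -> Dict[str, Any]:
    """Build a simplified FHIR R4 Observation (not profile-conformant)."""
    parts: List[Tuple[str, str]] = [
        ("Genome build", build),
        ("Chromosome", record["chrom"]),
        ("Position (VCF POS)", str(record["pos"])),
        ("Reference allele", record["ref"]),
        ("Alternate allele", ",".join(record["alt"])),
    ]
    gt = record.get("sample", {}).get("GT")
    if gt:
        parts.append(("Allelic state", zygosity(gt)))
        parts.append(("Genotype phased", str(is_phased(gt)).lower()))
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": "Genetic variant (simplified)"},
        "component": [
            {"code": {"text": name}, "valueString": value} for name, value in parts
        ],
    }


if __name__ == "__main__":
    # Hypothetical VCF data line with one sample (GT:DP).
    line = "chr1\t1000\t.\tA\tG\t50\tPASS\tDP=30;AF=0.5\tGT:DP\t0|1:30"
    print(json.dumps(to_observation(parse_vcf_line(line), "GRCh38"), indent=2))
```

Using `code.text` and `valueString` keeps the output structurally valid as a FHIR R4 Observation while avoiding any claim about which terminology codes the Genomics Reporting profiles actually require.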
Fig. 2
vcf2fhir conversion showing region-studied capabilities. See text for details. SNP single nucleotide polymorphism, C conversion region (blue lines), S studied region (purple lines), U uncallable region (magenta lines)
Fig. 3
Loading a VCF file into the SMART Cancer Navigator. To upload a file, the user clicks the “Choose File” button, selects a VCF file from their computer, and then clicks the “Upload file” button. Once a VCF file is uploaded, the user can click the “Download FHIR file” button to download the FHIR file to their computer’s “Downloads” folder. Error messages are shown when the user has not submitted a file and when the VCF file is invalid.
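The caption mentions error messages for a missing or invalid VCF file. The checks the SMART Cancer Navigator actually performs are not described, so the following Python sketch only illustrates the kind of structural validation such an upload step could run; `validate_vcf` and its messages are hypothetical. The checks follow the VCF specification: the first line must be a `##fileformat` meta-information line, the header line must begin with the eight mandatory tab-delimited columns, and each data line should have as many columns as the header.

```python
from typing import List

# The eight mandatory header columns defined by the VCF specification.
MANDATORY_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def validate_vcf(text: str) -> List[str]:
    """Return error messages; an empty list means these basic checks passed."""
    if not text.strip():
        return ["No file submitted"]
    lines = text.splitlines()
    errors: List[str] = []
    if not lines[0].startswith("##fileformat=VCF"):
        errors.append("First line is not a ##fileformat=VCF meta-information line")
    header = next((ln for ln in lines if ln.startswith("#CHROM")), None)
    if header is None:
        errors.append("Missing #CHROM header line")
        return errors
    columns = header.split("\t")
    if columns[:8] != MANDATORY_COLUMNS:
        errors.append("Header line does not start with the eight mandatory columns")
    for number, ln in enumerate(lines, start=1):
        if ln.startswith("#") or not ln.strip():
            continue  # skip meta-information, header and blank lines
        fields = ln.split("\t")
        if len(fields) != len(columns):
            errors.append(f"Line {number}: expected {len(columns)} columns, found {len(fields)}")
        elif not fields[1].isdigit():
            errors.append(f"Line {number}: POS is not an integer")
    return errors
```

Returning a list of messages, rather than raising on the first problem, lets a web front end show every detected issue at once.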
Fig. 4
SMART Cancer Navigator variant viewer. The main page of the application after a VCF file has been uploaded. The page is populated with the variants found both in the converted FHIR file and in the results of the gene-variant knowledge base queries.
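The viewer shows variants found in common between the converted FHIR file and the knowledge base. A minimal Python sketch of that matching step follows, assuming exact matching on genome build, chromosome, position, reference allele and alternate allele; the `VariantKey` type, the chromosome-name normalization and the example annotation are hypothetical, and real knowledge bases are often keyed by gene or HGVS expression instead.

```python
from typing import Dict, List, NamedTuple, Tuple


class VariantKey(NamedTuple):
    """Hypothetical matching key: genome build plus VCF-style coordinates."""
    build: str
    chrom: str
    pos: int
    ref: str
    alt: str


def normalize(key: VariantKey) -> VariantKey:
    """Harmonize chromosome names, e.g. 'chr1' -> '1' and 'chrM' -> 'MT'."""
    chrom = key.chrom[3:] if key.chrom.lower().startswith("chr") else key.chrom
    chrom = chrom.upper()
    if chrom == "M":
        chrom = "MT"
    return key._replace(chrom=chrom)


def annotate(
    patient_variants: List[VariantKey],
    knowledge_base: Dict[VariantKey, str],
) -> List[Tuple[VariantKey, str]]:
    """Return (variant, annotation) pairs for variants present in both sources."""
    index = {normalize(k): v for k, v in knowledge_base.items()}
    return [
        (variant, index[normalize(variant)])
        for variant in patient_variants
        if normalize(variant) in index
    ]


if __name__ == "__main__":
    kb = {VariantKey("GRCh38", "1", 1000, "A", "G"): "hypothetical annotation"}
    patient = [
        VariantKey("GRCh38", "chr1", 1000, "A", "G"),
        VariantKey("GRCh38", "chr2", 5000, "C", "T"),
    ]
    print(annotate(patient, kb))
```

Normalizing chromosome names on both sides matters because VCF files and knowledge bases differ in whether they use the "chr" prefix, and a naive exact match would silently miss such variants.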
Fig. 5
Precision genomics integration platform. Platform components include a FHIR-enabled genomic data server (GACS) and a workflow/CDS engine (A2D2). A2D2 can computationally intersect a patient's FHIR-formatted clinical and genomic data with knowledge bases in order to provide on-demand delivery of contextually relevant genomic findings and recommendations to the EHR
Fig. 6
Face sheet application with genomic annotations. SMART-on-FHIR application that surfaces identified genomic interactions. (Shown is a fully synthetic patient)
