BMC Bioinformatics. 2012 Oct 3;13:254.
doi: 10.1186/1471-2105-13-254.

VarioML framework for comprehensive variation data representation and exchange


Myles Byrne et al. BMC Bioinformatics.

Abstract

Background: Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement.

Results: The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants with a toolkit for adapting the specification to one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs), e.g., the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easy to integrate into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components.

Conclusions: VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information easily, clearly, and without ambiguity.


Figures

Figure 1
Simplified conceptual UML object model used in VarioML. The VarioML object model is derived from Observ-OM (http://www.observ-om.org/wiki/ObservStart), with some modifications to simplify implementation. For example, Observable Feature (such as a phenotype or mutation name) and Observed Value (the existence of a phenotype or variation) are denormalized into a single XML element. This avoids unnecessary nesting of observation elements, which often have a one-to-one relationship, in the XML implementation. Entities are composed into Observations, which have properties such as evidence codes, observation protocols, and observation time. Associations between elements are drawn as single lines, where an asterisk denotes a 0-to-many multiplicity; e.g., an Observation can have many evidence codes. All entities also inherit from Annotatable the properties needed for database cross-references and comments. Here, the open arrow symbol denotes inheritance, or an is-a relationship.
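The relationships described in this caption can be sketched in a few lines of Python. This is only an illustration of the idea, not the VarioML schema or its Java API: the class and field names (`Annotatable`, `Observation`, `feature`, `value`, `evidence_codes`, `db_xrefs`) and the sample values are assumptions chosen to mirror the caption's wording.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotatable:
    """Hypothetical base class: cross-references and comments shared by all entities."""
    db_xrefs: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)


@dataclass
class Observation(Annotatable):
    """Hypothetical observation with feature and value flattened into one record."""
    feature: str = ""                 # what is observed, e.g. a phenotype or variant name
    value: Optional[str] = None       # the observed value, e.g. "present"
    evidence_codes: List[str] = field(default_factory=list)  # the "*" (0-to-many) end
    protocol: Optional[str] = None    # observation protocol, if recorded
    observed_at: Optional[str] = None # observation time, if recorded


# Placeholder values, not taken from any real data set.
obs = Observation(feature="example phenotype", value="present")
obs.evidence_codes.append("hypothetical-evidence-code")
obs.db_xrefs.append("EXAMPLE:0001")
```

Keeping `feature` and `value` on the same record is the denormalization the caption mentions: a separate feature object would add one level of nesting for what is usually a one-to-one pairing.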
Figure 2
A Cafe Variome submission of a COL1A1 variant. The different VarioML elements of the data submitted are flanked by the corresponding XML tags and explained in the text.
Figure 3
VarioML elements extending the core schema. The VarioML elements describing the effect of an AIRE variant at the transcript and protein levels are flanked by the corresponding XML tags and explained in the text.
Figure 4
VarioML in JSON format. XML elements are mapped to JSON objects using JAXB and Jackson annotations via VarioML's Java API. Repeating XML elements become pluralised into JSON arrays. Because JSON does not have an equivalent to XML attributes, XML attribute names can clash with inner element names. In these cases, the JSON name for the XML attribute is changed. Otherwise, mapping VarioML from XML to JSON is a direct transformation of the data structure.
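The two mapping rules in this caption (repeating elements become pluralised arrays, and clashing attribute names are renamed) can be illustrated with a small standard-library Python sketch. It does not use JAXB, Jackson, or the VarioML Java API, and it is not the toolkit's actual mapping: the function `to_json_value`, the `_attr` suffix, the `repeatable` parameter, and the sample XML are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET


def to_json_value(elem, repeatable):
    """Convert an XML element to a JSON-ready value (illustrative rules only).

    `repeatable` names the child tags that should become pluralised arrays,
    standing in for what annotations would declare in a real binding.
    """
    # A leaf with no attributes and no children maps to its text.
    if not elem.attrib and len(elem) == 0:
        return (elem.text or "").strip()

    out = {}
    child_tags = {child.tag for child in elem}
    for name, value in elem.attrib.items():
        # Assumed convention: rename an attribute that clashes with a child tag.
        key = f"{name}_attr" if name in child_tags else name
        out[key] = value

    for child in elem:
        value = to_json_value(child, repeatable)
        if child.tag in repeatable:
            # Repeating elements are collected into an array under a plural name.
            out.setdefault(child.tag + "s", []).append(value)
        else:
            out[child.tag] = value

    text = (elem.text or "").strip()
    if text:
        out["text"] = text
    return out


# Made-up XML for illustration; not a real VarioML document.
SAMPLE = """
<variant name="example-variant-1">
  <name>display name</name>
  <comment>first note</comment>
  <comment>second note</comment>
</variant>
"""

result = to_json_value(ET.fromstring(SAMPLE), repeatable={"comment"})
print(json.dumps(result, indent=2))
```

In the sample, the `name` attribute clashes with the `<name>` child, so it appears as `name_attr` in the output, and the two `<comment>` elements are gathered into a `comments` array.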

