Syst Biol. 2012 Jul;61(4):675-89.
doi: 10.1093/sysbio/sys025. Epub 2012 Feb 22.

NeXML: rich, extensible, and verifiable representation of comparative data and metadata

Rutger A Vos et al. Syst Biol. 2012 Jul.

Abstract

In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages used commonly in evolutionary informatics, and (3) input-output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.

Figures

Figure 1.
Data modeling in evolutionary informatics. Nodes in phylogenetic trees (shown left) and character-state data (right) can be conceptualized as forming a nexus at the center of which are operational taxonomic units (OTUs). Under this model, any number of trees and character-state data sets (the latter themselves following an entity–attribute–value model) are represented as data that apply to OTUs, which in principle can also be decorated with additional metadata such as taxonomy database record identifiers. This conceptualization is implicit in the NEXUS format and applications that build on it such as Mesquite (Maddison and Maddison 2011) and has been reused in NeXML. (Figure modified from Hladish et al. 2007).
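The OTU-centered model described in this caption can be sketched in a few lines of Python. This is only an illustration of the idea, not NeXML's schema or any toolkit's API: the class names (`OTU`, `TreeNode`, `Cell`), the helper functions, and the example labels and identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OTU:
    """An operational taxonomic unit, optionally decorated with metadata."""
    id: str
    label: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class TreeNode:
    """A tree node; only some nodes (typically tips) point at an OTU."""
    id: str
    otu: Optional[OTU] = None
    children: List["TreeNode"] = field(default_factory=list)


@dataclass
class Cell:
    """One entity-attribute-value triple: OTU, character, state."""
    otu: OTU
    character: str
    state: str


def nodes_for(otu: OTU, root: TreeNode) -> List[str]:
    """Collect the ids of all nodes in the tree that refer to the given OTU."""
    found = [root.id] if root.otu is otu else []
    for child in root.children:
        found.extend(nodes_for(otu, child))
    return found


def states_of(otu: OTU, cells: List[Cell]) -> Dict[str, str]:
    """Collect the character -> state observations recorded for the given OTU."""
    return {c.character: c.state for c in cells if c.otu is otu}


# Hypothetical example data: two OTUs shared by one tree and one matrix.
a = OTU("otu1", "Taxon A", {"taxonomy_record": "example-id"})
b = OTU("otu2", "Taxon B")
tree = TreeNode("n1", children=[TreeNode("n2", a), TreeNode("n3", b)])
cells = [Cell(a, "char01", "present"), Cell(b, "char01", "absent")]
```

Because both the tree and the matrix hold references to the same `OTU` objects, any number of trees and data sets can be attached to one set of OTUs without duplicating labels or metadata.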
Figure 2.
NeXML syntax example: TreeBASE OTU annotations. This example shows a single container of OTUs (the otus element) with a single OTU (the otu element) that was submitted to the database with the label Zenodorus cf. orbiculatus. Matching this label to the uBio web service returned a close match with the record for Zenodorus orbiculatus (with the namebank identifier 3546132), which uBio describes as matching the NCBI taxonomy record for Zenodorus cf. orbiculatus d008 (with taxon identifier 393215). The normalized OTU label was defined within the context of TreeBASE study S1787.
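Since the figure itself is not reproduced here, the following Python sketch approximates what such an annotated OTU might look like and how the annotations could be read back. The OTU label and the two identifiers (3546132 and 393215) come from the caption. Everything else is an assumption made for illustration: the element ids, the `skos:closeMatch` predicate on both annotations, and the `urn:example:` placeholder URIs, which stand in for whatever links TreeBASE actually emits. The study-context annotation mentioned in the caption is omitted.

```python
import xml.etree.ElementTree as ET

NEX = "http://www.nexml.org/2009"  # NeXML namespace

# Illustrative document only; hrefs are placeholders, not real TreeBASE links.
DOC = """<nex:nexml xmlns:nex="http://www.nexml.org/2009"
    xmlns="http://www.nexml.org/2009"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    version="0.9">
  <otus id="otus1">
    <otu id="otu1" label="Zenodorus cf. orbiculatus">
      <meta id="meta1" xsi:type="nex:ResourceMeta" rel="skos:closeMatch"
            href="urn:example:ubio-namebank:3546132"/>
      <meta id="meta2" xsi:type="nex:ResourceMeta" rel="skos:closeMatch"
            href="urn:example:ncbi-taxon:393215"/>
    </otu>
  </otus>
</nex:nexml>"""


def otu_annotations(xml_text):
    """Map each OTU label to its list of (predicate, target) annotations."""
    root = ET.fromstring(xml_text)
    ns = {"nex": NEX}
    result = {}
    for otu in root.findall(".//nex:otu", ns):
        result[otu.get("label")] = [
            (meta.get("rel"), meta.get("href"))
            for meta in otu.findall("nex:meta", ns)
        ]
    return result


for label, annotations in otu_annotations(DOC).items():
    print(label, annotations)
```

Each annotation reads as a subject-predicate-object statement about the OTU, which is what lets the same mechanism attach to any other data element.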
Figure 3.
NeXML syntax example: Phenoscape character states. This code fragment shows how the Phenoscape project uses the NeXML-compatible application Phenex to annotate character states. A character, identified by “char01,” is defined as able to occupy any of the states from state set “states01.” Within that state set, in this instance, there is only the state “state0102.” That state is annotated with an EQ statement (here expressed in a Phenex-specific XML dialect) that identifies a morphological feature called the “antorbital” and qualifies it as being absent. (In a complete NeXML document, the format element occurs within a characters element, which is preceded by a container of OTUs, i.e., an otus element, here omitted for clarity.)
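The relationship between a character, its state set, and an annotated state can be sketched in Python as follows. The ids `char01`, `states01`, and `state0102` and the "antorbital"/"absent" pair come from the caption. The `symbol` value and the `eq:` elements in the `urn:example:eq` namespace are hypothetical stand-ins invented for this sketch; they are not the Phenex-specific XML dialect, which is not reproduced here.

```python
import xml.etree.ElementTree as ET

NEX = "http://www.nexml.org/2009"  # NeXML namespace
EQ = "urn:example:eq"              # hypothetical namespace, NOT the Phenex dialect

# Illustrative fragment only: a format element with one state set and one character.
FRAGMENT = """<format xmlns="http://www.nexml.org/2009" xmlns:eq="urn:example:eq">
  <states id="states01">
    <state id="state0102" symbol="1">
      <meta id="meta1">
        <eq:statement>
          <eq:entity label="antorbital"/>
          <eq:quality label="absent"/>
        </eq:statement>
      </meta>
    </state>
  </states>
  <char id="char01" states="states01"/>
</format>"""


def states_for_char(xml_text, char_id):
    """Resolve a character to its state set; return (state id, entity, quality)."""
    root = ET.fromstring(xml_text)
    ns = {"nex": NEX, "eq": EQ}
    char = root.find(f"nex:char[@id='{char_id}']", ns)
    if char is None:
        raise KeyError(f"no character with id {char_id!r}")
    state_set = root.find(f"nex:states[@id='{char.get('states')}']", ns)
    if state_set is None:
        raise KeyError(f"no state set with id {char.get('states')!r}")
    rows = []
    for state in state_set.findall("nex:state", ns):
        entity = state.find(".//eq:entity", ns)
        quality = state.find(".//eq:quality", ns)
        rows.append((
            state.get("id"),
            entity.get("label") if entity is not None else None,
            quality.get("label") if quality is not None else None,
        ))
    return rows


print(states_for_char(FRAGMENT, "char01"))
```

The character refers to its state set by id rather than containing it, so several characters can share one state set.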

References

    1. Adida B, Birbeck M, McCarron S, Pemberton S. RDFa in XHTML: Syntax and Processing. 2008. Available from: http://www.w3.org/TR/rdfa-syntax/. [Wed Sep 21, 2011]
    2. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29.
    3. Balhoff JP, Dahdul WM, Kothari CR, Lapp H, Lundberg JG, Mabee P, Midford PE, Westerfield M, Vision TJ. Phenex: ontological annotation of phenotypic diversity. PLoS One. 2010;5:e10500.
    4. Beaman RS, Cellinese N. The tree of life knowledge and information network. 2010. Available from: http://www.tolkin.org. [Wed Sep 21, 2011]
    5. Beckett D. RDF/XML syntax specification (revised). W3C Recommendation. 2004. Available from: http://www.w3.org/TR/REC-rdf-syntax/. [Wed Sep 28, 2011]
