Genome Biol. 2010;11(3):R27.
doi: 10.1186/gb-2010-11-3-r27. Epub 2010 Mar 9.

XGAP: a uniform and extensible data model and software platform for genotype and phenotype experiments

Morris A Swertz et al. Genome Biol. 2010.

Abstract

We present an extensible software model for the genotype and phenotype community, XGAP. Readers can download a standard XGAP (http://www.xgap.org) or auto-generate a custom version using MOLGENIS with programming interfaces to R-software and web-services or user interfaces for biologists. XGAP has simple load formats for any type of genotype, epigenotype, transcript, protein, metabolite or other phenotype data. Current functionality includes tools ranging from eQTL analysis in mouse to genome-wide association studies in humans.

Figures

Figure 1
Extensible genotype and phenotype object model. Experimental genotype and (molecular) phenotype data can be described using Subject, Trait, Data and DataElement; the experimental procedures can be described using Investigation, Protocol and ProtocolApplication (B). Specific attributes and relationships can be added by extending core data types, for example, Sample and Gene (A, C). See Tables 2, 3 and 4 for uses of this model. The model is visualized in the Unified Modeling Language (UML): arrows denote relationships (Data has a field Investigation that refers to Investigation ID); triangle-terminated lines denote inheritance (Metabolite inherits the properties ID, Name and Type from Trait, in addition to its own attributes Mass, Formula and Structure); triangle-terminated dotted lines denote use of interfaces (Probe 'implements' properties of Locus); relationships are shown both as arrows and as properties ('xref' for one-to-many, 'mref' for many-to-many relationships). Asterisks mark FuGE-derived types (for example, Protocol*).
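The inheritance and cross-reference ideas in this caption can be illustrated with a minimal Python sketch. It is not the XGAP implementation: the class and attribute names follow the caption, while the Python types, default values and example records are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Investigation:
    id: int
    name: str


@dataclass
class Trait:
    # Core properties named in the caption: ID, Name, Type.
    id: int
    name: str
    type: str = "Trait"


@dataclass
class Metabolite(Trait):
    # Extension attributes named in the caption; their Python types are assumed.
    mass: Optional[float] = None
    formula: Optional[str] = None
    structure: Optional[str] = None


@dataclass
class Data:
    id: int
    name: str
    investigation: int  # 'xref': stores the ID of the Investigation it belongs to


# Hypothetical example records.
inv = Investigation(id=1, name="demo investigation")
glucose = Metabolite(id=10, name="glucose", type="Metabolite", formula="C6H12O6")
matrix = Data(id=100, name="metabolite levels", investigation=inv.id)
```

Because Metabolite subclasses Trait, any code written against Trait (for example, a lookup by name) also accepts Metabolite records, which is the practical benefit of extending a core type rather than defining a new one.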
Figure 2
Simple text file format. A whole investigation can be stored by using easy-to-create tabular text files for annotations or matrix-shaped text files for raw and processed data. Each 'annotation' file relates to one data type in the object model shown in Figure 1 - for example, the rows in the file 'probe.txt' will have the columns named in data type 'Probe'. Each 'data' file contains data elements and has row names and column names referring to annotation files - for example, 'genotypes.txt' may refer to 'marker.txt' names as row names and 'individual.txt' names as column names. If convenient, constant values such as 'species_name' can be described in the constant.properties file.
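As a rough illustration of how annotation files and data files relate, the following Python sketch reads two annotation tables and one matrix, then checks that every matrix row and column name is annotated. The file contents, the 'chr' column, the tab delimiter and the empty top-left header cell are assumptions; the actual XGAP loader and its exact file layout are not reproduced here.

```python
import csv
import io

# Hypothetical file contents, invented for illustration.
MARKER_TXT = "name\tchr\nm1\t1\nm2\t1\n"
INDIVIDUAL_TXT = "name\nind1\nind2\nind3\n"
GENOTYPES_TXT = "\tind1\tind2\tind3\nm1\tA\tB\tA\nm2\tB\tB\tA\n"


def read_annotation(text):
    """Read a tabular annotation file into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def read_matrix(text):
    """Read a matrix file: the header line holds column names, and the
    first cell of every later line holds the row name."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    col_names = rows[0][1:]
    row_names = [r[0] for r in rows[1:]]
    values = [r[1:] for r in rows[1:]]
    return row_names, col_names, values


def check_names(names, annotation, label):
    """Raise if a matrix row or column name is missing from its annotation file."""
    known = {row["name"] for row in annotation}
    missing = [n for n in names if n not in known]
    if missing:
        raise ValueError(f"{label} not annotated: {missing}")


markers = read_annotation(MARKER_TXT)
individuals = read_annotation(INDIVIDUAL_TXT)
row_names, col_names, values = read_matrix(GENOTYPES_TXT)
check_names(row_names, markers, "markers")
check_names(col_names, individuals, "individuals")
```

Validating names before loading catches the most common mistake with this kind of layout, a data matrix that mentions a marker or individual absent from the annotation files.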
Figure 3
Graphical User Interfaces. A user interface enables biologists to add and retrieve data and run integrated tools. Genotype and phenotype information can be explored by investigation, subjects, traits or data. Hyperlinks following cross-references of the object model point to related information. Items indicated by 1-9 are described in the main text. See Table 5 for uses of this GUI. See also our online demonstrator at [51].
Figure 4
Application programming interfaces. APIs enable bioinformaticians to integrate data and tools with XGAP using web services, the R language, Java, or simple HTTP hyperlinks. The figure shows how scientists can use the R/API to upload raw investigation data (Scientist A) so that another researcher can download these data, immediately use them to calculate QTL profiles, and upload the results back to the XGAP database for use by another collaborator (Scientist B). Note how 'add.datamatrix' enables flexible upload of matrices for any Subject or Trait combination; this function adds one row to Data for each matrix, and as many rows to DataElement as the matrix has cells. See Table 6 for uses of these APIs.
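The storage rule stated for 'add.datamatrix' (one Data row per matrix, one DataElement row per cell) can be sketched in Python as follows. This is an analogue written for illustration, not the R/API function itself: the name add_datamatrix, its parameters, and the in-memory 'store' dictionary are all hypothetical.

```python
def add_datamatrix(store, name, row_names, col_names, values):
    """Store one matrix as a single Data row plus one DataElement row per cell.

    'store' is a hypothetical in-memory stand-in for the database tables.
    """
    data_id = len(store["Data"]) + 1
    store["Data"].append({"id": data_id, "name": name})
    for r, row_name in enumerate(row_names):
        for c, col_name in enumerate(col_names):
            store["DataElement"].append(
                {"data": data_id, "row": row_name, "col": col_name,
                 "value": values[r][c]}
            )
    return data_id


store = {"Data": [], "DataElement": []}
add_datamatrix(
    store,
    "genotypes",
    ["m1", "m2"],
    ["ind1", "ind2", "ind3"],
    [["A", "B", "A"], ["B", "B", "A"]],
)
print(len(store["Data"]), len(store["DataElement"]))  # prints: 1 6
```

A 2 x 3 matrix therefore yields one Data row and six DataElement rows. Storing cells as rows keyed by row and column name is what lets the same two tables hold matrices for any Subject or Trait combination.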
Figure 5
Customizing XGAP. A file in MOLGENIS domain-specific language is used to describe and customize the XGAP database infrastructure in a few lines. (a) Shows how the addition of a Metabolite data entity as a new variant of Trait takes only a few lines in this DSL. (b) Shows how the GUI can be customized to suit a particular experimental process. (c) Shows how programmers can add a 'plug-in' program that is not generated by MOLGENIS but written by hand in Java.
Figure 6
Auto-generation of XGAP software. Open source generator tools are used to produce a customized XGAP software infrastructure. 1, The XGAP object model is described using the MOLGENIS little modeling language (Figure 5). 2, Central software termed MolgenisGenerate runs several generators, building on the MOLGENIS catalogue of reusable assets. 3, At the push of a button, the software code for a working XGAP implementation is automatically generated from the DSL file. GUI and APIs provide simple tools to add and retrieve data, while the reusable assets of MOLGENIS hide the complexity normally needed to implement such tools. For customization, only simple changes to the XGAP model file are required; the MOLGENIS generator takes care of rewriting all the necessary files of SQL and Java software code, saving time and ensuring consistent quality.
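The general idea of generating code from a compact model description can be shown with a toy Python sketch. It is not MolgenisGenerate and does not use the MOLGENIS DSL: the MODEL dictionary, the SQL column types, and the choice to flatten inherited fields into each table are assumptions made purely for illustration.

```python
# Hypothetical model description; the real MOLGENIS DSL file is not reproduced here.
MODEL = {
    "Trait": {
        "extends": None,
        "fields": {"id": "INTEGER", "name": "TEXT", "type": "TEXT"},
    },
    "Metabolite": {
        "extends": "Trait",
        "fields": {"mass": "REAL", "formula": "TEXT", "structure": "TEXT"},
    },
}


def all_fields(model, entity):
    """Collect the fields of an entity, inherited fields first."""
    spec = model[entity]
    inherited = all_fields(model, spec["extends"]) if spec["extends"] else {}
    return {**inherited, **spec["fields"]}


def generate_sql(model):
    """Emit one CREATE TABLE statement per entity in the model."""
    statements = []
    for entity in model:
        columns = ", ".join(
            f"{name} {sql_type}" for name, sql_type in all_fields(model, entity).items()
        )
        statements.append(f"CREATE TABLE {entity} ({columns});")
    return statements


for statement in generate_sql(MODEL):
    print(statement)
```

Adding a new entity to MODEL changes the generated SQL without touching generate_sql, which mirrors the caption's point that customization requires edits to the model file only.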
