Comparative Study
Genome Res. 2004 Jan;14(1):160-9.
doi: 10.1101/gr.1645104.

EnsMart: a generic system for fast and flexible access to biological data

Arek Kasprzyk et al. Genome Res. 2004 Jan.

Abstract

The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools. The system consists of a query-optimized database and interactive, user-friendly interfaces. EnsMart has been applied to Ensembl, where it extends its genomic browser capabilities, facilitating rapid retrieval of customized data sets. The system supports a wide variety of complex queries on various types of annotation for numerous species. These can be applied to many research problems, ranging from SNP selection for candidate gene screening, through cross-species evolutionary comparisons, to microarray annotation. Users can group and refine biological data according to many criteria, including cross-species analyses, disease links, sequence variations, and expression patterns. Both tabulated list data and biological sequence output can be generated dynamically, in HTML, text, Microsoft Excel, and compressed formats. A wide range of sequence types, such as cDNA, peptides, coding regions, UTRs, and exons, with additional upstream and downstream regions, can be retrieved. The EnsMart database can be accessed via a public Web site, or through a Java application suite. Both implementations and the database are freely available for local installation, and can be extended or adapted to 'non-Ensembl' data sets.

Figures

Figure 1
MartView start page showing available species and foci. The availability of a particular focus depends on the species. Each available species is designated with an assembly version.
Figure 2
MartView filter page showing some of the available filters. A wide range of filter types can be applied, in any combination. The system supports batch querying, and a set of external identifiers can be uploaded directly from a file. A summary table provides feedback on the number of items that pass the currently selected filters, allowing users to modify their searches in an interactive way. The additional window shows the tool for finding terms in the expression vocabulary.
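The filter behaviour described in this caption (filters applied in any combination, batch upload of identifiers, and a summary count of items that pass) can be sketched in a few lines of Python. This is a minimal illustration, not EnsMart's implementation: the `Gene` record, the filter helpers (`by_chromosome`, `disease_genes`, `transmembrane`, `id_list`) and the `summary` function are hypothetical, and the sketch assumes that selected filters combine with logical AND, as in the "disease genes and transmembrane domains" query of Figure 6.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


# Hypothetical in-memory gene record; field names are illustrative only.
@dataclass
class Gene:
    gene_id: str
    chromosome: str
    has_disease_link: bool
    has_transmembrane_domain: bool


# A filter is modelled as a predicate over one record (an assumption).
Filter = Callable[[Gene], bool]


def by_chromosome(name: str) -> Filter:
    return lambda g: g.chromosome == name


def disease_genes() -> Filter:
    return lambda g: g.has_disease_link


def transmembrane() -> Filter:
    return lambda g: g.has_transmembrane_domain


def id_list(ids: Iterable[str]) -> Filter:
    """Batch filter: keep only genes whose ID appears in an uploaded list
    (e.g. lines read from a file; blank lines are ignored)."""
    wanted = {i.strip() for i in ids if i.strip()}
    return lambda g: g.gene_id in wanted


def apply_filters(genes: List[Gene], filters: List[Filter]) -> List[Gene]:
    """Combine all selected filters with logical AND."""
    return [g for g in genes if all(f(g) for f in filters)]


def summary(genes: List[Gene], filters: List[Filter]) -> dict:
    """Counts for a summary table: total items and items passing all filters."""
    return {"total": len(genes), "passing": len(apply_filters(genes, filters))}
```

Because each filter is just a predicate, the summary count can be recomputed after every change to the selection, which is the kind of interactive feedback the caption describes.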
Figure 3
MartView output page and an example of the corresponding output in HTML format. 'Tabs' at the top show the available output topics: with a gene focus, as shown here, one chooses between features, SNPs, genomic structures, and sequences. A full description of each option is available in the online help. 'Features' has been selected, and most of the available data types are shown.
Figure 4
MartView output page showing the range of sequence retrieval options (human gene focus, sequences tab). An example of the corresponding FASTA output is also shown. The gene sequence options include gene sequence, gene with flanking sequence, upstream or downstream sequences of user-specified length, exons, transcripts, and coding sequence only. The user is guided by a graphical representation of the sequence options.
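Retrieving an upstream sequence of user-specified length amounts to slicing the genomic sequence on the correct side of the gene, and which side that is depends on the strand. The following is a minimal sketch, assuming an in-memory chromosome string, 1-based inclusive gene coordinates and strand encoded as 1 or -1; `upstream_flank` and `to_fasta` are hypothetical helpers, not part of EnsMart.

```python
# Complement table for DNA bases (upper and lower case, N kept as N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def upstream_flank(chrom_seq: str, start: int, end: int,
                   strand: int, length: int) -> str:
    """Return up to `length` bp upstream of a gene at 1-based [start, end].

    Forward strand (1): upstream lies before `start`.
    Reverse strand (-1): upstream lies after `end`, and the slice is
    reverse-complemented so it reads 5'->3' relative to the gene.
    The result is truncated at the chromosome ends.
    """
    if strand == 1:
        lo = max(0, start - 1 - length)  # 0-based slice start
        return chrom_seq[lo:start - 1]
    if strand == -1:
        return reverse_complement(chrom_seq[end:end + length])
    raise ValueError("strand must be 1 or -1")


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format one record as FASTA with fixed-width sequence lines."""
    lines = [f">{header}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)
```

For example, with `chrom = "AAAACCCCGGGGTTTT"`, a forward-strand gene at positions 5 to 8 has `upstream_flank(chrom, 5, 8, 1, 3) == "AAA"`, and a reverse-strand gene at 9 to 12 takes positions 13 to 15 (`"TTT"`) and returns their reverse complement, `"AAA"`.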
Figure 5
MartExplorer GUI implementing user abstractions using a modified tree. As users click on each node of the tree, they are presented with input fields for the data required for that part of the query. As filters and attributes are chosen, they are moved onto the tree below their respective nodes, giving the user a single, interactive view of an entire query (shown on the left). Once all required data have been provided, the output format is chosen and the results are exported.
Figure 6
Screenshot of an interactive MartShell session. On entering the interactive session, the user types 'use' and then presses the Tab key twice to get a list of available data sets. She chooses a data set, then begins to type a query. After typing 'get sequence', she presses the Tab key twice to list the available sequence types, then completes her query for 1000 bp of upstream gene flanking sequence for all disease genes that also have transmembrane domains.
Figure 7
An overview of EnsMart architecture. The domain-specific staging area and mart building tools are shown at the top of the diagram; the domain-independent EnsMart database and user interfaces are shown at the bottom. The domain-independent part can be adapted to other data sets.
Figure 8
A diagram of the EnsMart 'reversed star' schema.
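To make the 'reversed star' idea more concrete, here is a toy version built with Python's built-in `sqlite3` module. It is a simplified sketch under stated assumptions, not the EnsMart schema itself: it assumes one central main table with one denormalized row per gene, plus dimension tables that hold one-to-many data (SNPs, disease links) and point back at the main table by gene ID. All table names, column names and rows are hypothetical.

```python
import sqlite3


def build_demo_mart() -> sqlite3.Connection:
    """Create a tiny in-memory mart with hypothetical tables and rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Central (main) table: one denormalized row per gene.
        CREATE TABLE gene_main (
            gene_id    TEXT PRIMARY KEY,
            chromosome TEXT,
            biotype    TEXT
        );
        -- Dimension tables: one-to-many data keyed back to the main table.
        CREATE TABLE gene_snp_dm (
            gene_id TEXT REFERENCES gene_main(gene_id),
            snp_id  TEXT
        );
        CREATE TABLE gene_disease_dm (
            gene_id TEXT REFERENCES gene_main(gene_id),
            disease TEXT
        );
    """)
    con.executemany("INSERT INTO gene_main VALUES (?, ?, ?)", [
        ("G1", "1", "protein_coding"),
        ("G2", "1", "protein_coding"),
        ("G3", "X", "pseudogene"),
    ])
    con.executemany("INSERT INTO gene_snp_dm VALUES (?, ?)", [
        ("G1", "rs1"), ("G1", "rs2"), ("G2", "rs3"), ("G3", "rs4"),
    ])
    con.executemany("INSERT INTO gene_disease_dm VALUES (?, ?)", [
        ("G1", "disease A"), ("G3", "disease B"),
    ])
    return con


def disease_gene_snps(con: sqlite3.Connection, chromosome: str):
    """SNPs of disease-linked genes on one chromosome.

    The chromosome filter is answered from the main table alone; the
    dimension tables are consulted only for the requested data.
    """
    return con.execute("""
        SELECT m.gene_id, s.snp_id
        FROM gene_main AS m
        JOIN gene_snp_dm AS s ON s.gene_id = m.gene_id
        WHERE m.chromosome = ?
          AND EXISTS (SELECT 1 FROM gene_disease_dm AS d
                      WHERE d.gene_id = m.gene_id)
        ORDER BY m.gene_id, s.snp_id
    """, (chromosome,)).fetchall()
```

In this sketch, `disease_gene_snps(build_demo_mart(), "1")` returns `[("G1", "rs1"), ("G1", "rs2")]`: G2 is excluded because it has no row in the disease dimension table. Keeping the main table denormalized means simple filters touch a single table, and a dimension table is joined only when its data are requested.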
