Nucleic Acids Res. 2014 Jan;42(Database issue):D546-52.
doi: 10.1093/nar/gkt979. Epub 2013 Oct 25.

Ensembl Genomes 2013: scaling up access to genome-wide data


Paul Julian Kersey et al. Nucleic Acids Res. 2014 Jan.

Abstract

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Analytic tools that allow targeted selection of data for visualization and download are likely to become increasingly important as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes.


Figures

Figure 1.
Species selection in Ensembl Bacteria. The figure shows the selection of a basket of genomes for use in a BLAST search. A tree-based navigation system allows the selection of defined portions of the taxonomy for use as library sequences. An autocomplete feature assists the location of particular genomes within the tree.
Figure 2.
Taxonomic distribution of gene families in the pan-taxonomic comparative analysis in release 19 of Ensembl Genomes. Large numbers of families [defined by clustering according to the Ensembl Gene Trees algorithm (27)] are found only in one domain of life. However, families can be found spanning all combinations of domains. The most overrepresented spans (compared with expectations based on the same proportion of families covering each domain, but assuming the co-coverage of two domains is random) are (i) all five domains and (ii) all four non-bacterial domains; the most underrepresented spans are (i) bacteria and metazoa and (ii) bacteria, metazoa and fungi. For each family of related proteins, a gene tree is constructed and made available for visualization and download, estimating the evolutionary history of that family.
Figure 3.
The barley genome represented in Ensembl Plants. The figure shows resequencing alignments from a number of cultivars against the genome assembly and annotation of the reference cultivar Morex, for a sequenced contig given an approximate chromosomal location through integration with the genetic map.
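
The over- and underrepresentation comparison described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes that "random co-coverage" means each domain is covered independently, with the observed per-domain proportion held fixed; this is one reading of the caption, not necessarily the authors' exact procedure. The function names, the input layout and any numbers passed in are hypothetical.

```python
from itertools import combinations


def expected_span_fractions(coverage):
    """Expected fraction of families spanning exactly each set of domains.

    `coverage` maps a domain name to the proportion of families with at
    least one member in that domain. Assumption: domains are covered
    independently of one another.
    """
    domains = sorted(coverage)
    expected = {}
    for size in range(1, len(domains) + 1):
        for span in combinations(domains, size):
            p = 1.0
            for d in domains:
                # Covered domains contribute p_d, all others (1 - p_d).
                p *= coverage[d] if d in span else 1.0 - coverage[d]
            expected[frozenset(span)] = p
    return expected


def span_enrichment(observed_counts, coverage):
    """Observed / expected family count for each observed span.

    `observed_counts` maps a tuple of domain names to a family count.
    Expected counts are conditioned on a family covering at least one
    domain, since an empty span cannot be observed.
    """
    expected = expected_span_fractions(coverage)
    total_expected = sum(expected.values())
    total_observed = sum(observed_counts.values())
    ratios = {}
    for span, count in observed_counts.items():
        key = frozenset(span)
        expected_count = expected[key] / total_expected * total_observed
        ratios[key] = count / expected_count
    return ratios
```

A ratio above 1 marks a span that holds more families than independence would predict, and a ratio below 1 marks an underrepresented span.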

References

    1. Flicek P, Ahmed I, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, et al. Ensembl 2013. Nucleic Acids Res. 2013;41:D48–D55.
    2. The UniProt Consortium. Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res. 2012;41:D43–D47.
    3. Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S, et al. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res. 2011;40:D306–D312.
    4. Kasprzyk A. BioMart: driving a paradigm change in biological data management. Database. 2011;2011:bar049–bar049.
    5. Megy K, Emrich SJ, Lawson D, Dialynas E, Hughes DS, Koscielny G, Louis C, Maccallum RM, Redmond SN, et al. VectorBase: improvements to a bioinformatics resource for invertebrate vector genomics. Nucleic Acids Res. 40:D729–D734.
