2017 Jan 1;2017(1):bax020.
doi: 10.1093/database/bax020.

Ensembl core software resources: storage and programmatic access for DNA sequence and genome annotation

Magali Ruffier et al. Database (Oxford).

Abstract

The Ensembl software resources are a stable infrastructure to store, access and manipulate genome assemblies and their functional annotations. The Ensembl 'Core' database and Application Programming Interface (API) was our first major piece of software infrastructure and remains at the centre of all of our genome resources. Since its initial design more than fifteen years ago, the number of publicly available genomic, transcriptomic and proteomic datasets has grown enormously, accelerated by continuous advances in DNA-sequencing technology. Our framework was initially intended to provide annotation for the reference human genome; we have since extended it to support the genomes of all species as well as richer assembly models. Cross-referenced links to other informatics resources facilitate searching our database with a variety of popular identifiers such as UniProt and RefSeq. Our comprehensive and robust framework, which stores a large diversity of genome annotations in one location, serves as a platform for other groups to generate and maintain their own tailored annotation. We welcome reuse and contributions: our databases and APIs are publicly available, all of our source code is released with a permissive Apache v2.0 licence at http://github.com/Ensembl and we have an active developer mailing list (http://www.ensembl.org/info/about/contact/index.html).

Database URL: http://www.ensembl.org

Figures

Figure 1. The core assembly schema.
Figure 2. The Ensembl web browser can display the differences between a patch region and its equivalent in the primary assembly. Genes that are present in both regions are identified as alt_alleles. (http://e87.ensembl.org/Homo_sapiens/Location/Multi?db=core;g=ENSG00000175164;r=CHR_HG2030_PATCH:133174055-133504218;r1=9:133173980-133504143:1;s1=Homo_sapiens–9).
Figure 3. Efficient searching of genomic features. To find all features between coordinates s and e (i.e. C, D and E) in the situation where the maximum length of a feature for this coordinate system is m (i.e. the length of feature F), we extract all features whose start lies between s - m and e, then exclude B, since it ends before s.
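
The search strategy in the Figure 3 caption can be illustrated with a minimal in-memory sketch. This is not the Ensembl API or its database query; the class `FeatureIndex`, the `Feature` tuple and all coordinates are hypothetical, chosen only so that the features behave like A to F in the caption. It assumes inclusive coordinates and a feature list sorted by start.

```python
from bisect import bisect_left, bisect_right
from typing import List, NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # inclusive
    end: int    # inclusive


class FeatureIndex:
    """Hypothetical index: features sorted by start, plus the maximum length m."""

    def __init__(self, features: List[Feature]) -> None:
        self.features = sorted(features, key=lambda f: f.start)
        self.starts = [f.start for f in self.features]
        # m: length of the longest feature in this coordinate system
        self.max_len = max((f.end - f.start + 1 for f in self.features), default=0)

    def overlapping(self, s: int, e: int) -> List[Feature]:
        # Step 1: only features starting in [s - m, e] can overlap [s, e],
        # because no feature is longer than m.
        lo = bisect_left(self.starts, s - self.max_len)
        hi = bisect_right(self.starts, e)
        # Step 2: drop candidates that end before s (feature B in the caption).
        return [f for f in self.features[lo:hi] if f.end >= s]


# Invented coordinates; F is the longest feature, so m = 80.
index = FeatureIndex([
    Feature("A", 1, 15),     # starts before s - m, never examined
    Feature("B", 30, 60),    # in the window, but ends before s
    Feature("C", 80, 120),   # spans s
    Feature("D", 130, 160),  # inside [s, e]
    Feature("E", 180, 240),  # spans e
    Feature("F", 300, 379),  # starts after e
])
hits = index.overlapping(100, 200)
print([f.name for f in hits])  # ['C', 'D', 'E']
```

Bounding the start by s - m lets the lookup use a single sorted key (the start coordinate) instead of scanning every feature; the cost is that a few candidates such as B are fetched and then discarded.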
Figure 4. This ID History Map for the SCARN4 gene (http://e87.ensembl.org/Homo_sapiens/Gene/Idhistory?g=ENSG00000281516) aligns Ensembl release numbers, genomic assembly versions, and version numbers of that gene across multiple Ensembl IDs. Successive updates to the version ID are represented as a chain of small nodes connected by lines. For recent releases, the colour of the line reflects how well consecutive versions match. If a score was not calculated (typically in older releases), the line is grey.
Figure 5. Sequences from external sources are aligned against Ensembl features. For ENST00000315596.14 (http://e87.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000083642;r=13:1-50000000;t=ENST00000315596), a number of predicted RefSeq peptide sequences have been aligned with small mismatches. The curated RefSeq peptide NP_055847 aligns perfectly but the corresponding mRNA sequence, NM_015032.3, does not, indicating that there is a difference in the UTR sequence of this transcript.
