BMC Genomics. 2010 May 11;11:293.
doi: 10.1186/1471-2164-11-293.

Ensembl variation resources

Yuan Chen et al. BMC Genomics. 2010.

Abstract

Background: The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics.

Description: The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl.

Conclusions: Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org.

Figures

Figure 1
Variation summary for rs2476601. Label 1 indicates the links to information for this specific variant. Label 2 indicates the summary, which includes the source (dbSNP), synonyms or other IDs for this variant in other databases or array platforms, alleles found and genomic location. Flanking sequence is also found in the summary information but not shown in the figure. http://Mar2010.archive.ensembl.org/Homo_sapiens/Variation/Summary?source=dbSNP;v=rs2476601.
Figure 2
LD (linkage disequilibrium) plot for the region around rs2476601. LD values in this figure were calculated based on allele frequencies in the CEPH human population. The LD data between variants is represented using a triangular grid shaded on a gradient from white to red depending on the strength of the LD (where red is high LD, white is low). Hovering the mouse cursor over one of the coloured regions in the plot reveals a pop-up box displaying the two variation IDs for that coloured region, and the LD value between them. http://Mar2010.archive.ensembl.org/Homo_sapiens/Location/LD?focus=variation;pop1=CSHL-HAPMAP:HapMap-CEU;r=1:114367568-114387567;v=rs2476601;vf=1916990.
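The caption does not say which LD statistic the plot shades. As a minimal sketch, assuming the standard pairwise measures r² and D' for two biallelic variants, the values could be derived from haplotype and allele frequencies as follows. The function name and the example frequencies are hypothetical, and this is not necessarily how Ensembl computes the plotted values.

```python
from typing import Tuple


def pairwise_ld(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (r_squared, d_prime) for two biallelic loci.

    p_ab -- frequency of the haplotype carrying allele A at locus 1
            and allele B at locus 2
    p_a  -- frequency of allele A at locus 1
    p_b  -- frequency of allele B at locus 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")

    # D measures how far the observed haplotype frequency departs
    # from the product expected under independence.
    d = p_ab - p_a * p_b

    # r^2 normalises D by the allele frequency variances.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

    # D' normalises D by its maximum attainable magnitude,
    # which depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    return r_squared, d_prime


if __name__ == "__main__":
    # Hypothetical frequencies: the two alleles always travel together.
    print(pairwise_ld(0.5, 0.5, 0.5))
```

A value near 1 would correspond to the red end of the gradient described in the caption, and a value near 0 to the white end.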
Figure 3
Variation views. Views available in the variation tab are titled with the name of the link leading to each set of information. The data in the figure correspond to rs2476601. The gene/transcript panel shows an Ensembl gene with this variant where it has different effects depending on the splice variant. Allele frequencies have been measured in four HapMap populations, as shown in the population genetics panel. The NHGRI associates Crohn's disease, Rheumatoid Arthritis and Type I Diabetes with this SNP, displayed in phenotype data. The phylogenetic context panel shows a common allele at this position in the reference sequences for chimpanzee (Pan troglodytes), orangutan (Pongo pygmaeus), macaque (Macaca mulatta), mouse (Mus musculus), rat (Rattus norvegicus), horse (Equus caballus), dog (Canis familiaris), cow (Bos taurus), opossum (Monodelphis domestica), and chicken (Gallus gallus). Platypus (Ornithorhynchus anatinus) shows a different nucleotide (T).
Figure 4
Variation image. A. The variation image in the gene tab shows all transcripts for an Ensembl gene in red, and variations mapping to each transcript as coloured boxes. Intronic variants are, by default, only drawn if within 100 bp of an exon. Change this by clicking on configure this page and selecting full introns. Protein domains and motifs encoded by the transcript sequence from databases such as PIR-Superfamily, PROSITE and Pfam are drawn in purple along each transcript. B. An expanded view of the box drawn in A is shown. Label 1 indicates the exon structure of Ensembl transcript ENST00000359785. Filled boxes are coding sequence within the exons, and unfilled boxes indicate untranslated regions (UTRs). Variations are indicated by label 2, and are colour-coded as to their effect on the transcript (see legend on the page). For this transcript, many intronic (dark blue) variations are shown, along with four non-synonymous (yellow) SNPs that display the potential amino acids at each position. Label 3 indicates a PIR-Superfamily domain (purple) that maps to the protein sequence. http://Mar2010.archive.ensembl.org/Homo_sapiens/Gene/Variation_Gene?db=core;g=ENSG00000134242;r=1:114356437-114414375;t=ENST00000359785.
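The default drawing rule for intronic variants in the Figure 4 caption can be illustrated with a small filter. This is a sketch under stated assumptions, not the browser's code: exons are taken as 1-based inclusive (start, end) pairs, "within 100 bp" is read as a distance of at most 100, all positions are assumed to lie inside the transcript, and every name here is hypothetical.

```python
from typing import Iterable, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based and inclusive (assumed convention)


def distance_to_nearest_exon(pos: int, exons: List[Exon]) -> int:
    """Return 0 if pos falls inside an exon, else the distance in bp
    to the closest exon boundary."""
    return min(
        0 if start <= pos <= end else min(abs(pos - start), abs(pos - end))
        for start, end in exons
    )


def variants_to_draw(
    positions: Iterable[int],
    exons: List[Exon],
    flank: int = 100,
    full_introns: bool = False,
) -> List[int]:
    """Select the variant positions that would be drawn.

    With full_introns=False, only variants inside an exon or within
    `flank` bp of one are kept. With full_introns=True, everything is kept.
    """
    if full_introns:
        return list(positions)
    return [p for p in positions if distance_to_nearest_exon(p, exons) <= flank]


if __name__ == "__main__":
    exons = [(1000, 1100), (2000, 2100)]          # hypothetical coordinates
    variants = [1050, 1150, 1200, 1201, 1500, 1950]
    print(variants_to_draw(variants, exons))
    print(variants_to_draw(variants, exons, full_introns=True))
```

Setting `full_introns=True` mirrors the effect of the "full introns" configuration option mentioned in the caption.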
Figure 5
Population comparison image. The ENST00000359785 transcript is shown, along with a representation of the genomes of Craig Venter and James Watson and the reference sequence (GRCh37) in this region. The left hand side of the view shows links available from the transcript tab. Label 1: The genomes of Craig Venter and James Watson have high coverage in this region (i.e. at least two sequencing reads are present for both strands of the chromosome), shown by a thick grey bar. Label 2: Variants are drawn as boxes if the allele differs from the reference sequence. A non-synonymous variation is shown in yellow, with two potential amino acids indicated (tryptophan and arginine). Label 3: The nucleotides possible for this variation are displayed upon clicking the corresponding yellow box. Label 4: A table compares each allele at positions of variation. This view indicates the alleles in both Venter and Watson's genomes are different to the reference sequence allele, for this variation. Watson is heterozygous at this position, and Venter is homozygous. http://Mar2010.archive.ensembl.org/Homo_sapiens/Transcript/Population/Image?db=core;g=ENSG00000134242;r=1:114356437-114414375;t=ENST00000359785.
Figure 6
The cDNA sequence for ENST00000359785. A. The untranslated region (UTR) is highlighted in yellow. Numbering of the first line begins at 1 at the start of the UTR. Numbering of the second line starts at 1 at the beginning of the coding sequence. The third line shows line numbering and sequence of the protein. Codons are indicated by light yellow alternating with no highlight. B. Further down the sequence shown in A, a variation is highlighted. This variation, at position 183 in the protein sequence, 547 in the coding sequence, and 677 in the transcript sequence, is shown to be rs34590413. The ID is revealed upon clicking on the IUPAC or ambiguity code above the highlighted nucleotide, which provides a link to the variation tab. The non-synonymous variations are shown by a red-coloured amino acid. Hovering over the indicated amino acid with the mouse shows the possible amino acids at that position. http://Mar2010.archive.ensembl.org/Homo_sapiens/Transcript/Sequence_cDNA?db=core;g=ENSG00000134242;r=1:114356437-114414375;t=ENST00000359785.
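The three numbering schemes in the Figure 6 caption are related by simple arithmetic, sketched below. The function names are hypothetical, and the 5' UTR length of 130 is not stated in the caption: it is inferred here from the difference between the transcript position (677) and the coding position (547).

```python
def transcript_to_cds_pos(transcript_pos: int, five_prime_utr_len: int) -> int:
    """Convert a 1-based transcript (cDNA) position to a 1-based
    coding-sequence position by removing the 5' UTR offset."""
    cds_pos = transcript_pos - five_prime_utr_len
    if cds_pos < 1:
        raise ValueError("position lies in the 5' UTR")
    return cds_pos


def cds_to_protein_pos(cds_pos: int) -> int:
    """Convert a 1-based coding position to the 1-based position of
    the amino acid whose codon contains it (three bases per codon)."""
    if cds_pos < 1:
        raise ValueError("coding positions are 1-based")
    return (cds_pos + 2) // 3


def base_within_codon(cds_pos: int) -> int:
    """Return 1, 2 or 3: which base of its codon the position occupies."""
    if cds_pos < 1:
        raise ValueError("coding positions are 1-based")
    return (cds_pos - 1) % 3 + 1


if __name__ == "__main__":
    utr_len = 677 - 547  # assumed 5' UTR length, inferred from the caption
    cds = transcript_to_cds_pos(677, utr_len)
    print(cds, cds_to_protein_pos(cds), base_within_codon(cds))
```

With these inputs the sketch reproduces the caption's figures: transcript position 677 maps to coding position 547, which falls in codon 183.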
Figure 7
Sequence alignments between the human reference sequence, GRCh37, and the genomes of Venter and Watson. Variation data are highlighted in green and indicated by an IUPAC code. A. The default display replaces nucleotides with dots if the genome has the same allele as the reference assembly. B. The display can be changed by using the matching basepairs: show all option in the configure this page link. http://Mar2010.archive.ensembl.org/Homo_sapiens/Location/SequenceAlignment?db=core;g=ENSG00000134242;r=1:114382907-114387906;t=ENST00000359785.
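The two display conventions in the Figure 7 caption, dots for bases that match the reference and IUPAC codes for variant positions, can be sketched in a few lines. The sequences below are made up for illustration and the function names are hypothetical. The code table is the standard IUPAC nucleotide ambiguity alphabet.

```python
from typing import Iterable

# Standard IUPAC nucleotide codes and the bases each one stands for.
_IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of observed alleles -> single-letter code.
_ALLELES_TO_IUPAC = {frozenset(bases): code for code, bases in _IUPAC_BASES.items()}


def iupac_code(alleles: Iterable[str]) -> str:
    """Return the IUPAC code representing a set of single-base alleles."""
    key = frozenset(a.upper() for a in alleles)
    if key not in _ALLELES_TO_IUPAC:
        raise ValueError(f"no IUPAC code for alleles {sorted(key)}")
    return _ALLELES_TO_IUPAC[key]


def mask_matches(reference: str, sample: str) -> str:
    """Replace every sample base that equals the reference base with a dot,
    leaving differing positions (including IUPAC codes) visible."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return "".join("." if s == r else s for r, s in zip(reference, sample))


if __name__ == "__main__":
    reference = "ACGTAC"                      # hypothetical reference bases
    het = iupac_code(["A", "G"])              # heterozygous A/G site
    sample = "AC" + het + "TAC"
    print(mask_matches(reference, sample))    # default, dotted display
    print(sample)                             # "show all" style display
```

Printing the unmasked sample corresponds to the "matching basepairs: show all" option described in panel B.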
