Nucleic Acids Res. 2013 Jan;41(Database issue):D936-41.
doi: 10.1093/nar/gks1213. Epub 2012 Nov 27.

DbVar and DGVa: public archives for genomic structural variation

Ilkka Lappalainen et al. Nucleic Acids Res. 2013 Jan.

Abstract

Much has changed in the last two years at DGVa (http://www.ebi.ac.uk/dgva) and dbVar (http://www.ncbi.nlm.nih.gov/dbvar). We are now processing direct submissions rather than only curating data from the literature and our joint study catalog includes data from over 100 studies in 11 organisms. Studies from human dominate with data from control and case populations, tumor samples as well as three large curated studies derived from multiple sources. During the processing of these data, we have made improvements to our data model, submission process and data representation. Additionally, we have made significant improvements in providing access to these data via web and FTP interfaces.

Figures

Figure 1.
Data growth since the DGVa and dbVar services were launched. The graph shows the accumulation of variant calls, stratified by organism. The archives hold several large datasets, such as the 1000 Genomes Project pilot (estd59) and phase I (estd199), structural variation data from 17 inbred mouse strains (estd118), the first releases of somatic structural variation from the COSMIC database (estd192), case-control and case-only studies on developmental delay (nstd54) and the International Standard Cytogenetic Array (ISCA) consortium data (nstd37). In addition to human and mouse data, the archives include data from dog, pig, fruit fly, macaque, cow, horse, zebrafish, sorghum and chimp.
Figure 2.
Graphical representation of the archive data model. The three accessioned objects (studies, calls and regions) are prefixed by an ‘n’ if submitted to dbVar and an ‘e’ if submitted to DGVa. Variation in individual sample genomes is aggregated into a variant region with respect to a reference genome. The genomic positions (indicated by green arrows) do not necessarily overlap completely. Study authors describe the aggregation process in the Assertion method attribute. Discovery and validation methods for each call are stored in the Experiment attribute. This facilitates cross-study analysis of genomic structural variation (GSV) identified using different techniques. Studies point to any external resources that provide access to the raw data used in the experiment or to the publication describing the data.
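The accession prefix convention in this caption can be illustrated with a small parser. This is a minimal sketch, not an official dbVar or DGVa tool. It assumes that the letters after the archive prefix identify the object type: `std` for studies (inferred from accessions such as estd59 and nstd54 in Figure 1), and `sv` and `ssv` for variant regions and variant calls (as stated in the Figure 3 caption). The names `parse_accession` and `Accession` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed mapping, pieced together from the figure captions:
# 'n' = submitted to dbVar, 'e' = submitted to DGVa.
ARCHIVES = {"n": "dbVar", "e": "DGVa"}
# Assumed object-type infixes: std = study, sv = variant region, ssv = variant call.
OBJECT_TYPES = {"std": "study", "sv": "variant region", "ssv": "variant call"}

# 'ssv' is listed before 'sv' so the longer infix is tried first.
ACCESSION_RE = re.compile(r"^(?P<archive>[ne])(?P<kind>std|ssv|sv)(?P<number>\d+)$")


class Accession(NamedTuple):
    archive: str
    object_type: str
    number: int


def parse_accession(accession: str) -> Accession:
    """Split an accession such as 'estd59' into archive, object type and number."""
    match = ACCESSION_RE.match(accession.strip().lower())
    if match is None:
        raise ValueError(f"not a recognised accession: {accession!r}")
    return Accession(
        archive=ARCHIVES[match["archive"]],
        object_type=OBJECT_TYPES[match["kind"]],
        number=int(match["number"]),
    )
```

For example, `parse_accession("estd59")` yields a study submitted to DGVa, while `parse_accession("nstd54")` yields a study submitted to dbVar.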
Figure 3.
(A) Rendering of breakpoint ambiguity. Variants with breakpoint resolution are shown with fully saturated color. Breakpoints defined by a range (using inner/outer starts and stops) are shown as fully saturated for the high-confidence interval (the region defined by the inner start and stop), while the region of breakpoint ambiguity is shown as transparent. In many cases, an undefined breakpoint is submitted but no likelihood range is provided; in these cases the breakpoints are drawn as triangles pointing towards each other (when only outer coordinates are provided) or pointing out (when inner coordinates are provided). (B) Rendering of call and region type, which is usually designated by color. SV corresponds to variant regions and SSV corresponds to variant calls.
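The rendering rules in panel A amount to a small decision procedure over the coordinates that were submitted. The sketch below is one way to express it, assuming a hypothetical `Placement` record with optional exact, inner and outer coordinates. The field names and style labels are illustrative and are not taken from the dbVar or DGVa schema or browser code.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative style labels (hypothetical, not the browsers' own names).
SOLID = "fully saturated"
RANGE = "saturated inner interval, transparent ambiguity range"
TRIANGLES_IN = "triangles pointing towards each other"
TRIANGLES_OUT = "triangles pointing out"


@dataclass
class Placement:
    """Hypothetical coordinate record; None means 'not submitted'."""
    start: Optional[int] = None
    stop: Optional[int] = None
    inner_start: Optional[int] = None
    inner_stop: Optional[int] = None
    outer_start: Optional[int] = None
    outer_stop: Optional[int] = None


def rendering_style(p: Placement) -> str:
    """Pick a rendering style following the rules described in the caption."""
    has_exact = p.start is not None and p.stop is not None
    has_inner = p.inner_start is not None and p.inner_stop is not None
    has_outer = p.outer_start is not None and p.outer_stop is not None

    if has_exact:
        return SOLID          # breakpoint resolution
    if has_inner and has_outer:
        return RANGE          # breakpoints defined by a range
    if has_outer:
        return TRIANGLES_IN   # only outer coordinates provided
    if has_inner:
        return TRIANGLES_OUT  # only inner coordinates provided
    raise ValueError("placement has no usable coordinates")
```

Mixed submissions, such as an outer start paired with an inner stop, are not covered by the caption and are deliberately rejected here rather than guessed at.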
