2014 Apr 5;15:265.
doi: 10.1186/1471-2164-15-265.

VCGDB: a dynamic genome database of the Chinese population

Yunchao Ling et al. BMC Genomics.

Abstract

Background: The data released by the 1000 Genomes Project contain an increasing number of genome sequences from different nations and populations with a large number of genetic variations. As a result, the focus of human genome studies is changing from single and static to complex and dynamic. The currently available human reference genome (GRCh37) is based on sequencing data from 13 anonymous Caucasian volunteers, which might limit the scope of genomics, transcriptomics, epigenetics, and genome-wide association studies.

Description: We used the massive amount of sequencing data published by the 1000 Genomes Project Consortium to construct the Virtual Chinese Genome Database (VCGDB), a dynamic genome database of the Chinese population based on the whole genome sequencing data of 194 individuals. VCGDB provides dynamic genomic information, which contains 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (indels), and 29 million rare variations, together with genomic annotation information. VCGDB also provides a highly interactive user-friendly virtual Chinese genome browser (VCGBrowser) with functions like seamless zooming and real-time searching. In addition, we have established three population-specific consensus Chinese reference genomes that are compatible with mainstream alignment software.

Conclusions: VCGDB offers a feasible strategy for processing big data to keep pace with the biological data explosion by providing a robust resource for genomics studies; in particular, studies aimed at finding regions of the genome associated with diseases.
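
The abstract mentions dynamic positions, a major base probability at each of them, and consensus reference genomes, but it does not spell out how these are computed. The following is only a minimal sketch of one plausible reading, not the authors' pipeline. It assumes that a dynamic position is a site where individuals differ, that the major base is the most frequently observed base, and that a consensus is obtained by substituting the major base into a reference sequence. The function names `major_base` and `consensus`, the 0-based coordinates, and the toy data are all hypothetical.

```python
from collections import Counter


def major_base(calls):
    """Return (major base, probability) for the base calls at one position.

    `calls` holds one observed base per individual (an assumed input format).
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(calls)
    # sorted() fixes the candidate order; max() keeps the first maximum.
    base = max(sorted(counts), key=counts.get)
    return base, counts[base] / len(calls)


def consensus(reference, calls_by_pos):
    """Substitute the major base at each listed position (0-based, assumed)."""
    seq = list(reference)
    for pos, calls in calls_by_pos.items():
        base, _prob = major_base(calls)
        seq[pos] = base
    return "".join(seq)


# Toy data: position 1 is observed as T, T, C, T across four individuals.
toy_calls = {1: ["T", "T", "C", "T"], 3: ["T", "T", "G"]}
print(major_base(toy_calls[1]))
print(consensus("ACGT", toy_calls))
```

A real population-scale build would also have to handle indels, missing calls, and diploid genotypes, none of which this sketch attempts.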


Figures

Figure 1. Data processing workflow used to construct the virtual Chinese genome database (VCGDB).

Figure 2. Statistical analysis of the dynamic genomics information in the virtual Chinese genome database (VCGDB). A. Dynamic positions and indel distribution in the CHN, CHB, and CHS populations. The X-axis shows the major base probability of the dynamic position/probability of indels in the genome sequences. The Y-axis shows the proportion of dynamic positions/indels with the specific probability region. B. Indel length distribution in the CHN, CHB, and CHS populations. The X-axis shows the length of insertions (blue) and deletions (red), and the Y-axis shows the number of indels. Only high-probability insertions and deletions (>50%) in VCGDB were counted. C. Distribution of dynamic positions, indels, and MAIR (major allele and indel positions against the GRCh37 reference genome) based on the annotation information.

Figure 3. Differences between the two Han Chinese CHS and CHB populations. A. Venn diagram of a comparison of the dynamic positions. B. Venn diagram of a comparison of major alleles against the GRCh37 reference genome. C. Venn diagram of a comparison of high-probability indels against the GRCh37 reference genome. D. Venn diagram of a comparison of rare variations. In B and C, some shared dynamic positions were substituted by the same nucleotides/indels, others were substituted by different nucleotides/indels; these are marked "same" and "diff", respectively.

Figure 4. VCGBrowser interface (A) and VCGDB online search page (B).
