Nat Commun. 2016 Nov 24:7:13637.
doi: 10.1038/ncomms13637.

An ethnically relevant consensus Korean reference genome is a step towards personal reference genomes


Yun Sung Cho et al. Nat Commun.

Erratum in

Abstract

Human genomes are routinely compared against a universal reference. However, this strategy could miss population-specific and personal genomic variations, which may be detected more efficiently using an ethnically relevant or personal reference. Here we report a hybrid assembly of a Korean reference genome (KOREF) for constructing personal and ethnic references by combining sequencing and mapping methods. We also build its consensus variome reference, providing information on millions of variants from 40 additional ethnically homogeneous genomes from the Korean Personal Genome Project. We find that the ethnically relevant consensus reference can be beneficial for efficient variant detection. Systematic comparison of human assemblies shows the importance of assembly quality, suggesting that new technologies are needed to comprehensively map ethnic and personal genomic structural variations. In the era of large-scale population genome projects, leveraging ethnicity-specific genome assemblies alongside the human reference genome will accelerate the mapping of all human genome diversity.
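A consensus reference of this kind can be approximated by replacing each reference base with the allele carried by the majority of sample chromosomes. The sketch below is a minimal illustration under stated assumptions, not the KOREF pipeline: it assumes diploid SNV genotypes keyed by 0-based position, treats positions absent from a sample as homozygous reference, ignores indels and structural variants, and uses a simple frequency threshold. `build_consensus` and its input layout are hypothetical.

```python
from collections import Counter


def build_consensus(reference, samples, threshold=0.5):
    """Substitute reference bases with the major allele across diploid samples.

    Hypothetical helper for illustration only. `samples` holds one dict per
    genome mapping a 0-based position to a genotype (allele1, allele2);
    positions missing from a dict are treated as homozygous reference.
    """
    seq = list(reference)
    n_chrom = 2 * len(samples)  # two chromosomes per diploid genome
    positions = set()
    for calls in samples:
        positions.update(calls)
    for pos in sorted(positions):
        ref_base = reference[pos]
        counts = Counter()
        for calls in samples:
            counts.update(calls.get(pos, (ref_base, ref_base)))
        allele, count = counts.most_common(1)[0]
        # Replace only when a non-reference allele exceeds the frequency threshold.
        if allele != ref_base and count / n_chrom > threshold:
            seq[pos] = allele
    return "".join(seq)


samples = [
    {2: ("T", "T"), 5: ("C", "A")},
    {2: ("T", "T")},
    {2: ("G", "T")},
]
print(build_consensus("ACGTACGT", samples))  # ACTTACGT
```

In this toy input, position 2 (reference `G`) carries `T` on 5 of 6 chromosomes and is substituted, while the rare `A` at position 5 leaves the reference base unchanged.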


Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Schematic overview of KOREF assembly procedure.
(a) Short- and long-insert-size libraries from Illumina whole-genome sequencing. (b) Contig assembly using k-mers from the short-insert-size libraries. (c) Scaffold assembly using the long-insert-size libraries. (d) Super-scaffold assembly using the OpGen whole-genome mapping approach. (e) Gap closing using PacBio long reads and Illumina TSLR. (f) Assembly assessment using BioNano consensus maps. (g) Chromosome sequence building using whole-genome alignment to the human reference (GRCh38). (h) Substitution of common variants using 40 Korean whole-genome sequences.
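Step (b) builds contigs from k-mers of short reads. As a toy illustration only, the sketch below builds a de Bruijn graph (each distinct k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix) and reads off contigs as maximal non-branching paths. It assumes error-free, single-stranded reads and skips isolated cycles; the caption does not name the assembler used, and production assemblers add error correction, reverse-complement handling and coverage filtering. All function names are hypothetical.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_graph(reads, k):
    """Each distinct k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    out_edges = defaultdict(set)
    in_degree = defaultdict(int)
    for read in reads:
        for kmer in kmers(read, k):
            prefix, suffix = kmer[:-1], kmer[1:]
            if suffix not in out_edges[prefix]:
                out_edges[prefix].add(suffix)
                in_degree[suffix] += 1
    return out_edges, in_degree


def contigs(reads, k):
    """Return maximal non-branching paths (unitigs); isolated cycles are skipped."""
    out_edges, in_degree = build_graph(reads, k)
    nodes = set(out_edges) | set(in_degree)

    def one_in_one_out(node):
        return in_degree[node] == 1 and len(out_edges[node]) == 1

    result = []
    for node in sorted(nodes):
        if one_in_one_out(node):
            continue  # interior nodes are reached by walking from a branch point
        for nxt in sorted(out_edges[node]):
            path = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(out_edges[nxt]))
                path += nxt[-1]
            result.append(path)
    return result


print(contigs(["ACGTTG", "CGTTGCA"], 4))  # ['ACGTTGCA']
```

The two overlapping reads collapse into a single contig here; in step (c), read pairs from the long-insert libraries would then order and orient such contigs into scaffolds.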
Figure 2. SVs among human assemblies.
(a) The correlation between the N50 length of fragments (scaffolds or contigs) and the fraction of novel SVs. (b) The correlation between the N50 length of fragments and the fraction of SVs shared with the CHM1 PacBio read mapping method. (c) Exclusively shared SVs among human assembly sets. Only SVs shared (reciprocally 50% covered) by the denoted assemblies alone were considered in this figure. (d) An example of an SV shared by nine human assemblies. Grey regions denote structural differences shared among all the assemblies, and horizontal lines indicate homologous sequence regions.
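Panels (a) to (c) rest on two simple measures. N50 is the fragment length at which fragments of that length or longer first account for at least half of the total assembled length, and "reciprocally 50% covered" is read here as each SV interval overlapping the other by at least half of its own length. The sketch assumes SVs have already been placed on a common coordinate system as half-open (chromosome, start, end) intervals; that placement is not shown, and the function names are hypothetical.

```python
def n50(lengths):
    """Length L such that fragments of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def reciprocal_overlap(a, b, fraction=0.5):
    """True if half-open intervals a and b each have >= fraction of their length overlapped."""
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return False
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return False
    return (overlap >= fraction * (end_a - start_a)
            and overlap >= fraction * (end_b - start_b))


print(n50([100, 80, 60, 40, 20]))  # 80
print(reciprocal_overlap(("chr1", 100, 200), ("chr1", 120, 220)))  # True
print(reciprocal_overlap(("chr1", 100, 200), ("chr1", 150, 260)))  # False
```

The last pair fails because its 50 bp overlap covers half of the first SV but less than half of the 110 bp second SV; requiring the criterion in both directions keeps a small SV from matching a much larger one.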
Figure 3. Variant differences depending on the reference genome.
Numbers of variants (SNVs and small indels) within the regions shared by KOREFs, GRCh38 and GRCh38_C were compared using whole-genome re-sequencing data from three ethnic groups (Africans: Mandenka, Yoruba, San, Mbuti and Dinka; Caucasians: Sardinian, French and three CEPH/Utah (CEU); East Asians: Mongolian, two Chinese, two Japanese and five Koreans). (a) Number of homozygous SNVs. (b) Number of homozygous small indels. (c) Number of heterozygous SNVs. (d) Number of heterozygous small indels. (e) Numbers of variants called against GRCh38 and KOREF_C at different levels of sharedness. (f) Numbers of reference-specific variants at different levels of sharedness.
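Once variants have been called, the per-genome counts in panels (a) to (d) and the sharedness distributions in (e) and (f) reduce to simple tallies. The sketch assumes biallelic calls given as (ref, alt, genotype) with genotypes as pairs of 0 (reference) and 1 (alternate) alleles, labels a call as an SNV when both alleles are single bases and as a small indel when their lengths differ, and takes sharedness to be the number of genomes carrying a variant. It does not reproduce the variant calling or the restriction to regions shared by all references, and the function names are hypothetical.

```python
from collections import Counter


def classify(ref, alt, genotype):
    """Label a biallelic call as (zygosity, type), or None if it is not counted."""
    if len(ref) == 1 and len(alt) == 1:
        kind = "SNV"
    elif len(ref) != len(alt):
        kind = "indel"
    else:
        return None  # multi-base substitutions are left out of this sketch
    if tuple(genotype) == (1, 1):
        return ("homozygous", kind)
    if set(genotype) == {0, 1}:
        return ("heterozygous", kind)
    return None  # homozygous reference or missing genotype


def count_variants(records):
    """Tally (zygosity, type) labels for one genome's (ref, alt, genotype) records."""
    counts = Counter()
    for ref, alt, genotype in records:
        label = classify(ref, alt, genotype)
        if label is not None:
            counts[label] += 1
    return counts


def sharedness(calls_by_genome):
    """Map k to the number of distinct variants carried by exactly k genomes."""
    carriers = Counter()
    for variants in calls_by_genome.values():
        carriers.update(set(variants))
    return Counter(carriers.values())


calls = [("A", "G", (1, 1)), ("C", "T", (0, 1)), ("AT", "A", (1, 1)),
         ("G", "GCA", (1, 0)), ("T", "C", (0, 0))]
print(count_variants(calls)[("heterozygous", "indel")])  # 1
genomes = {"g1": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
           "g2": {("chr1", 100, "A", "G")}}
print(sorted(sharedness(genomes).items()))  # [(1, 1), (2, 1)]
```

Running the same tallies on calls made against two different references would give the kind of side-by-side comparison the panels show.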
