De novo assembly and phasing of a Korean human genome
- PMID: 27706134
- DOI: 10.1038/nature20098
Abstract
Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 (ref. 1) using single-molecule real-time sequencing, next-generation mapping, microfluidics-based linked reads, and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9 Mb and a scaffold N50 size of 44.8 Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03 Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly. We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6 Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The haplotigs accurately characterized the hypervariable major histocompatibility complex region and demonstrated the allele configuration in clinically relevant genes such as CYP2D6.
This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of unreported and Asian-specific structural variants, and high-quality haplotyping of clinically relevant alleles for precision medicine.
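The abstract reports contiguity as N50 values (contigs, scaffolds, and phased blocks). As a minimal sketch of how that metric is conventionally defined, and not the authors' own pipeline, the function below returns the length L such that sequences of length L or longer make up at least half of the total assembled bases. The function name `n50` and the toy lengths are illustrative assumptions, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together account for at least half of the total bases.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downward, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    # Only reached when the input is empty.
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    # Hypothetical contig lengths (arbitrary units), not values from AK1.
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_contigs))
```

The same function applies unchanged to scaffold lengths or phased-block lengths, since N50 depends only on the list of lengths supplied.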