Nat Biotechnol. 2021 Mar;39(3):309-312.
doi: 10.1038/s41587-020-0711-0. Epub 2020 Dec 7.

Chromosome-scale, haplotype-resolved assembly of human genomes

Shilpa Garg et al. Nat Biotechnol. 2021 Mar.

Abstract

Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day. Applied to four public human genomes, PGP1, HG002, NA12878 and HG00733, DipAsm produced haplotype-resolved assemblies with minimum contig length needed to cover 50% of the known genome (NG50) up to 25 Mb and phased ~99.5% of heterozygous sites at 98-99% accuracy, outperforming other approaches in terms of both contiguity and phasing completeness. We demonstrate the importance of chromosome-scale phased assemblies for the discovery of structural variants (SVs), including thousands of new transposon insertions, and of highly polymorphic and medically important regions such as the human leukocyte antigen (HLA) and killer cell immunoglobulin-like receptor (KIR) regions. DipAsm will facilitate high-quality precision medicine and studies of individual haplotype variation and population diversity.
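The NG50 metric quoted in the abstract can be illustrated with a short calculation. The following is a minimal sketch, not code from DipAsm: the function name `ng50` and the toy contig lengths are assumptions made for illustration. It sorts contigs from longest to shortest and returns the length of the contig at which the running total first reaches half of the known genome size.

```python
def ng50(contig_lengths, genome_size):
    """Length of the shortest contig among the longest contigs that
    together cover at least half of genome_size.

    Returns 0 if the contigs never reach half of the genome size.
    Hypothetical helper for illustration only.
    """
    half = genome_size / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


# Toy lengths in arbitrary units (not data from the paper).
toy_contigs = [50, 30, 20]

# Measured against a known genome size of 200, all three contigs are needed.
print(ng50(toy_contigs, genome_size=200))

# Measured against the assembly's own total length, this is the ordinary N50.
print(ng50(toy_contigs, genome_size=sum(toy_contigs)))
```

Because NG50 is measured against the known genome size rather than the assembly's own total length, an incomplete assembly cannot inflate it, which is why it is the more conservative of the two contiguity measures.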

Conflict of interest statement

F.J.S. obtained a PacBio SMRT grant in 2019 and has had travel sponsored multiple times by Pacific Biosciences and Oxford Nanopore Technologies. E.H. and P.P. are employees of Pacific Biosciences. C.-S.C. and A.F. are employees of DNAnexus. A.S., X.Z. and S.M. are employees of Arima Genomics. J.G. and J.M. are employees of Dovetail Genomics. A.C. is an employee of Google. H.L. is a consultant of Integrated DNA Technologies, Inc. and on the Scientific Advisory Boards of Sentieon, Inc., BGI and OrigiMed. G.M.C. is a cofounder of Editas Medicine and has other financial interests, listed at http://arep.med.harvard.edu/gmc/tech.html.

Figures

Fig. 1. Outline of the phased assembly algorithm, DipAsm.
(1) Assemble HiFi reads into unphased contigs using Peregrine.
(2) Group and order contigs into scaffolds with Hi-C data using HiRise/3D-DNA (3D de novo assembly).
(3) Map HiFi reads to scaffolds and call heterozygous SNPs using DeepVariant.
(4) Phase heterozygous SNP calls with both HiFi and Hi-C data using WhatsHap plus HapCUT2.
(5) Partition reads based on their phase using WhatsHap.
(6) Assemble partitioned reads into phased contigs using Peregrine.
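The read-partitioning step (5) can be pictured with a toy example. The sketch below is an assumption-laden illustration and is not WhatsHap's actual algorithm or API: it uses a simple majority vote over phased heterozygous SNPs, and the names `assign_haplotype` and `partition_reads` and all data are hypothetical.

```python
def assign_haplotype(read_alleles, phased_snps):
    """Return 1 or 2 for the haplotype a read supports, or None if undecided.

    read_alleles: {position: base observed in the read}
    phased_snps:  {position: (haplotype 1 base, haplotype 2 base)}
    Simple majority vote; a stand-in for a real haplotagging method.
    """
    votes = {1: 0, 2: 0}
    for pos, base in read_alleles.items():
        if pos not in phased_snps:
            continue  # position is not a phased heterozygous site
        hap1_base, hap2_base = phased_snps[pos]
        if base == hap1_base:
            votes[1] += 1
        elif base == hap2_base:
            votes[2] += 1
    if votes[1] > votes[2]:
        return 1
    if votes[2] > votes[1]:
        return 2
    return None  # tie, or no informative sites covered


def partition_reads(reads, phased_snps):
    """Group read names by assigned haplotype (1, 2, or None)."""
    bins = {1: [], 2: [], None: []}
    for name, alleles in reads.items():
        bins[assign_haplotype(alleles, phased_snps)].append(name)
    return bins


# Toy data (hypothetical positions and bases).
phased = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
reads = {
    "r1": {100: "A", 250: "C"},  # matches haplotype 1 at both sites
    "r2": {250: "T", 400: "A"},  # matches haplotype 2 at both sites
    "r3": {100: "A", 250: "T"},  # one vote each: undecided
    "r4": {900: "C"},            # covers no phased site: undecided
}
print(partition_reads(reads, phased))
```

Each haplotype bin would then be assembled separately, as in step (6). How reads that cannot be assigned are handled is a design choice that the caption does not specify.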
Fig. 2. Applications of phased assemblies.
a, Local sequence divergence in comparison to the reference HLA haplotypes (top) and to the KIR haplotypes (bottom) regions in GRCh38. b, SV density (per 100 kb) on chromosome 1 for HG002 (inner), NA12878 (middle) and PGP1 (outer).
