Nat Biotechnol. 2022 Sep;40(9):1332-1335.
doi: 10.1038/s41587-022-01261-x. Epub 2022 Mar 24.

Haplotype-resolved assembly of diploid genomes without parental data

Haoyu Cheng et al. Nat Biotechnol. 2022 Sep.

Abstract

Routine haplotype-resolved genome assembly from single samples remains an unresolved problem. Here we describe an algorithm that combines PacBio HiFi reads and Hi-C chromatin interaction data to produce a haplotype-resolved assembly without the sequencing of parents. Applied to human and other vertebrate samples, our algorithm consistently outperforms existing single-sample assembly pipelines and generates assemblies of similar quality to the best pedigree-based assemblies.

Figures

Extended Data Fig. 1. Chromosome-level phasing results for hifiasm (Hi-C) human assemblies.
All contigs were aligned to the T2T CHM13 reference and the Y chromosome of GRCh38, and the reference region corresponding to each contig was then determined from the alignment results. For each chromosome, the top track and the bottom track indicate haplotype 1 contigs and haplotype 2 contigs, respectively. The phase density of contigs was calculated from the parental short reads. Gray bars indicate centromeric regions. (a) Chromosome-level phasing results for HG002 with 30X HiFi and 30X Hi-C. (b) Chromosome-level phasing results for HG00733 with 30X HiFi and 30X Hi-C.
Figure 1. Haplotype-resolved assembly using Hi-C data.
(a) Assembly workflow. Hifiasm corrects reads and produces a phased assembly graph. It then maps Hi-C short reads to the graph, links unitigs in the assembly graph that share mapped Hi-C fragments, and finds a bipartition of unitigs such that unitigs linked by many Hi-C fragments tend to be grouped together. Hifiasm finally emits a haplotype-resolved assembly jointly considering the unitig partition and the assembly graph. (b) Phasing accuracy of HG002 assemblies. Each point corresponds to a contig. Its coordinate gives the number of paternal- and maternal-specific 31-mers on the contig, with these 31-mers derived from parental short reads. Hifiasm (trio): haplotype-resolved hifiasm assembly with trio binning. Hifiasm (dual): paired hifiasm assembly without Hi-C. Hifiasm (primary/alt): primary and alternate hifiasm assembly without Hi-C. Hifiasm (Hi-C): haplotype-resolved hifiasm assembly with Hi-C. FALCON-Phase (Hi-C): FALCON-Phase assembly with Hi-C based on IPA contigs, acquired from its publication. HiCanu (primary/alt): primary and alternate HiCanu assembly without Hi-C. All assemblies use the same HiFi and Hi-C datasets. (c) Screenshot of contig and read alignment to GRCh38 around gene GTF2IRD2.
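
The contig coordinates in panel (b) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes that a parent-specific k-mer is one seen in one parent's short reads and absent from the other's, that k-mers are compared in canonical form (the lexicographically smaller of a k-mer and its reverse complement), and that distinct k-mers are counted rather than occurrences. The function names are hypothetical, and the filtering of sequencing-error k-mers that real data would need is left out.

```python
from typing import Iterable, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Return the set of canonical k-mers in seq (assumption: strand-agnostic)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip k-mers containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            revcomp = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, revcomp))
    return kmers


def parent_specific_kmers(
    paternal_reads: Iterable[str],
    maternal_reads: Iterable[str],
    k: int = 31,
) -> Tuple[Set[str], Set[str]]:
    """Return (paternal-only, maternal-only) k-mer sets from parental reads."""
    paternal = set().union(*(canonical_kmers(r, k) for r in paternal_reads))
    maternal = set().union(*(canonical_kmers(r, k) for r in maternal_reads))
    return paternal - maternal, maternal - paternal


def contig_coordinate(
    contig: str,
    paternal_only: Set[str],
    maternal_only: Set[str],
    k: int = 31,
) -> Tuple[int, int]:
    """Return (paternal-specific count, maternal-specific count) for one contig.

    Assumption: distinct k-mers on the contig are counted, not occurrences.
    """
    kmers = canonical_kmers(contig, k)
    return len(kmers & paternal_only), len(kmers & maternal_only)


if __name__ == "__main__":
    # Toy example with k=3 so the sequences stay readable.
    pat_only, mat_only = parent_specific_kmers(["AAACCC"], ["AAAGGG"], k=3)
    print(contig_coordinate("AAACC", pat_only, mat_only, k=3))
    print(contig_coordinate("CCTT", pat_only, mat_only, k=3))
```

Under these assumptions, a well-phased contig lies close to one axis of the plot, while a contig that mixes haplotypes carries substantial counts of both k-mer types.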
