Nat Methods. 2021 Feb;18(2):170-175.
doi: 10.1038/s41592-020-01056-5. Epub 2021 Feb 1.

Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm

Haoyu Cheng et al. Nat Methods. 2021 Feb.

Abstract

Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five nonhuman datasets, including California redwood with a ~30-Gb hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly.

Figures

Figure 1: Outline of the hifiasm algorithm.
Orange and blue bars represent reads with heterozygous alleles, which carry local phasing information; green bars represent reads from homozygous regions, which contain no heterozygous alleles. In the phased string graph, each vertex corresponds to the HiFi read with the same ID, and an edge between two vertices indicates that the corresponding reads overlap. Hifiasm first performs haplotype-aware error correction, which corrects sequence errors but keeps heterozygous alleles, and then builds a phased assembly graph from the corrected reads using local phasing information. Only reads from the same haplotype are connected in the phased assembly graph. With complementary data providing global phasing information, hifiasm generates a completely phased assembly for each haplotype from the graph. Hifiasm can also generate an unphased primary assembly from HiFi reads alone. This unphased primary assembly represents the phased blocks (regions) that are resolvable with HiFi reads, but it does not preserve phasing information between phased blocks.
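The rule that only reads from the same haplotype are connected can be illustrated with a toy overlap graph. The following is a minimal sketch, not hifiasm's implementation: the function name `build_phased_graph`, the labels `"h1"`/`"h2"`, and the use of `None` for homozygous reads are all hypothetical, and the sketch assumes that homozygous reads may connect to reads of either haplotype.

```python
def build_phased_graph(read_haplotype, overlaps):
    """Toy phased overlap graph (illustrative only, not hifiasm's code).

    read_haplotype: read ID -> "h1", "h2", or None (no heterozygous allele).
    overlaps: iterable of (read_a, read_b) pairs that overlap.
    An edge is kept unless the two reads carry different haplotype labels.
    """
    graph = {read: set() for read in read_haplotype}
    for a, b in overlaps:
        ha, hb = read_haplotype[a], read_haplotype[b]
        # Assumption: drop overlaps between reads from different haplotypes.
        if ha is not None and hb is not None and ha != hb:
            continue
        graph[a].add(b)
        graph[b].add(a)
    return {read: sorted(neighbours) for read, neighbours in graph.items()}


# Hypothetical reads: r1 and r4 are homozygous, r2 and r3 are heterozygous.
read_haplotype = {"r1": None, "r2": "h1", "r3": "h2", "r4": None}
overlaps = [("r1", "r2"), ("r1", "r3"), ("r2", "r3"), ("r2", "r4"), ("r3", "r4")]
graph = build_phased_graph(read_haplotype, overlaps)
```

In this toy input the overlap between `r2` and `r3` is dropped, so the graph forms a bubble: `r1` and `r4` are shared, while `r2` and `r3` lie on separate paths.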
Figure 2: Effect of false read binning.
(a) A set of reads with global phasing information provided by the complementary data. Reads in orange are partitioned specifically into haplotype 1, and reads in blue into haplotype 2. The remaining reads, in green, are partitioned into both haplotypes. Read 9, which has no heterozygous alleles, is mispartitioned into haplotype 2 instead of into both haplotypes. (b) Pre-binning assemblies produced by current methods, which assemble the two haplotypes independently. Haplotype 1 is broken into two contigs because of the mispartition of read 9. (c) Hifiasm fixes the mispartition using the local phasing information in the phased assembly graph. It identifies that read 9 has no heterozygous alleles and should therefore be partitioned into both haplotypes.
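The correction described in panel (c) can be sketched as a simple relabeling rule. This is a toy illustration under stated assumptions, not hifiasm's algorithm: the names `correct_bins` and `reads_for`, the bin labels, and the boolean `has_het_allele` map are hypothetical stand-ins for the local phasing information in the graph.

```python
def correct_bins(global_bin, has_het_allele):
    """Reassign reads without heterozygous alleles to both haplotypes.

    global_bin: read ID -> "h1", "h2", or "both" (from complementary data).
    has_het_allele: read ID -> True if the read carries a heterozygous allele.
    Illustrative only; the real method works on the phased assembly graph.
    """
    corrected = {}
    for read, label in global_bin.items():
        # A read with no heterozygous allele cannot be haplotype-specific.
        corrected[read] = label if has_het_allele[read] else "both"
    return corrected


def reads_for(haplotype, bins):
    """Return the sorted read IDs available when assembling one haplotype."""
    return sorted(r for r, label in bins.items() if label in (haplotype, "both"))


# Hypothetical input: read 9 is homozygous but was binned into haplotype 2.
global_bin = {8: "h1", 9: "h2", 10: "h1"}
has_het_allele = {8: True, 9: False, 10: True}

before = reads_for("h1", global_bin)
after = reads_for("h1", correct_bins(global_bin, has_het_allele))
```

Before correction, haplotype 1 lacks read 9, which mirrors the broken contig in panel (b); after correction, read 9 is available to both haplotypes.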
