Nat Methods. 2016 Dec;13(12):1050-1054.
doi: 10.1038/nmeth.4035. Epub 2016 Oct 17.

Phased diploid genome assembly with single-molecule real-time sequencing

Chen-Shan Chin et al. Nat Methods. 2016 Dec.

Abstract

While genome assembly projects have been successful in many haploid and inbred species, the assembly of noninbred or rearranged heterozygous genomes remains a major challenge. To address this challenge, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We generate new reference sequences for heterozygous samples including an F1 hybrid of Arabidopsis thaliana, the widely cultivated Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata, samples that have challenged short-read assembly approaches. The FALCON-based assemblies are substantially more contiguous and complete than alternate short- or long-read approaches. The phased diploid assembly enabled the study of haplotype structure and heterozygosities between homologous chromosomes, including the identification of widespread heterozygous structural variation within coding sequences.

Conflict of interest statement

Competing Financial Interests

C.-S.C., P.P., G.C., C.D., and D.R. are employees and shareholders of Pacific Biosciences, a company commercializing DNA sequencing technologies.

Figures

Figure 1. FALCON and FALCON-Unzip overview
(a) The initial assembly is computed by FALCON, which error-corrects the raw reads (not shown) and then assembles them using a string graph of the read overlaps. The assembled contigs are further refined by FALCON-Unzip into the final set of contigs and haplotigs. (b) Heterozygous SNPs are phased and reads are grouped by haplotype. (c) The phased reads are used to open up the haplotype-fused path and generate as output a set of primary contigs and associated haplotigs.
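The read-grouping idea in panel (b) can be illustrated with a minimal sketch. It is not the FALCON-Unzip implementation: it assumes the heterozygous SNPs have already been phased, and the function names, the dictionary layout, and the simple majority vote are all hypothetical choices made for illustration.

```python
from collections import Counter

def assign_read_to_haplotype(read_alleles, phased_snps):
    """Vote on a haplotype for one read (hypothetical helper).

    read_alleles: dict mapping SNP position -> base observed in the read.
    phased_snps:  dict mapping SNP position -> (haplotype 0 base, haplotype 1 base).
    Returns 0 or 1, or None when the read is uninformative or the vote is tied.
    """
    votes = Counter()
    for pos, base in read_alleles.items():
        if pos not in phased_snps:
            continue  # position is not a phased heterozygous SNP
        hap0_base, hap1_base = phased_snps[pos]
        if base == hap0_base:
            votes[0] += 1
        elif base == hap1_base:
            votes[1] += 1
        # any other base is treated as a sequencing error and ignored
    if votes[0] == votes[1]:
        return None
    return 0 if votes[0] > votes[1] else 1

def group_reads_by_haplotype(reads, phased_snps):
    """Group read names under haplotype 0, haplotype 1, or None (unassigned)."""
    groups = {0: [], 1: [], None: []}
    for name, alleles in reads.items():
        groups[assign_read_to_haplotype(alleles, phased_snps)].append(name)
    return groups

# Toy data, invented for this example only.
phased_snps = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
reads = {
    "r1": {100: "A", 250: "C"},  # both alleles match haplotype 0
    "r2": {100: "G", 400: "A"},  # both alleles match haplotype 1
    "r3": {250: "C", 400: "A"},  # one vote each: left unassigned
    "r4": {500: "T"},            # covers no phased SNP: left unassigned
}
groups = group_reads_by_haplotype(reads, phased_snps)
```

Leaving tied or uninformative reads unassigned, rather than forcing a choice, is one conservative design option; how such reads are actually handled is not stated in the caption.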
Figure 2. SNP density and Structural Variations in the FALCON-Unzip F1 Arabidopsis assembly
The plot shows the primary contigs and haplotigs aligned to chromosome 4 of the TAIR reference assembly as grey line segments. Blue and red dots show the number of Col-0-specific and Cvi-0-specific SNPs, respectively, per 50 kbp region of the assembled contig. The vertical orange lines indicate the centromere locations. The short vertical tick marks above the grey lines indicate structural variations relative to Col-0 (blue) and Cvi-0 (red).
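The per-window SNP counts plotted here can be sketched as a simple binning step. This is an illustrative assumption about how such counts might be computed, not the authors' code: the function name is hypothetical, positions are assumed to be 0-based coordinates on a contig, and windows are assumed to be non-overlapping.

```python
from collections import Counter

WINDOW_SIZE = 50_000  # 50 kbp, the region size given in the caption

def snps_per_window(positions, window=WINDOW_SIZE):
    """Count SNPs in non-overlapping windows (hypothetical helper).

    positions: iterable of 0-based SNP coordinates on one contig.
    Returns a dict mapping window start coordinate -> SNP count,
    listing only windows that contain at least one SNP.
    """
    counts = Counter(pos // window for pos in positions)
    return {index * window: n for index, n in sorted(counts.items())}

# Toy coordinates, invented for this example only.
density = snps_per_window([10, 49_999, 50_000, 120_000])
```

Running the same function separately on the Col-0-specific and Cvi-0-specific SNP positions would give the two series shown as blue and red dots.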
