Genome Res. 2014 Apr;24(4):688-96.
doi: 10.1101/gr.168450.113. Epub 2014 Jan 13.

Reconstructing complex regions of genomes using long-read sequencing technology

John Huddleston et al. Genome Res. 2014 Apr.

Abstract

Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and find that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved the structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher quality finished state.

Figures

Figure 1.
17q21.31 genomic target region. (A) Tiling path of eight large-insert BAC clones sequenced and assembled using both PacBio- and Sanger-based approaches. Clones were selected from a haploid complete hydatidiform mole source (CH17). (B) Gene annotation (RefSeq) and segmental duplication organization were obtained from GRCh37 using a custom liftover coordinate conversion tool that accounted for the difference in copy number between the mole haplotype and the reference. (C) Alignment of supercontigs built from the same eight clones using PacBio and Sanger assemblies. Sequence differences (vertical blue lines) and internal duplications (gray) are shown. The two supercontigs are 99.99% identical, excluding a collapsed higher-order repeat at the end of the PacBio assembly of CH17-41F14.
Figure 2.
Concordant and discordant PacBio assemblies. (A) Alignment between PacBio (top) and Sanger (bottom) assemblies for CH17-227A2 using Miropeats (Parsons 1995) shows virtually no differences. Note the uniform sequence coverage between 200- and 300-fold. Mismatches/indels are indicated by vertical blue lines. (B) Alignment between PacBio and Sanger assemblies for clone CH17-41F14. A spike of increased sequence coverage across the internal repeat and the reduced complexity of the repeat compared with the Sanger assembly clearly define a collapse of a higher-order repeat from 20 to 12 kbp within the PacBio assembly. The uniformity of sequence coverage may be used as one indicator of potential misassembly.
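The Figure 2 caption notes that uniformity of sequence coverage can serve as one indicator of potential misassembly, since a collapsed repeat shows up as a spike in read depth. The sketch below illustrates that idea only; it is not the authors' pipeline. The function name `flag_coverage_spikes`, the window size, and the 1.5x threshold are hypothetical choices made for illustration.

```python
from statistics import median


def flag_coverage_spikes(coverage, window=1000, ratio=1.5):
    """Return (start, end) windows whose mean depth exceeds ratio * baseline.

    coverage: per-base read depth along an assembled clone (list of numbers).
    window:   window size in bases (hypothetical default).
    ratio:    fold increase over the median window depth that counts as a
              spike (hypothetical default, not taken from the paper).
    """
    window_means = []
    for start in range(0, len(coverage), window):
        chunk = coverage[start:start + window]
        window_means.append((start, start + len(chunk), sum(chunk) / len(chunk)))

    # The median window depth serves as the baseline for "uniform" coverage.
    baseline = median(mean for _, _, mean in window_means)
    return [(s, e) for s, e, mean in window_means if mean > ratio * baseline]


# Toy example: uniform 250-fold coverage with one 1-kbp region at 500-fold,
# mimicking reads from a collapsed repeat piling up on a shorter sequence.
toy_coverage = [250] * 5000 + [500] * 1000 + [250] * 4000
spikes = flag_coverage_spikes(toy_coverage)
print(spikes)
```

A flagged window is only a prompt for closer inspection; depth can also rise for reasons unrelated to a collapsed repeat.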
Figure 3.
Upgrading a chimpanzee genomic region. Sequence and assembly of six large-insert clones (CH251) from two segmental duplication blocks (red and green) are aligned to their corresponding sequences from the 17p11.2 Smith-Magenis region of the chimpanzee reference assembly (panTro4). Clones were sequenced and assembled from the (A) distal and (B) proximal segmental duplication blocks. The PacBio assembly was compared with the corresponding working draft sequences from panTro4. The alignment identity of panTro4 contigs without gap sequence and the PacBio supercontigs is 94.69% over 525 kbp of aligned sequence. Thirty-one percent (241/766 kbp) of the chimpanzee sequence is missing within the working draft assembly. The average sequence identity for phred >30 bp from BAC end sequence (BES) mappings was 99.72% (16,174/16,220 high-quality bases) and 99.98% (156,955/156,991 high-quality bases) from fosmid end sequence (FES) mappings. Gaps in the panTro4 contigs are indicated in red. Gene annotations are shown based on a custom liftover from RefSeq annotations of GRCh37 in the corresponding regions of 17p11.2. The missing sequence corresponds to high-identity segmental duplications (orange bars represent segmental duplications predicted by whole-genome shotgun sequence detection or WSSD). The clone CH251-545A24 was previously sequenced with capillary sequencing (GenBank accession: AC183294).
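The percentages quoted in the Figure 3 caption can be re-derived from the base counts it reports. The short check below uses only numbers that appear in the caption; the helper name `percent` is introduced here for illustration.

```python
def percent(numerator, denominator):
    """Express numerator/denominator as a percentage."""
    return 100.0 * numerator / denominator


# Sequence missing from the panTro4 working draft (kbp), from the caption.
missing_fraction = percent(241, 766)

# Identity of high-quality (phred > 30) bases from clone end mappings.
bes_identity = percent(16_174, 16_220)    # BAC end sequences
fes_identity = percent(156_955, 156_991)  # fosmid end sequences

# Sequence remaining once the missing portion is subtracted (kbp).
remaining_kbp = 766 - 241

print(f"missing: {missing_fraction:.1f}%")
print(f"BES identity: {bes_identity:.2f}%")
print(f"FES identity: {fes_identity:.2f}%")
print(f"remaining: {remaining_kbp} kbp")
```

The remainder, 766 minus 241 kbp, matches the 525 kbp of aligned sequence stated in the caption.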
Figure 4.
Support for chimpanzee supercontig architecture from clone end mappings. Concordant BES and FES alignments confirm order and orientation of (A) distal and (B) proximal chimpanzee supercontig assemblies. One-hundred-twenty-five paired-end sequences that map with >99.8% sequence identity are depicted. Both analyses support high-quality assembly of these complex regions of the chimpanzee genome.

Comment in

  • Technology: SMRT move?
    Koch L. Nat Rev Genet. 2014 Mar;15(3):146. doi: 10.1038/nrg3678. Epub 2014 Feb 4. PMID: 24492234. No abstract available.

References

    Adey A, Morrison HG, Asan, Xun X, Kitzman JO, Turner EH, Stackhouse B, MacKenzie AP, Caruccio NC, Zhang X, et al. 2010. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol 11: R119.
    Alkan C, Cardone MF, Catacchio CR, Antonacci F, O’Brien SJ, Ryder OA, Purgato S, Zoli M, Della Valle G, Eichler EE, et al. 2011a. Genome-wide characterization of centromeric satellites from multiple mammalian genomes. Genome Res 21: 137–145.
    Alkan C, Sajjadian S, Eichler EE. 2011b. Limitations of next-generation genome sequence assembly. Nat Methods 8: 61–65.
    Au KF, Underwood JG, Lee L, Wong WH. 2012. Improving PacBio long read accuracy by short read alignment. PLoS ONE 7: e46679.
    Burton J, Adey A, Patwardhan RP, Qiu R, Kitzman J, Shendure J. 2013. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol 31: 1119–1125.

Publication types