Nat Genet. 2014 Dec;46(12):1343-9.
doi: 10.1038/ng.3119. Epub 2014 Oct 19.

Haplotype-resolved whole-genome sequencing by contiguity-preserving transposition and combinatorial indexing

Sasan Amini et al. Nat Genet. 2014 Dec.

Abstract

Haplotype-resolved genome sequencing enables the accurate interpretation of medically relevant genetic variation, deep inferences regarding population history and non-invasive prediction of fetal genomes. We describe an approach for genome-wide haplotyping based on contiguity-preserving transposition (CPT-seq) and combinatorial indexing. Tn5 transposition is used to modify DNA with adaptor and index sequences while preserving contiguity. After DNA dilution and compartmentalization, the transposase is removed, resolving the DNA into individually indexed libraries. The libraries in each compartment, enriched for neighboring genomic elements, are further indexed via PCR. Combinatorial 96-plex indexing at both the transposition and PCR stage enables the construction of phased synthetic reads from each of the nearly 10,000 'virtual compartments'. We demonstrate the feasibility of this method by assembling >95% of the heterozygous variants in a human genome into long, accurate haplotype blocks (N50 = 1.4-2.3 Mb). The rapid, scalable and cost-effective workflow could enable haplotype resolution to become routine in human genome sequencing.
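The N50 statistic quoted for haplotype blocks can be illustrated with a short sketch. It assumes the common definition of N50 (the block length at which the running total of blocks, sorted longest first, reaches half of the summed length). The function name and the example lengths are hypothetical and are not data from the paper.

```python
def n50(block_lengths):
    """Return the N50 of a collection of block lengths (in bp).

    Assumes the usual definition: sort lengths in descending order and
    return the length at which the running total reaches half the sum.
    """
    lengths = sorted(block_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("block_lengths must not be empty")


# Hypothetical haplotype block lengths in bp (illustration only).
example_blocks = [2_300_000, 1_400_000, 900_000, 400_000, 100_000]
example_n50 = n50(example_blocks)
print(example_n50)
```

An N50 of 1.4 Mb in this sense would mean that half of the phased sequence lies in blocks at least 1.4 Mb long.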

Figures

Figure 1
Tn5 transposase maintains contiguity of target DNA post-transposition. (a) PAGE analysis of transposase contiguity: Tn5 transposome was used to target a ~1 kb PCR amplicon. Transposed DNA was either treated with SDS to remove the transposase enzyme (lane 1), or left without SDS treatment as a control (lane 2). Lane 3 is the input DNA and lane 4 is a 100 bp reference ladder. As shown here, Tn5 transposase enzyme stays bound to its substrate DNA post-transposition, and the protein-DNA complex dissociates only after addition of the protein-denaturing agent, i.e., SDS. (b) Single-molecule imaging of Tn5-transposed DNA: HMW DNA (see Online Methods) labeled with YOYO-1 fluorescent dye was subjected to Tn5 transposition. SDS samples were treated with a final 0.05% SDS concentration and incubated at 55°C for 15 min.
Figure 2
Overview of the CPT-seq workflow. There are three key steps: (I) indexed transposition, (II) pooling, dilution and compartmentalization, and (III) indexed PCR. A set of 96 different indexed transposome complexes is used to set up 96 independent transposition reactions to create separate genomic virtual partitions (step I). Transposition reactions are pooled together, diluted to sub-haploid DNA content, and split into 96 compartments (step II). Upon removal of the transposase with SDS, compartment-specific libraries are generated using indexed PCR (step III). All samples are pooled together after PCR and prepared for sequencing.
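The two indexing levels in this workflow combine into 96 x 96 = 9,216 index pairs, which matches the "nearly 10,000 virtual compartments" stated in the abstract. The sketch below shows one way reads could be grouped by that pair. The 0-based integer indexes, the read tuple layout and the function names are assumptions made for illustration, not the authors' pipeline.

```python
from collections import defaultdict

N_TRANSPOSITION_INDEXES = 96  # step I (from the caption)
N_PCR_INDEXES = 96            # step III (from the caption)


def virtual_compartment(tn5_index, pcr_index):
    """Map a (transposition index, PCR index) pair to one compartment ID.

    Both indexes are assumed to be 0-based integers in [0, 96).
    """
    if not 0 <= tn5_index < N_TRANSPOSITION_INDEXES:
        raise ValueError("tn5_index out of range")
    if not 0 <= pcr_index < N_PCR_INDEXES:
        raise ValueError("pcr_index out of range")
    return pcr_index * N_TRANSPOSITION_INDEXES + tn5_index


def group_reads(reads):
    """Group mapped read positions by virtual compartment.

    `reads` is an iterable of (tn5_index, pcr_index, position) tuples,
    a hypothetical layout chosen for this sketch.
    """
    groups = defaultdict(list)
    for tn5_index, pcr_index, position in reads:
        groups[virtual_compartment(tn5_index, pcr_index)].append(position)
    return dict(groups)


total_compartments = N_TRANSPOSITION_INDEXES * N_PCR_INDEXES
# Hypothetical reads: two share an index pair, one does not.
example_groups = group_reads([(0, 0, 1500), (0, 0, 1800), (5, 2, 990)])
print(total_compartments, example_groups)
```

Reads that share both indexes fall into the same virtual compartment and are therefore candidates for having come from the same original DNA molecule.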
Figure 3
Demonstration of haplotype read “islands”. Coverage plots are shown for three representative indexes across part of chromosome 22. Reads from the same contiguous molecule display as “islands” of read clusters across the genome (middle panel), or as one mode of a bimodal distribution in a nearest-neighbor plot of mapped reads (bottom panel, a representative distance plot from one index). Grey and black regions in the middle panel represent regions of the chromosome that are, respectively, absent or present in a given PCR compartment. Only the black regions, i.e., haplotyping islands, are covered by sequencing reads that carry the index for that given physical compartment. Aligned reads are sorted based on their genomic coordinates, and the distance between neighboring alignments from the same partition is recorded. A bimodal distribution is observed, with grey regions represented by the distal, i.e., inter-island, subpopulation and the black regions or islands by the proximal, i.e., intra-island, subpopulation. Breaks between the islands imply that two neighboring islands do not necessarily belong to the same haplotype. A high ratio of the intra-island to the inter-island peak indicates strong enrichment of the proximal regions of the genome that are in the same haplotype phase. Representative intra-island coverage is shown in the top panel.
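The nearest-neighbor analysis described in this caption can be sketched as follows: sort the alignments from one partition, record the distance between neighbors, and split the reads into islands wherever a distance falls in the distal (inter-island) mode. The fixed `max_gap` threshold, the function names and the example positions are assumptions for illustration. The source does not state how the two modes are separated.

```python
def nearest_neighbor_distances(positions):
    """Distances between neighboring alignments after coordinate sorting."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def call_islands(positions, max_gap):
    """Split one partition's read positions into islands.

    Consecutive reads closer than or equal to `max_gap` (an assumed,
    user-chosen threshold) are placed in the same island. Returns a
    list of (start, end) coordinate pairs.
    """
    ordered = sorted(positions)
    if not ordered:
        return []
    islands = []
    start = prev = ordered[0]
    for pos in ordered[1:]:
        if pos - prev > max_gap:
            # A distal gap closes the current island.
            islands.append((start, prev))
            start = pos
        prev = pos
    islands.append((start, prev))
    return islands


# Hypothetical read positions from a single partition.
example_positions = [900, 100, 600, 350, 500_200, 500_000, 500_900]
example_distances = nearest_neighbor_distances(example_positions)
example_islands = call_islands(example_positions, max_gap=10_000)
print(example_distances)
print(example_islands)
```

In practice the threshold would be chosen from the valley between the two modes of the distance distribution rather than fixed in advance.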
Figure 4
(a) Phasing yield. Probability that heterozygous SNP pairs are on the same phasing block as a function of distance between them. (b) Phasing accuracy. For all pairs that are on the same phasing block, the probability that a pair is phased correctly is plotted as a function of distance.
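The two quantities plotted here can be computed per distance bin as in the sketch below. It assumes each SNP pair has already been annotated with its distance, whether both SNPs fall in the same phasing block, and whether the pair is phased correctly against a truth set. The tuple layout, the binning scheme and the example values are hypothetical.

```python
from collections import defaultdict


def yield_and_accuracy(pairs, bin_size):
    """Compute phasing yield and accuracy per distance bin.

    `pairs` is an iterable of (distance_bp, same_block, correct) tuples.
    Yield is the fraction of pairs on the same block. Accuracy is the
    fraction of same-block pairs that are phased correctly, or None if
    the bin has no same-block pairs.
    Returns {bin_start_bp: (yield, accuracy)}.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # total, same-block, correct
    for distance, same_block, correct in pairs:
        bucket = counts[distance // bin_size]
        bucket[0] += 1
        if same_block:
            bucket[1] += 1
            if correct:
                bucket[2] += 1
    result = {}
    for bin_index, (total, same, correct) in sorted(counts.items()):
        accuracy = correct / same if same else None
        result[bin_index * bin_size] = (same / total, accuracy)
    return result


# Hypothetical annotated SNP pairs (illustration only).
example_pairs = [
    (1_000, True, True),
    (2_000, True, True),
    (3_000, True, False),
    (150_000, True, True),
    (160_000, False, False),
]
example_curves = yield_and_accuracy(example_pairs, bin_size=100_000)
print(example_curves)
```

Accuracy is conditioned on the pair sharing a block, which is why the two curves are reported separately.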
