Nat Biotechnol. 2011 Jan;29(1):59-63.
doi: 10.1038/nbt.1740. Epub 2010 Dec 19.

Haplotype-resolved genome sequencing of a Gujarati Indian individual

Jacob O Kitzman et al. Nat Biotechnol. 2011 Jan.

Erratum in

  • Nat Biotechnol. 2011 May;29(5):459

Abstract

Haplotype information is essential to the complete description and interpretation of genomes, genetic diversity and genetic ancestry. Although individual human genome sequencing is increasingly routine, nearly all such genomes are unresolved with respect to haplotype. Here we combine the throughput of massively parallel sequencing with the contiguity information provided by large-insert cloning to experimentally determine the haplotype-resolved genome of a South Asian individual. A single fosmid library was split into a modest number of pools, each providing ∼3% physical coverage of the diploid genome. Sequencing of each pool yielded reads overwhelmingly derived from only one homologous chromosome at any given location. These data were combined with whole-genome shotgun sequence to directly phase 94% of ascertained heterozygous single nucleotide polymorphisms (SNPs) into long haplotype blocks (N50 of 386 kilobases (kbp)). This method also facilitates the analysis of structural variation, for example, to anchor novel insertions to specific locations and haplotypes.
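The claim that reads from one pool come overwhelmingly from a single homolog can be checked with a back-of-the-envelope model. The sketch below assumes that clones land independently and uniformly, so the number of clones covering a position on each haplotype is Poisson with mean equal to the per-pool physical coverage (about 0.03). That model, and the function name, are illustrative assumptions rather than the authors' analysis.

```python
import math


def both_haplotypes_given_covered(coverage: float) -> float:
    """Probability that a locus covered by a pool carries clones from BOTH homologs.

    Assumption (not from the paper): clone depth on each haplotype is an
    independent Poisson variable with mean `coverage`.
    """
    p_one_hap = 1.0 - math.exp(-coverage)        # a given homolog is covered
    p_any_hap = 1.0 - math.exp(-2.0 * coverage)  # at least one homolog is covered
    return (p_one_hap ** 2) / p_any_hap


frac = both_haplotypes_given_covered(0.03)
print(f"Covered loci with both homologs in one pool: {frac:.2%}")
```

Under these assumptions only about 1.5% of the loci covered in a pool would contain clones from both homologs, which is consistent with the abstract's description of pool-level reads as effectively haploid.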

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

Figures

Figure 1
Haplotype-resolved genome sequencing. (a,b) A single, highly complex fosmid library was constructed (a) and split into 115 pools (b), each representing ~3% physical coverage of the diploid human genome. Barcoded shotgun libraries from each pool were constructed, then combined and sequenced. As expected, reads from each library map to ~5,000 × ~37 kbp blocks, minimally redundant within each library. (c) Whole-genome shotgun sequencing of the same individual generated unphased variant calls. (d) Unphased variant calls were combined with haploid genotype calls to assemble haplotype blocks using a maximum parsimony approach (reference allele in black, nonreference allele in red).
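The figures quoted in this caption can be cross-checked with simple arithmetic. The sketch below takes the clone count, clone size and pool count from the caption; the haploid genome size of roughly 3.1 Gbp is an assumption introduced here, and the variable names are illustrative.

```python
# Values from the Figure 1 caption.
CLONES_PER_POOL = 5_000
CLONE_SIZE_BP = 37_000
NUM_POOLS = 115

# Assumption (not stated in the caption): haploid human genome of ~3.1 Gbp.
HAPLOID_GENOME_BP = 3.1e9
DIPLOID_GENOME_BP = 2 * HAPLOID_GENOME_BP

# Physical coverage of the diploid genome contributed by one pool.
per_pool_coverage = CLONES_PER_POOL * CLONE_SIZE_BP / DIPLOID_GENOME_BP

# Total physical coverage summed over all pools.
total_coverage = NUM_POOLS * per_pool_coverage

print(f"Per-pool physical coverage: {per_pool_coverage:.1%}")
print(f"Total physical coverage:    {total_coverage:.2f}x")
```

With these inputs each pool covers close to 3% of the diploid genome, matching the caption, and the 115 pools together give somewhat more than threefold physical coverage of each homolog.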
Figure 2
Haplotype assembly results. (a) Size distribution of blocks within the haplotype assembly up to a maximum block size of 2.79 Mbp. Half of the assembly comprised blocks longer than 386 kbp (N50). (b) Comparison of experimental phasing with HapMap population-based inference for NA20847, with agreement of pairwise haplotype predictions as a function of physical distance and linkage disequilibrium. (c) Agreement of pairwise haplotype predictions as a function of physical distance and minor allele frequency (defined as the lower allele frequency of the pair in GIH). Key is the same as for b.
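For readers unfamiliar with the N50 statistic used in panel a, the sketch below shows the standard calculation: sort the blocks from longest to shortest and report the length at which the running total first reaches half of the assembled length. The block lengths in the example are hypothetical toy values, not data from the paper.

```python
from typing import Iterable


def n50(block_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of block lengths.

    Half of the total assembled length lies in blocks at least this long.
    """
    lengths = sorted(block_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one block length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running total always reaches the total")


# Hypothetical toy block lengths in kbp (not from the paper).
toy_blocks_kbp = [2, 3, 4, 5, 6]
print(n50(toy_blocks_kbp))
```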
Figure 3
Enrichment of novel variants on ‘GIH-like’ haplotypes. (a) Haplotypes were scored and rank ordered within sliding windows of 20 HapMap variants for greater similarity to GIH or CEU on the basis of population allele frequencies (left on x axis: more similar to GIH). Plotted is the fraction of novel variants (not in dbSNP v130) in rank-ordered groups of haplotype windows, demonstrating that the most ‘GIH-like’ haplotype windows are enriched for novel variants. Values from trio-phased CEU individual NA12891 are shown for comparison (red). (b) Scores calculated in a for haplotype windows were compared between homologous chromosomes, and haplotypes were ranked based on the extent to which they scored as ‘GIH-like’ relative to their homolog. Plotted is the fraction of novel variants found on the more ‘GIH-like’ haplotype in rank-ordered groups of homologous haplotype windows. As above, the analysis was also performed for individual NA12891 using the rank ordering from individual NA20847. Haplotype blocks that are most differentiated relative to their homolog (higher ranked) with respect to GIH versus CEU similarity are enriched for novel variants relative to their homolog, consistent with the pattern observed in a.
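The caption does not give the scoring formula, so the following is only one plausible way to score a haplotype window for similarity to GIH versus CEU from population allele frequencies. It uses a summed log-likelihood ratio over the variants in a window. The function name, the 0/1 allele encoding, and the clamping constant `eps` are all assumptions made for illustration, not the authors' method.

```python
import math
from typing import Sequence


def window_score(
    alleles: Sequence[int],
    freq_gih: Sequence[float],
    freq_ceu: Sequence[float],
    eps: float = 0.01,
) -> float:
    """Log-likelihood ratio of one haplotype window under GIH vs CEU frequencies.

    alleles:  1 if the haplotype carries the nonreference allele at a site, else 0.
    freq_*:   nonreference allele frequency in each population at the same sites.
    eps:      hypothetical clamp that keeps probabilities away from 0 and 1.
    A positive score means the window looks more 'GIH-like'.
    """
    score = 0.0
    for allele, f_gih, f_ceu in zip(alleles, freq_gih, freq_ceu):
        p_gih = f_gih if allele == 1 else 1.0 - f_gih
        p_ceu = f_ceu if allele == 1 else 1.0 - f_ceu
        p_gih = min(max(p_gih, eps), 1.0 - eps)
        p_ceu = min(max(p_ceu, eps), 1.0 - eps)
        score += math.log(p_gih / p_ceu)
    return score


# Hypothetical two-site window: both carried alleles are commoner in GIH.
example = window_score([1, 0], [0.8, 0.2], [0.2, 0.8])
print(f"{example:.3f}")
```

Windows scored this way could then be rank ordered and grouped as the caption describes, with the 20-variant window size taken from the caption.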
Figure 4
Insertion anchoring and structural variation detection. (a) Homozygous deletion (top), hemizygous deletion (middle) and inversion (bottom) with fosmid clone support. Deletion calls were made using read depth and paired-read discordance. Inversions were called by paired-read discordance. SNPs within hemizygous deletions appear as stretches of hemizygosity by whole-genome shotgun sequencing. Purple connections indicate the additional support of strand discordance of read pairs spanning genomic DNA and the vector backbone. (b) Novel contigs not present in the reference assembly (red) but detected among clone pool–derived reads (light blue, purple, yellow) are anchored by searching for positions in the reference common to those pools but missing from most or all other pools. This approach anchors 1,733 recently reported insertion sequences, including contig GU268019.
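The anchoring idea in panel b can be sketched as a set operation: find the reference positions covered in every pool that contains reads from the novel contig, then discard positions that are also covered in too many other pools. The data layout (a mapping from pool to covered reference windows), the function name and the `max_other_pools` threshold are hypothetical choices for illustration. The paper's actual procedure may differ.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def anchor_candidates(
    pool_windows: Dict[str, Set[int]],
    contig_pools: Iterable[str],
    max_other_pools: int = 0,
) -> List[int]:
    """Reference windows shared by all contig-bearing pools and rare elsewhere.

    pool_windows:    pool id -> reference window ids covered by clones in that pool.
    contig_pools:    pools in which reads from the novel contig were detected.
    max_other_pools: how many non-contig pools may also cover a candidate window.
    """
    contig_pool_set = set(contig_pools)
    if not contig_pool_set:
        return []
    # Windows covered by every pool that contains the novel contig.
    shared = set.intersection(*(pool_windows[p] for p in contig_pool_set))
    # Count how often each shared window is also covered by other pools.
    other_counts: Counter = Counter()
    for pool, windows in pool_windows.items():
        if pool not in contig_pool_set:
            other_counts.update(windows & shared)
    return sorted(w for w in shared if other_counts[w] <= max_other_pools)


# Hypothetical toy data: window 101 is common to the three contig-bearing pools.
toy_pools = {
    "p1": {100, 101, 250},
    "p2": {101, 102, 400},
    "p3": {101, 500},
    "p4": {250, 600},
    "p5": {400, 700},
}
print(anchor_candidates(toy_pools, ["p1", "p2", "p3"]))
```

Because each pool is effectively haploid at any locus, the same pool membership could in principle also assign an anchored insertion to a haplotype, as the abstract notes.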

Comment in

  • Genomics: No half measures for haplotypes.
    Muers M. Nat Rev Genet. 2011 Feb;12(2):77. doi: 10.1038/nrg2939. Epub 2010 Dec 30. PMID: 21191422
  • The next phase in human genetics.
    Bansal V, Tewhey R, Topol EJ, Schork NJ. Nat Biotechnol. 2011 Jan;29(1):38-9. doi: 10.1038/nbt.1757. PMID: 21221098
  • One genome, two haplotypes.
    Rusk N. Nat Methods. 2011 Feb;8(2):107. doi: 10.1038/nmeth0211-107. PMID: 21355116

Publication types