Nat Biotechnol. 2014 Mar;32(3):261-266.
doi: 10.1038/nbt.2833. Epub 2014 Feb 23.

Whole-genome haplotyping using long reads and statistical methods

Volodymyr Kuleshov et al. Nat Biotechnol. 2014 Mar.

Abstract

The rapid growth of sequencing technologies has greatly contributed to our understanding of human genetics. Yet, despite this growth, mainstream technologies have not been fully able to resolve the diploid nature of the human genome. Here we describe statistically aided, long-read haplotyping (SLRH), a rapid, accurate method that uses a statistical algorithm to take advantage of the partially phased information contained in long genomic fragments analyzed by short-read sequencing. For a human sample, as little as 30 Gbp of additional sequencing data are needed to phase genotypes identified by 50× coverage whole-genome sequencing. Using SLRH, we phase 99% of single-nucleotide variants in three human genomes into long haplotype blocks 0.2-1 Mbp in length. We apply our method to determine allele-specific methylation patterns in a human genome and identify hundreds of differentially methylated regions that were previously unknown. SLRH should facilitate population-scale haplotyping of human genomes.

Figures

Figure 1
Statistically aided long-read haplotyping. (a) Overview of the library preparation protocol. The subject's DNA (1) is sheared into fragments of about 10 kbp (2), which are then diluted and placed into 384 wells, at about 3,000 fragments per well (3). Within each well, fragments are amplified through long-range PCR, cut into short fragments and barcoded (4), before finally being pooled together and sequenced (5). (b) Overview of the bioinformatics pipeline. Sequenced short reads are aligned and mapped back to their original well using the barcode adapters (1). Within each well, reads are grouped into fragments (2), which are assembled at their overlapping heterozygous SNVs into haplotype blocks (3). These blocks are assigned a phase statistically based on a phased reference panel (4), which produces very long haplotype contigs (5).
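Step (2) of the bioinformatics pipeline, grouping the reads of one well into fragments, can be illustrated with a minimal sketch. This is not the authors' implementation: the input tuple layout, the function name `group_reads_into_fragments`, and the `max_gap` cutoff are all assumptions made for illustration. The idea is only that reads carrying the same well barcode and lying close together on the reference probably came from the same long fragment.

```python
from collections import defaultdict


def group_reads_into_fragments(reads, max_gap=1000):
    """Cluster aligned reads into putative long fragments, per well.

    reads   : iterable of (well_id, chrom, start, end) tuples (hypothetical layout)
    max_gap : largest uncovered gap, in bp, still bridged within one fragment
              (an illustrative value, not taken from the paper)

    Returns a dict mapping (well_id, chrom) to a list of (start, end) fragments.
    """
    by_well = defaultdict(list)
    for well, chrom, start, end in reads:
        by_well[(well, chrom)].append((start, end))

    fragments = {}
    for key, intervals in by_well.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start - merged[-1][1] <= max_gap:
                # Close enough to the current fragment: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                # Too far away: start a new fragment.
                merged.append([start, end])
        fragments[key] = [tuple(m) for m in merged]
    return fragments


example_reads = [
    (1, "chr1", 100, 200),
    (1, "chr1", 250, 350),
    (1, "chr1", 50000, 50100),
    (2, "chr1", 120, 220),
]
example_fragments = group_reads_into_fragments(example_reads)
```

Because each well holds only a small fraction of the genome, two fragments from the same well rarely overlap, which is what makes this kind of position-based grouping plausible.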
Figure 2
Haplotyping results at several accuracy thresholds. Long statistically constructed haplotype contigs are cut at positions where confidence scores are below a certain threshold (x axes), forming shorter but more accurate haplotype blocks. We evaluate the completeness (top panels) and the switch accuracy (bottom panels) of the smaller blocks at a series of thresholds. The blocks are evaluated only over SNVs.
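The thresholding described in this caption can be sketched as follows. The code assumes one confidence score per link between adjacent SNVs and uses a common definition of switch accuracy (the fraction of adjacent SNV pairs whose relative phase matches the truth); neither detail is stated in the text above, and the function names are hypothetical.

```python
def cut_contig(confidences, threshold):
    """Split a contig into blocks wherever a link confidence is below threshold.

    confidences[i] is the (assumed) confidence of the phase link between
    SNV i and SNV i + 1. Returns blocks as inclusive (first, last) SNV indices.
    """
    n_snvs = len(confidences) + 1
    blocks = []
    start = 0
    for i, conf in enumerate(confidences):
        if conf < threshold:
            blocks.append((start, i))
            start = i + 1
    blocks.append((start, n_snvs - 1))
    return blocks


def switch_accuracy(predicted, truth):
    """Fraction of adjacent SNV pairs with the correct relative phase.

    predicted and truth list the 0/1 allele carried by one haplotype at the
    same heterozygous SNVs. Returns None for blocks with fewer than two SNVs.
    """
    if len(predicted) < 2:
        return None
    switches = 0
    for i in range(len(predicted) - 1):
        pred_same = predicted[i] == predicted[i + 1]
        true_same = truth[i] == truth[i + 1]
        if pred_same != true_same:
            switches += 1
    return 1 - switches / (len(predicted) - 1)


blocks = cut_contig([0.99, 0.4, 0.95, 0.99], threshold=0.5)
accuracy = switch_accuracy([0, 0, 1, 1, 0], [0, 0, 1, 0, 1])
```

Raising the threshold cuts at more positions, which gives shorter blocks but removes the links most likely to contain switch errors; this is the trade-off the figure evaluates.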
Figure 3
Haplotyping performance from 30 Gbp of sequencing. We ran the bioinformatics pipeline independently on two 30 Gbp replicate libraries of the sample NA12878. The resulting haplotype blocks are almost as accurate and only 100 kbp shorter than ones derived from two phasing libraries. Moreover, results from the two replicates are highly concordant.
Figure 4
Genome browser view of differentially methylated regions at the promoter of the H19 gene. Differences in DNA methylation levels (green tracks, D) and the absolute DNA methylation levels at the two parental alleles (blue tracks for paternal methylation (P) and red tracks for maternal methylation (M)) are shown around the H19 locus. The shaded regions show a significant (P < 0.05; Fisher's exact test) difference in DNA methylation levels between the two parental alleles and are identified as differentially methylated regions (DMRs).
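To make the significance criterion concrete, here is a minimal sketch of a two-sided Fisher's exact test on a 2×2 table of read counts, written with the Python standard library only. The table layout (rows are the two parental alleles, columns are methylated and unmethylated read counts) and the example counts are assumptions for illustration; they are not data from the study, and the authors' exact procedure may differ.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: row 1 = paternal allele, row 2 = maternal allele;
    column 1 = methylated reads, column 2 = unmethylated reads.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x methylated reads in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed
    # one; the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Made-up counts: 9 of 10 paternal reads methylated vs 1 of 10 maternal reads.
p_skewed = fisher_exact_two_sided(9, 1, 1, 9)
# Made-up counts with no allelic difference.
p_balanced = fisher_exact_two_sided(5, 5, 5, 5)
```

With the skewed counts the p-value falls well below 0.05, so such a region would be shaded under the caption's criterion; the balanced counts give a p-value of 1.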

