Bioinformatics. 2016 Jun 15;32(12):i234-i242.
doi: 10.1093/bioinformatics/btw276.

Read-based phasing of related individuals

Shilpa Garg et al. Bioinformatics.

Abstract

Motivation: Read-based phasing deduces the haplotypes of an individual from sequencing reads that cover multiple variants, while genetic phasing takes only genotypes as input and applies the rules of Mendelian inheritance to infer haplotypes within a pedigree. Combining both into one approach that uses these two independent sources of information, reads and pedigree, has the potential to deliver better results than either source alone.
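To make the genetic (Mendelian) side concrete, here is a minimal Python sketch for a single site in a mother-father-child trio, assuming biallelic SNPs and genotypes written as allele pairs such as `(0, 1)` for `0/1`. The function name `phase_child_site` is hypothetical; the sketch only shows when genotypes alone fix which child allele came from which parent.

```python
from itertools import product
from typing import Optional, Tuple

Genotype = Tuple[int, int]  # unordered allele pair, e.g. (0, 1) for "0/1"


def phase_child_site(child: Genotype, mother: Genotype,
                     father: Genotype) -> Optional[Tuple[int, int]]:
    """Return (maternal_allele, paternal_allele) for the child if Mendelian
    inheritance determines it uniquely; return None if the site is ambiguous
    or if no transmission is consistent with the genotypes (Mendelian conflict)."""
    options = set()
    for m_allele, f_allele in product(set(mother), set(father)):
        # the child carries exactly one allele from each parent
        if sorted((m_allele, f_allele)) == sorted(child):
            options.add((m_allele, f_allele))
    if len(options) == 1:
        return options.pop()
    return None


print(phase_child_site((0, 1), (0, 1), (0, 0)))  # mother transmitted the 1
print(phase_child_site((0, 1), (0, 1), (0, 1)))  # all heterozygous: ambiguous
```

When all three individuals are heterozygous the function returns `None`: genotypes alone leave such a site unphased, and this is where reads spanning several variants can supply the missing information.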

Results: We provide a theoretical framework combining read-based phasing with genetic haplotyping, and describe a fixed-parameter algorithm and its implementation for finding an optimal solution. We show, on both simulated and real data, that jointly leveraging the reads of related individuals in this way phases more variants, at higher accuracy, than phasing each individual separately. Coverage as low as 2× for each member of a trio yields haplotypes as accurate as those obtained by analyzing each individual separately at 15× coverage.
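Read-based phasing is commonly cast as a weighted minimum error correction problem (the wMEC that appears in Fig. 2): choose a haplotype pair so that flipping as few read alleles as possible, weighted by their flip costs, makes every read agree with one of the two haplotypes. The brute-force Python sketch below only illustrates that objective for a single individual, assuming every SNP is heterozygous and every read allele carries an integer flip cost. It is not the paper's fixed-parameter algorithm, which avoids enumerating all haplotypes; `brute_force_wmec` and `Read` are hypothetical names.

```python
from itertools import product
from typing import Dict, List, Tuple

# A read maps SNP index -> (observed allele, cost of flipping that allele)
Read = Dict[int, Tuple[int, int]]


def read_cost(read: Read, hap: Tuple[int, ...]) -> int:
    """Weighted number of read alleles that disagree with haplotype `hap`."""
    return sum(w for pos, (allele, w) in read.items() if hap[pos] != allele)


def brute_force_wmec(reads: List[Read], n_snps: int) -> Tuple[int, Tuple[int, ...]]:
    """Exhaustively find a haplotype h (partner = complement of h, since all
    SNPs are assumed heterozygous) minimizing the weighted MEC cost.
    Exponential in n_snps, so only usable on toy inputs."""
    best_cost, best_hap = None, None
    for hap in product((0, 1), repeat=n_snps):
        if n_snps and hap[0] == 1:
            continue  # h and its complement describe the same pair; fix hap[0] = 0
        comp = tuple(1 - a for a in hap)
        # each read is assigned to whichever haplotype it contradicts least
        cost = sum(min(read_cost(r, hap), read_cost(r, comp)) for r in reads)
        if best_cost is None or cost < best_cost:
            best_cost, best_hap = cost, hap
    return best_cost, best_hap


reads = [
    {0: (0, 10), 1: (1, 10)},
    {1: (1, 10), 2: (0, 10)},
    {0: (1, 10), 1: (0, 10), 2: (1, 10)},
    {1: (0, 10), 2: (0, 3)},  # contains one low-confidence error
]
print(brute_force_wmec(reads, 3))
```

The optimum flips only the cheapest conflicting allele (cost 3), which is the intuition behind weighting: low-quality base calls are the first to be corrected.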

Availability and implementation: https://bitbucket.org/whatshap/whatshap

Contact: t.marschall@mpi-inf.mpg.de.


Figures

Fig. 1.
Seven SNP loci covered by reads (horizontal bars) in three individuals. Unphased genotypes are indicated by the labels 0/0, 0/1 and 1/1. The alleles that a read supports are printed in white.
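As the figure suggests, only heterozygous (`0/1`) sites carry phase information, and a read links the heterozygous SNPs it covers. A minimal Python sketch, assuming reads given as `{snp_index: allele}` dicts and the genotype labels used in the caption, groups heterozygous SNPs into haplotype blocks with a union-find structure. The function name `haplotype_blocks` is hypothetical; SNPs in different blocks cannot be phased relative to each other from that individual's reads alone.

```python
from typing import Dict, List


def haplotype_blocks(genotypes: List[str], reads: List[Dict[int, int]]) -> List[List[int]]:
    """Group heterozygous SNPs into blocks connected by reads."""
    het = [i for i, g in enumerate(genotypes) if g == "0/1"]
    parent = {i: i for i in het}  # union-find over heterozygous SNPs only

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for read in reads:
        # homozygous sites covered by the read are ignored: they carry no phase
        covered = [pos for pos in read if pos in parent]
        for a, b in zip(covered, covered[1:]):
            parent[find(a)] = find(b)

    blocks: Dict[int, List[int]] = {}
    for i in het:
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())


genotypes = ["0/1", "1/1", "0/1", "0/1", "0/0", "0/1", "0/1"]
reads = [{0: 0, 1: 1, 2: 1}, {2: 1, 3: 0}, {5: 1, 6: 0}]
print(haplotype_blocks(genotypes, reads))
```

Here the single individual ends up with two disconnected blocks, the situation that pedigree information can help resolve.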
Fig. 2.
Simulated dataset (top) and real dataset (bottom): phasing error rate (x-axis) versus completeness in terms of the fraction of unphased SNPs (y-axis) for PedMEC-G-5 (solid line), wMEC-5 (dashed line) and wMEC-15 (dotted line). Average coverage (per individual) of input data is encoded by circles of different sizes.
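The two axes of this figure can be computed on toy data as follows. This Python sketch assumes that the error rate counts switch errors between consecutive phased SNPs within the same block, relative to a ground-truth phasing; that is a common convention, but the paper's exact definition may differ. `phasing_stats` and the input layout (predicted SNP to `(block_id, allele on haplotype 0)`) are hypothetical.

```python
from typing import Dict, List, Tuple


def phasing_stats(het_snps: List[int],
                  predicted: Dict[int, Tuple[int, int]],
                  truth: Dict[int, int]) -> Tuple[float, float]:
    """Return (switch_error_rate, fraction_unphased).

    het_snps: sorted positions of all heterozygous SNPs
    predicted: SNP -> (block_id, allele on haplotype 0); absent = unphased
    truth: SNP -> allele on the true haplotype 0
    """
    unphased = sum(1 for s in het_snps if s not in predicted)
    phased = [s for s in het_snps if s in predicted]
    pairs = switches = 0
    for a, b in zip(phased, phased[1:]):
        block_a, allele_a = predicted[a]
        block_b, allele_b = predicted[b]
        if block_a != block_b:
            continue  # relative phase across blocks is undefined
        pairs += 1
        # a switch: predicted cis/trans relation differs from the truth
        if (allele_a == allele_b) != (truth[a] == truth[b]):
            switches += 1
    error_rate = switches / pairs if pairs else 0.0
    return error_rate, unphased / len(het_snps)


print(phasing_stats([1, 2, 3, 4, 5],
                    {1: (0, 0), 2: (0, 1), 3: (0, 1), 5: (1, 0)},
                    {1: 0, 2: 1, 3: 0, 4: 1, 5: 0}))
```

Both numbers trade off against each other: leaving uncertain SNPs unphased lowers the error rate but raises the unphased fraction, which is why the figure plots them jointly.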
Fig. 3.
Three-way comparison of phasings provided by SHAPEIT, 10XGenomics and PedMEC-G-5 (on 15× coverage data). Of all pairs of consecutive SNPs phased by all three methods, the figure reports the percentage of cases in which the phasing reported by one method disagrees with the other two. The remainder up to 100% consists of cases in which all three methods agree. Left: SHAPEIT run with default parameters, corresponding to our ‘ground truth phasing’; right: SHAPEIT run with pedigree information.
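The counting behind this comparison can be sketched in Python as follows. Each method's phasing of a consecutive SNP pair is reduced to one binary call (cis: both alternative alleles on the same haplotype, or trans); with three binary calls, either all agree or exactly one method is the odd one out. The function name and the method labels in the example are hypothetical placeholders, and the toy calls are invented for illustration.

```python
from typing import Dict, List


def three_way_disagreement(relations: Dict[str, List[bool]]) -> Dict[str, float]:
    """relations[method][k] is True if the method phases the k-th SNP pair
    in cis, False if in trans. All three lists must cover the same pairs.
    Returns, per method, the percentage of pairs where it alone disagrees."""
    methods = list(relations)
    if len(methods) != 3:
        raise ValueError("expected exactly three methods")
    n = len(relations[methods[0]])
    counts = {m: 0 for m in methods}
    for k in range(n):
        calls = [relations[m][k] for m in methods]
        for i, m in enumerate(methods):
            others = calls[:i] + calls[i + 1:]
            # the other two agree with each other but not with method m
            if others[0] == others[1] != calls[i]:
                counts[m] += 1
    return {m: 100.0 * c / n for m, c in counts.items()}


print(three_way_disagreement({
    "method_A": [True, True, False, True],
    "method_B": [True, False, False, True],
    "method_C": [True, True, False, False],
}))
```

In this toy input, half of the pairs are agreed on by all three methods, matching the caption's note that the percentages need not sum to 100%.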
Fig. 4.
Two disjoint unconnected haplotype blocks for which phase information can be inferred from the genotypes.
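The idea behind this figure can be illustrated with a simple heuristic: at a child's heterozygous site where a parent is homozygous, that parent's transmitted allele is known, so each read-based block can be oriented as maternal or paternal, and blocks oriented this way become phased relative to each other. The Python sketch below uses a majority vote over such sites. It is an assumption-laden illustration (biallelic SNPs, Mendelian-consistent genotypes, blocks containing only the child's heterozygous sites), not the paper's method, and `maternal_orientation` and `join_blocks` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Block = Dict[int, int]                    # SNP -> child's allele on the block's first haplotype
Genotypes = Dict[int, Tuple[int, int]]    # SNP -> parent's allele pair


def maternal_orientation(block: Block, mother: Genotypes,
                         father: Genotypes) -> Optional[bool]:
    """True if the block's first haplotype is maternal, False if paternal,
    None if no site is informative (or the votes tie)."""
    votes = 0
    for snp, allele in block.items():
        m, f = mother[snp], father[snp]
        if m[0] == m[1]:    # mother can only have transmitted m[0]
            votes += 1 if allele == m[0] else -1
        elif f[0] == f[1]:  # father transmitted f[0]; the other allele is maternal
            votes += 1 if allele != f[0] else -1
        # both parents heterozygous: site is uninformative
    if votes == 0:
        return None
    return votes > 0


def join_blocks(blocks: List[Block], mother: Genotypes,
                father: Genotypes) -> Dict[int, int]:
    """Map each SNP of every orientable block to the child's maternal allele."""
    maternal: Dict[int, int] = {}
    for block in blocks:
        orient = maternal_orientation(block, mother, father)
        if orient is None:
            continue  # block cannot be placed relative to the others
        for snp, allele in block.items():
            maternal[snp] = allele if orient else 1 - allele
    return maternal


mother = {0: (0, 1), 1: (1, 1), 5: (0, 0), 6: (0, 1), 8: (0, 1)}
father = {0: (0, 0), 1: (0, 0), 5: (0, 1), 6: (1, 1), 8: (0, 1)}
print(join_blocks([{0: 1, 1: 1}, {5: 1, 6: 1}, {8: 0}], mother, father))
```

The second block is flipped so that both blocks report maternal alleles, while the block at SNP 8 (both parents heterozygous) stays unplaced.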
