PLoS One. 2013 May 31;8(5):e64571.
doi: 10.1371/journal.pone.0064571. Print 2013.

Complete genome phasing of family quartet by combination of genetic, physical and population-based phasing analysis

Julien Lajugie et al. PLoS One. 2013.

Abstract

Phased genome maps are important for understanding genetic and epigenetic regulation and disease mechanisms, particularly parental imprinting defects. Phasing is also critical for assessing the functional consequences of genetic variants and for precisely defining haplotype blocks, which is useful for understanding gene flow and genotype-phenotype associations at the population level. Transmission phasing by analysis of a family quartet allows phasing of only 95% of all variants, because positions that are heterozygous in every family member cannot be phased. Here, we report a phasing method based on a combination of transmission analysis, physical phasing by paired-end sequencing of libraries of staggered sizes, and population-based analysis. Sequencing of a healthy Caucasian quartet at 120x coverage and combination of physical and transmission phasing yielded the phased genotypes of about 99.8% of the SNPs, indels and structural variants present in the quartet, a phasing rate significantly higher than what can be achieved using any single phasing method. A false positive SNP error rate below 10*E-7 per genome and per base was obtained using a combination of filters. We provide a complete list of SNPs, indels and structural variants, an analysis of haplotype block sizes, and an analysis of the false positive and false negative variant calling error rates. Improved genome phasing and family sequencing will increase the power of genome-wide sequencing as a clinical diagnostic tool and have myriad basic science applications.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Sequencing strategy.
Libraries of three different sizes were prepared from each member of the quartet from family FNY01 and sequenced on an Illumina HiSeq 2000. Reads were aligned using BWA and Novoalign, and variants were called and filtered. Mendelian inheritance errors (MIEs) were then detected. Next, variants were phased by transmission and errors were called. Phasing was then refined by physical and population-based approaches. Finally, phasing from all approaches was merged, and recombination blocks and error analysis were refined. Positions called as MIE or SCE, and uncalled or partially called positions, were imputed by Beagle.
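The MIE detection step in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `is_mendelian_consistent` and `quartet_has_mie` are hypothetical, genotypes are assumed to be unphased pairs of allele codes at a single autosomal site, and the check simply asks whether each child's genotype can be built from one maternal and one paternal allele.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's unphased genotype can be formed by
    taking one allele from the mother and one from the father.

    Genotypes are 2-tuples of allele codes, e.g. (0, 1). Hypothetical helper.
    """
    return any(
        sorted((m, f)) == sorted(child)
        for m, f in product(mother, father)
    )


def quartet_has_mie(mother, father, child1, child2):
    """Flag a site as a Mendelian inheritance error (MIE) if either
    child's genotype is incompatible with the parental genotypes."""
    return not all(
        is_mendelian_consistent(c, mother, father) for c in (child1, child2)
    )
```

For example, a child genotype of (0, 1) with both parents (0, 0) would be flagged, since neither parent can supply allele 1. A site that passes this check can still be wrong in ways only visible across neighbouring sites, which is why further error calling follows transmission phasing in the caption.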
Figure 2. Phasing Strategy.
(A) Trio Phasing: the quartet was divided into 2 trios and all phasable positions (i.e. positions with at least one homozygous member) were phased. The children's genotypes were arbitrarily ordered as follows: Maternal Allele | Paternal Allele. The parents' genotypes were arbitrarily ordered as follows: Transmitted | Not Transmitted Allele. For each phasable position, trios were phased in three steps: First, we marked all heterozygous genotypes as phased. Then, we used the heterozygous variant to phase a parent-child pair. Finally, we phased the second parent-child pair. (B) Blocks of Inheritance: X-Y scatter plots were created for each parent by assigning a value of +1 to each phased variant where the children inherited the same parental chromosome and a value of -1 to each phased variant where the children inherited different chromosomes. These graphs were used to define the preliminary blocks of inheritance (see Figure S2 for more details).
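A rough Python sketch of the two ideas in this caption follows. It is a simplification under stated assumptions, not the published procedure: `phase_child` and `inheritance_score` are hypothetical names, genotypes are pairs of allele codes at one site, and a child is treated as phasable when exactly one Maternal | Paternal ordering is compatible with the parents. Sites where every member is heterozygous come out unphased, matching the limitation of transmission phasing described in the abstract.

```python
def phase_child(child, mother, father):
    """Order a child's genotype as (maternal, paternal).

    Returns None when the site cannot be phased by transmission:
    either both orderings are possible (e.g. all members heterozygous)
    or no ordering is Mendelian-consistent.
    """
    a, b = child
    candidates = {
        (m, p)
        for m, p in ((a, b), (b, a))
        if m in mother and p in father
    }
    if len(candidates) == 1:
        return candidates.pop()
    return None


def inheritance_score(phased1, phased2, parent_genotype, parent_index):
    """Score one site for one parent: +1 if both children carry the same
    allele from that parent, -1 if they carry different alleles.

    parent_index is 0 for the mother and 1 for the father, following the
    Maternal | Paternal ordering. Returns None when the site is
    uninformative (a child is unphased or the parent is homozygous).
    """
    if phased1 is None or phased2 is None:
        return None
    if parent_genotype[0] == parent_genotype[1]:
        return None
    return 1 if phased1[parent_index] == phased2[parent_index] else -1
```

Plotting such scores against genomic position for each parent would give long runs of +1 or -1, and a switch between runs would suggest a recombination in one child; this is one plausible reading of how the scatter plots define preliminary blocks of inheritance.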
Figure 3. Haplotypes.
Graphs illustrating haplotype structure in the quartet. A 550 Kb region of chromosome 5 is shown (chr5:97,464,122-98,064,122). Homozygous SNPs (red), heterozygous SNPs on chromosome A (blue) and heterozygous SNPs on chromosome B (green) were plotted, and haplotypes were called using the island finder function of GenPlay. Regions in which islands were found only in the 1|1 track were named 1|1 blocks; regions in which islands were called only in the 0|1 track or only in the 1|0 track were named 0|1 or 1|0 blocks, respectively. Regions where islands were found in two or all three of the 1|1, 0|1 and 1|0 tracks at the same time were named mixed blocks. Regions in which no islands were found were named SNP-poor blocks.
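The block-naming rules in this caption reduce to a small decision table, sketched below in Python. The function `classify_block` is a hypothetical illustration, and it assumes island detection (done with GenPlay in the paper) has already produced one true/false value per track for the region.

```python
def classify_block(has_11, has_01, has_10):
    """Name a region from the tracks in which islands were found.

    has_11, has_01, has_10: whether an island was called in the
    1|1, 0|1 and 1|0 tracks, respectively (hypothetical inputs).
    """
    tracks = [
        name
        for name, present in (("1|1", has_11), ("0|1", has_01), ("1|0", has_10))
        if present
    ]
    if not tracks:
        return "SNP-poor"
    if len(tracks) == 1:
        return tracks[0]
    return "mixed"
```

A region with islands only in the 0|1 track would be labelled "0|1", while islands in both the 1|1 and 1|0 tracks would give "mixed".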
