Genome Res. 2011 Oct;21(10):1672-85.
doi: 10.1101/gr.125047.111. Epub 2011 Aug 3.

A comprehensively molecular haplotype-resolved genome of a European individual


Eun-Kyung Suk et al. Genome Res. 2011 Oct.

Abstract

Independent determination of both haplotype sequences of an individual genome is essential to relate genetic variation to genome function, phenotype, and disease. To address the importance of phase, we have generated the most complete haplotype-resolved genome to date, "Max Planck One" (MP1), by fosmid pool-based next generation sequencing. Virtually all SNPs (>99%) and 80,000 indels were phased into haploid sequences of up to 6.3 Mb (N50 ~1 Mb). The completeness of phasing allowed determination of the concrete molecular haplotype pairs for the vast majority of genes (81%) including potential regulatory sequences, of which >90% were found to be constituted by two different molecular forms. A subset of 159 genes with potentially severe mutations in either cis or trans configurations exemplified in particular the role of phase for gene function, disease, and clinical interpretation of personal genomes (e.g., BRCA1). Extended genomic regions harboring manifold combinations of physically and/or functionally related genes and regulatory elements were resolved into their underlying "haploid landscapes," which may define the functional genome. Moreover, the majority of genes and functional sequences were found to contain individual or rare SNPs, which cannot be phased from population data alone, emphasizing the importance of molecular phasing for characterizing a genome in its molecular individuality. Our work provides the foundation to understand that the distinction of molecular haplotypes is essential to resolve the (inherently individual) biology of genes, genomes, and disease, establishing a reference point for "phase-sensitive" personal genomics. MP1's annotated haploid genomes are available as a public resource.
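The abstract summarizes phased block lengths by their N50 (~1 Mb). As a reminder of what that statistic means, here is a minimal sketch of the standard N50 definition applied to a list of block lengths. The block lengths below are hypothetical illustration values, not MP1 data, and this is not presented as the authors' own script.

```python
from typing import Sequence


def n50(block_lengths: Sequence[int]) -> int:
    """Return the N50 of a set of phased block lengths (bp).

    N50 is the length L such that blocks of length >= L together
    contain at least half of the total phased sequence.
    """
    if not block_lengths:
        raise ValueError("need at least one block")
    total = sum(block_lengths)
    running = 0
    # Walk blocks from longest to shortest, accumulating phased length.
    for length in sorted(block_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical block lengths in bp, for illustration only.
blocks = [3_000_000, 2_500_000, 1_000_000, 900_000, 400_000, 150_000]
print(n50(blocks))
```

Here the two longest blocks (5.5 Mb) already exceed half of the 7.95 Mb total, so the N50 is the length of the second block, 2.5 Mb.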


Figures

Figure 1.
NGS of single and multiple fosmid pools: whole-genome coverage. (A) Sequencing a pool of 15,000 fosmids covers ∼15% of the genome. The probability that complementary haplotypes co-occur within a pool is P < 0.0112, so only a small percentage (1%–2%) of variants are likely to be covered by fosmids from both haplotypes. The inset shows a specific example of 19 fosmids detected in the MHC region, concordant with the expected number of fosmids. (B) Additional sequencing of fosmid pools (coverage shown for 32 pools) results in increasing fosmid clone coverage and saturation with molecular haplotype sequence coverage across the entire genome. As shown by simulation, low-coverage regions are primarily explained by limitations inherent in short-read mapping (Supplemental Fig. S2).
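Why does a pool of 15,000 fosmids rarely contain both haplotypes at the same locus? A back-of-envelope Poisson model makes the reasoning concrete. This is a sketch under stated assumptions, not the authors' calculation: the ~40 kb insert size, the ~3.1 Gb haploid genome size, uniform random clone placement, and an equal split of clones between haplotypes are all assumed values.

```python
import math

# Assumed values, not taken from the paper: ~40 kb fosmid inserts and a
# ~3.1 Gb haploid genome. Clones are assumed to land uniformly at random,
# with each clone equally likely to derive from either haplotype.
GENOME_BP = 3.1e9
FOSMID_BP = 40_000


def pool_coverage(n_fosmids: int,
                  fosmid_bp: float = FOSMID_BP,
                  genome_bp: float = GENOME_BP) -> tuple[float, float]:
    """Return (P(position covered by any clone), P(covered by clones from both haplotypes))."""
    # Expected clone depth contributed by each haplotype (half the clones each).
    depth_per_hap = (n_fosmids / 2) * fosmid_bp / genome_bp
    # Poisson model: chance that at least one clone from a given haplotype covers a position.
    p_hap = 1.0 - math.exp(-depth_per_hap)
    p_any = 1.0 - (1.0 - p_hap) ** 2  # covered on at least one haplotype
    p_both = p_hap ** 2               # covered on both haplotypes, so the pool mixes them here
    return p_any, p_both


p_any, p_both = pool_coverage(15_000)
print(f"covered: {p_any:.3f}, covered by both haplotypes: {p_both:.4f}")
```

With these assumed values the sketch gives roughly 0.18 for coverage and about 0.0085 for double coverage. That is the same order of magnitude as the caption's ∼15% and P < 0.0112; the exact figures depend on the real insert-size distribution and pool composition.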
Figure 2.
Length of phased blocks. (A) Weighted histogram of genome coverage. Each gray bar shows the summed length of genome covered per contig-size interval. Points show the cumulative length of genome covered with increasing contig size. (B) Histogram of lengths of phased upstream/downstream regions, measured from the end of the transcript to the end of the phased contig. These lengths indicate how much additional sequence containing phased variants can be analyzed in a haploid context together with the variants within the gene. Seventy-nine percent of genes had at least 10 kb of phased upstream sequence.
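Panel A uses a length-weighted histogram: each bar sums the lengths of the blocks falling in a size interval, rather than counting them. The sketch below shows that construction together with the cumulative curve. The block lengths and the 500-kb bin width are hypothetical choices for illustration, not values from the figure.

```python
from collections import defaultdict
from itertools import accumulate
from typing import Iterable


def weighted_length_histogram(block_lengths: Iterable[int], bin_bp: int):
    """Sum genome length covered per contig-size bin (a length-weighted histogram)."""
    totals: dict[int, int] = defaultdict(int)
    for length in block_lengths:
        # Each block contributes its own length to the bin its size falls in.
        bin_start = (length // bin_bp) * bin_bp
        totals[bin_start] += length
    bins = sorted(totals)
    heights = [totals[b] for b in bins]
    # Cumulative genome length covered as contig size increases.
    cumulative = list(accumulate(heights))
    return bins, heights, cumulative


# Hypothetical block lengths (bp) and bin width, for illustration only.
blocks = [120_000, 480_000, 650_000, 900_000, 1_200_000, 2_600_000]
bins, heights, cumulative = weighted_length_histogram(blocks, 500_000)
for b, h, c in zip(bins, heights, cumulative):
    print(f"{b:>9,} bp bin: {h:>9,} bp  (cumulative {c:,} bp)")
```

Weighting by length keeps a few long blocks visible even when many short blocks dominate the raw count.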
Figure 3.
Comparative evaluation of molecular vs. statistical phasing. (A) Stacked bar chart showing haplotype agreement between molecular (MOL) and statistical (STAT) phasing (blue) and disagreement (purple). A proportion of SNPs could only be phased molecularly (yellow) with <1% on average remaining unphased (green). Analysis includes all autosomal heterozygous SNPs in MP1. (B) Box plot showing concordance of MP1 molecular haplotypes (yellow) to the reference set of HapMap haplotypes derived by trio-based phasing (20 individuals). Statistically derived haplotypes for MP1 (green) show a lower concordance to HapMap haplotypes.
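When two phasings of the same block are compared, the haplotype labels are arbitrary, so agreement has to be scored up to a global flip. The sketch below illustrates one common way to score this: per-SNP concordance taking the better orientation, plus a switch-error count. These metrics are a swapped-in illustration and not necessarily how the concordance values in this figure were computed. The encoding (allele 0 or 1 placed on haplotype 1 at each heterozygous SNP) is an assumption.

```python
from typing import Sequence


def _check(mol: Sequence[int], stat: Sequence[int]) -> None:
    if len(mol) != len(stat) or not mol:
        raise ValueError("need two non-empty phasings of equal length")


def phase_concordance(mol: Sequence[int], stat: Sequence[int]) -> float:
    """Fraction of het SNPs in one block on which two phasings agree.

    Each entry is the allele (0 or 1) assigned to haplotype 1. Because
    haplotype labels are arbitrary, score the better of the direct and
    flipped comparison.
    """
    _check(mol, stat)
    same = sum(m == s for m, s in zip(mol, stat))
    return max(same, len(mol) - same) / len(mol)


def switch_count(mol: Sequence[int], stat: Sequence[int]) -> int:
    """Number of switch errors: adjacent SNP pairs whose relative phase differs."""
    _check(mol, stat)
    diffs = [m != s for m, s in zip(mol, stat)]
    return sum(a != b for a, b in zip(diffs, diffs[1:]))


mol = [0, 1, 1, 0, 1, 0]
stat = [0, 1, 1, 1, 0, 1]  # agrees for three SNPs, then flips once
print(phase_concordance(mol, stat), switch_count(mol, stat))
```

The example shows why both numbers are useful: a single switch in the middle of a block drops per-SNP concordance to 0.5 even though only one adjacent pair is out of phase.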
Figure 4.
Examples of cis vs. trans configurations of potentially protein-damaging mutations. Damaging mutations are shown in pink. In cis (left), both mutations reside on the same chromosome, so the protein encoded by the other haplotype is left intact, as shown for the TNRC6A gene. Mutations of TNRC6A may contribute to gastric and colorectal cancer development. In trans (right), damaging mutations affect both haplotypes, as shown for PABPC1, which has been associated with esophageal cancer progression and poor prognosis. Gene–gene interaction between TNRC6A and PABPC1 seems to play a role in miRNA silencing (Huntzinger et al. 2010), indicating a global relevance of phase. All variants assigned to each of the two molecular haplotypes are shown, including nonsynonymous SNPs that have no predicted damaging effect on the protein (amino acids in gray boxes).
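Once heterozygous damaging variants in a gene have been assigned to molecular haplotypes, the cis/trans distinction in this figure reduces to asking whether they all sit on one haplotype. A minimal sketch follows. The (variant_id, haplotype) representation and the variant IDs are hypothetical, and homozygous variants are deliberately out of scope, since they affect both copies regardless of phase.

```python
from typing import Sequence, Tuple

# Hypothetical representation of a phased heterozygous variant:
# (variant_id, haplotype), where haplotype is 0 or 1.
PhasedVariant = Tuple[str, int]


def classify_configuration(damaging: Sequence[PhasedVariant]) -> str:
    """Classify heterozygous damaging variants in one gene as cis or trans."""
    if len(damaging) < 2:
        return "single"  # fewer than two damaging variants: cis/trans is undefined
    haplotypes = {hap for _, hap in damaging}
    if len(haplotypes) == 1:
        # All on one haplotype: the other copy can still encode an intact protein.
        return "cis"
    # Each haplotype carries at least one damaging variant: no intact copy remains.
    return "trans"


print(classify_configuration([("var_a", 0), ("var_b", 0)]))
print(classify_configuration([("var_a", 0), ("var_b", 1)]))
```

Without molecular phase, the two calls above would be indistinguishable from genotype data alone, which is the point the figure makes.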
Figure 5.
Example of a 1.7-Mb haploid landscape of functional variation on chr. 19. Both molecular haplotypes are shown. Differences between the two are shown at the nucleotide level within framed bars (centered), with bases colored yellow = G, red = T, green = A, and blue = C. Novel SNPs are assigned to each of the two haplotypes. Regulatory motifs (TFBS track) differ at one SNP. At the level of genome organization, genes that have two different molecular haplotypes are shaded red or green per haplotype, and those encoding two differing proteins are highlighted and framed. Nonsynonymous mutations on each haplotype are annotated (damaging AA exchanges [Adzhubei et al. 2010] in pink). Disease-related features (GWAS and OMIM) are shown in the lower track.
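A haploid landscape like this starts from the two haplotype sequences of a region. The sketch below shows the basic idea, applying phased variants to a reference and listing where the two haplotypes differ. It is a simplified illustration, not the authors' pipeline: it handles only single-nucleotide substitutions (no indels), uses 0-based positions, and the reference string and variants are made up.

```python
from typing import List, Tuple

# Hypothetical phased SNV: (0-based position, allele on haplotype 1, allele on haplotype 2).
PhasedSNV = Tuple[int, str, str]


def haploid_sequences(reference: str, phased_snvs: List[PhasedSNV]) -> Tuple[str, str]:
    """Apply phased SNVs to a reference to obtain the two haplotype sequences.

    Only single-nucleotide substitutions are handled; indels are out of scope.
    """
    hap1, hap2 = list(reference), list(reference)
    for pos, allele1, allele2 in phased_snvs:
        hap1[pos] = allele1
        hap2[pos] = allele2
    return "".join(hap1), "".join(hap2)


def differing_positions(hap1: str, hap2: str) -> List[int]:
    """Positions at which the two molecular haplotypes differ."""
    return [i for i, (a, b) in enumerate(zip(hap1, hap2)) if a != b]


ref = "ACGTACGTAC"
# Two heterozygous SNVs and one homozygous-alternate SNV (position 8).
snvs = [(2, "G", "T"), (5, "A", "C"), (8, "T", "T")]
h1, h2 = haploid_sequences(ref, snvs)
print(h1, h2, differing_positions(h1, h2))
```

The homozygous variant at position 8 changes both haplotypes identically, so only the heterozygous sites distinguish the two molecular forms.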


References

Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. 2010. A method and server for predicting damaging missense mutations. Nat Methods 7: 248–249.
Ashley EA, Butte AJ, Wheeler MT, Chen R, Klein TE, Dewey FE, Dudley JT, Ormond KE, Pavlovic A, Morgan AA, et al. 2010. Clinical assessment incorporating a personal genome. Lancet 375: 1525–1535.
Bansal V, Bafna V. 2008. HapCUT: An efficient and accurate algorithm for the haplotype assembly problem. Bioinformatics 24: i153–i159.
Bansal V, Tewhey R, Topol EJ, Schork NJ. 2011. The next phase in human genetics. Nat Biotechnol 29: 38–39.
Beissbarth T, Speed TP. 2004. GOstat: Find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics 20: 1464–1465.
