Genome Res. 2017 Sep;27(9):1597-1607.
doi: 10.1101/gr.218891.116. Epub 2017 Aug 3.

Assembly and analysis of 100 full MHC haplotypes from the Danish population

Jacob M Jensen et al. Genome Res. 2017 Sep.

Abstract

Genes in the major histocompatibility complex (MHC, also known as HLA) play a critical role in the immune response, and variation within the extended 4-Mb region is associated with major risk for many diseases. Yet deciphering the underlying causes of these associations is difficult because the MHC is the most polymorphic region of the genome and has a complex linkage disequilibrium structure. Here, we reconstruct full MHC haplotypes from de novo assembled trios without relying on a reference genome and perform evolutionary analyses. We report 100 full MHC haplotypes and call a large set of structural variants in the region for future use in imputation with GWAS data. We also present the first complete analysis of the recombination landscape in the entire region and show how balancing selection at classical genes has linked effects on the frequency of variants throughout the region.

Figures

Figure 1.
Assembly of 100 full MHC haplotypes. Schematic showing the construction of MHC haplotypes. Genomes in trios are de novo assembled using ALLPATHS-LG (step 1). Scaffolds larger than 50 kb mapping to the MHC are extracted and concatenated, creating diploid consensus scaffolds (step 2). Bubbles in the alignment graphs for individuals in the trio are mapped uniquely within the trio by exact matching of the sequence upstream of the bubbles (step 3). Global alignment between phased bubbles is used to create a consensus sequence between transmitted parental and inherited child haplotype sequences (steps 4 and 5). Reads from parents and child are then mapped to the consensus sequence, genotyped, and phased (step 6), gaps are closed (step 7), and reads are mapped again for another iteration of mapping, genotyping, and phasing (step 8).
Figure 2.
Differences between MHC haplotypes and reference pgf. The new haplotypes and the seven alternative reference haplotypes were each aligned to the reference pgf haplotype by pairwise alignment, and the percentage of pairwise differences was calculated in bins of 10 kb, shown here in white (low) to red (high). Dark gray bins contain >50% missing data (i.e., Ns); bins with a red line lack alignment blocks. The region classes and important genes, such as the classical loci, are shown above. C4A and C4B are marked in blue.
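The binning described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `binned_percent_difference`, the treatment of gap characters as ordinary mismatches, and the assumption that both sequences are already aligned to equal length are all hypothetical choices made here.

```python
def binned_percent_difference(ref, alt, bin_size=10_000, max_missing=0.5):
    """Percent pairwise difference per bin between two aligned sequences.

    Returns one entry per bin: a percentage, or None when more than
    `max_missing` of the bin's columns contain N in either sequence
    (the "dark gray" case in the caption).
    Assumption: a gap character ('-') is compared like any other symbol.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(ref), bin_size):
        total = compared = differing = missing = 0
        for a, b in zip(ref[start:start + bin_size], alt[start:start + bin_size]):
            total += 1
            a, b = a.upper(), b.upper()
            if a == "N" or b == "N":
                missing += 1
                continue
            compared += 1
            if a != b:
                differing += 1
        if compared == 0 or missing / total > max_missing:
            results.append(None)
        else:
            results.append(100.0 * differing / compared)
    return results
```

With a toy bin size of 4, `binned_percent_difference("ACGTACGT", "ACGAACNN", bin_size=4)` gives one mismatch in four columns for the first bin and no mismatches among the two comparable columns of the second.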
Figure 3.
Variation and population genetics. (A,B) Number of SNVs and indels across the MHC region in 50-kb sliding windows (step 10 kb). (C) Nucleotide diversity (π) and (D) Tajima's D, calculated in 5-kb sliding windows (step 1 kb). (E,F) Count of nonsynonymous and synonymous SNVs across the MHC region, and pN/pS estimated assuming that 73% and 27% of sites are nonsynonymous and synonymous, respectively; these proportions were calculated from the reference pgf haplotype. The MHC classes and important genes, such as classical HLA genes, are marked above.
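The statistics named in this caption follow standard definitions, which can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, haplotypes are assumed to be equal-length strings without missing data, Tajima's D uses the usual Tajima (1989) constants, and the pN/pS function reflects one reading of the caption (counts normalized by the 73%/27% site fractions).

```python
from math import sqrt

def sliding_windows(length, size, step):
    """Yield (start, end) half-open windows over a region of `length` bp."""
    for start in range(0, max(length - size, 0) + 1, step):
        yield start, min(start + size, length)

def pi_and_tajimas_d(haplotypes):
    """Return (per-site nucleotide diversity, Tajima's D) for one window.

    D is None when the window has no segregating sites or fewer than
    four haplotypes, where its variance term is zero.
    """
    n = len(haplotypes)
    if n < 2 or len(haplotypes[0]) == 0:
        raise ValueError("need at least two non-empty haplotypes")
    length = len(haplotypes[0])
    total_pairs = n * (n - 1) / 2
    seg_sites = 0
    pi_sum = 0.0
    for column in zip(*haplotypes):
        counts = {}
        for allele in column:
            counts[allele] = counts.get(allele, 0) + 1
        if len(counts) < 2:
            continue  # monomorphic site
        seg_sites += 1
        same_pairs = sum(c * (c - 1) / 2 for c in counts.values())
        # Fraction of haplotype pairs that differ at this site.
        pi_sum += (total_pairs - same_pairs) / total_pairs
    pi_per_site = pi_sum / length
    if seg_sites == 0 or n < 4:
        return pi_per_site, None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    d = (pi_sum - seg_sites / a1) / sqrt(variance)
    return pi_per_site, d

def pn_ps(nonsyn_count, syn_count, nonsyn_fraction=0.73, syn_fraction=0.27):
    """pN/pS with counts scaled by the assumed fractions of each site type."""
    if syn_count == 0:
        return None
    return (nonsyn_count / nonsyn_fraction) / (syn_count / syn_fraction)
```

For the 5-kb windows with a 1-kb step, one would iterate `sliding_windows(region_length, 5_000, 1_000)` and pass each slice of the haplotypes to `pi_and_tajimas_d`. A pN/pS of 1 corresponds to nonsynonymous and synonymous counts in the same 73:27 ratio as the sites themselves.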
Figure 4.
Recombination across the MHC region. Recombination rate estimated across the MHC region. Arrowheads point up toward two outliers that were removed for better visualization of the rest of the region.
Figure 5.
LD patterns and selection upstream of HLA-DRA. (A) Average minor allele frequencies (MAF) across the region. The red dots are the MAF of the variants, and the line shows the average MAF in bins of 10 variants. (B) Tajima's D statistic calculated in 1-kb bins. (C) Recombination rate estimate. (D) The r² statistic calculated in a 60-kb region upstream of the HLA-DRA gene.
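For phased haplotypes, the r² statistic in panel D has a simple closed form. The sketch below assumes biallelic sites coded as 0/1 per haplotype; the function name `r_squared` and this encoding are illustrative assumptions, not taken from the paper.

```python
def r_squared(site_a, site_b):
    """Linkage disequilibrium r^2 between two biallelic sites.

    `site_a` and `site_b` hold one 0/1 allele per phased haplotype,
    in the same haplotype order. Returns None if either site is
    monomorphic, where r^2 is undefined.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must cover the same non-empty haplotype set")
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    # Frequency of haplotypes carrying allele 1 at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b
    return d * d / denom
```

Two sites that always co-occur on the same haplotypes give r² = 1, and two sites whose alleles are distributed independently give r² = 0.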
Figure 6.
Linked selection. (A) Average minor allele frequencies of nonsynonymous (blue, n = 432, P-value <0.01) and synonymous variants (red, n = 369, P-value <0.001) were calculated in bins of 25 variant sites and plotted as a function of the average distance of those 25 variants to the nearest classical HLA gene (HLA-A, HLA-C, HLA-B, HLA-DRA, HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1, HLA-DPB1). Variants within the classical MHC genes are not included. A linear regression was fitted for each variant type on the nonbinned data. (B) Linkage disequilibrium (r²), calculated between SNPs in either classical HLA genes (red) or control genes (gray) and all other SNPs in the MHC region, is shown as a function of distance from the genes.
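The analysis in panel A combines three small steps: distance to the nearest classical gene, averaging in bins of 25 variants, and a linear fit on the unbinned points. The sketch below illustrates each step; the function names, the `(start, end)` gene-interval representation, and the choice to sort variants by distance before binning are assumptions made for illustration and are not stated in the caption.

```python
def distance_to_nearest_gene(position, genes):
    """Distance in bp from `position` to the closest (start, end) interval.

    Returns 0 for a position inside a gene; the caption excludes such variants.
    """
    return min(max(start - position, position - end, 0) for start, end in genes)

def bin_means(pairs, bin_size=25):
    """Average (distance, maf) pairs in consecutive bins after sorting by distance.

    The final bin may hold fewer than `bin_size` variants.
    """
    ordered = sorted(pairs)
    means = []
    for i in range(0, len(ordered), bin_size):
        chunk = ordered[i:i + bin_size]
        mean_distance = sum(d for d, _ in chunk) / len(chunk)
        mean_maf = sum(m for _, m in chunk) / len(chunk)
        means.append((mean_distance, mean_maf))
    return means

def least_squares(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Fitting on the unbinned data, as the caption describes, means passing every variant's distance and MAF to `least_squares`; the binned means are used only for plotting.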
