Comparative Study
Immunogenetics. 2008 Jan;60(1):1-18.
doi: 10.1007/s00251-007-0262-2. Epub 2008 Jan 10.

Variation analysis and gene annotation of eight MHC haplotypes: the MHC Haplotype Project

Roger Horton et al. Immunogenetics. 2008 Jan.

Abstract

The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen (HLA)-homozygous MHC haplotype and to use it as a basis against which variation in seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population, could be assessed. Comparison of the haplotype sequences, including four haplotypes not previously analysed, identified more than 44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions, of which 97 were non-synonymous. The haplotype designated as the new MHC reference sequence (A3-B7-DR15; PGF cell line) has been incorporated into the human genome assembly (NCBI35 and subsequent builds) and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of the seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and for transplant medicine.
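
The abstract distinguishes coding substitutions that change the encoded amino acid (non-synonymous) from those that do not (synonymous). As a rough illustration only, and not the authors' annotation pipeline, the sketch below classifies a single-base substitution inside an in-frame coding sequence using the standard genetic code. The function name `classify_substitution`, the 0-based coordinates and the nonsense/stop-loss sub-labels are assumptions made for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...), matching the amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(cds: str, pos: int, alt: str) -> str:
    """Classify a single-base change at 0-based `pos` of an in-frame CDS.

    Hypothetical helper for illustration: assumes `cds` starts at the first
    base of a codon, contains only A/C/G/T and has a length divisible by 3.
    """
    cds, alt = cds.upper(), alt.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    if not 0 <= pos < len(cds):
        raise IndexError("position lies outside the CDS")
    if alt == cds[pos]:
        raise ValueError("alternative base equals the reference base")

    offset = pos % 3                 # position within the affected codon
    start = pos - offset             # first base of that codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]

    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "non-synonymous (nonsense)"
    if ref_aa == "*":
        return "non-synonymous (stop-loss)"
    return "non-synonymous (missense)"


if __name__ == "__main__":
    cds = "ATGGCTTGGTAA"  # hypothetical CDS: Met-Ala-Trp-Stop
    print(classify_substitution(cds, 5, "C"))  # GCT -> GCC (Ala -> Ala): synonymous
    print(classify_substitution(cds, 8, "A"))  # TGG -> TGA (Trp -> Stop): nonsense
```

A real annotation would also need transcript models, strand, splice-site handling and multi-base variants; the sketch only shows the codon-level logic behind the synonymous/non-synonymous split.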

Figures

Fig. 1
Annotation and variation data in VEGA. VEGA ‘overview’ (a), ‘detailed view’ (b) and ‘basepair view’ (c) showing an example of variation: the OR2J1 locus, in which a STOP codon is present in all haplotypes except MCF.
Fig. 2
Variation and annotation map of eight MHC haplotypes. The map represents the complete reference sequence (orange bar split into three 1.6-Mb sections), labelled PGF and marked with a scale (Mb) and approximate megabase positions on the NCBI36 build of chromosome 6 (grey milestones). Below the reference sequence, arrows represent gene positions and orientations, colour-coded for variation status (invariable, black; synonymous variation only, green; non-synonymous, conservative variation, red; non-synonymous, non-conservative variation, purple; see Table 8), with gene symbols on a band denoting MHC class (extended class I, green; class I, yellow; class III, pale orange; class II, light blue; extended class II, pink; outside MHC, pale grey). Above the reference sequence, coloured bands represent the sequences of the other seven haplotypes (COX, orange; QBL, mauve; APD, yellow; DBB, green; MANN, light blue; SSTO, dark blue; MCF, purple), with sequence gaps in dark grey; the RCCX hyper-variable region is shown in green (C4A block) and/or red (C4B block), or black (block absent), and the HLA–DRB hyper-variable region in shades of blue-green. Above each haplotype bar, a dark red bar graph shows the total variation between that haplotype and the reference sequence (total variations/10 kb). Re-examination of sequence AL645922 from the PGF haplotype, which contains the RCCX region, has shown that the original assembly was erroneous. Correcting these errors now leads us to conclude that the C4A gene precedes the C4B gene in this clone sequence. This new gene order is reflected in Fig. 2.
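
The bar graphs in Fig. 2 summarise variation as total variations per 10 kb. A minimal sketch of that kind of summary follows, assuming variant positions are already available as base-pair coordinates and that non-overlapping 10-kb windows are used; the caption does not specify the exact windowing or how sequence gaps were handled, and `variation_density` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def variation_density(positions: Iterable[int], region_start: int,
                      region_end: int, window: int = 10_000) -> List[Tuple[int, int]]:
    """Count variants in non-overlapping windows over [region_start, region_end).

    Hypothetical helper: returns (window_start, variant_count) pairs; the last
    window may be shorter than `window` if the region length is not a multiple of it.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if region_end <= region_start:
        raise ValueError("region_end must be greater than region_start")

    # Map each in-range variant to the index of the window that contains it.
    counts = Counter((p - region_start) // window
                     for p in positions
                     if region_start <= p < region_end)

    n_windows = -(-(region_end - region_start) // window)  # ceiling division
    return [(region_start + i * window, counts.get(i, 0)) for i in range(n_windows)]


if __name__ == "__main__":
    # Hypothetical variant positions (bp) within a 30-kb region; 40,000 lies outside it.
    print(variation_density([5, 9_999, 10_000, 25_000, 40_000], 0, 30_000))
    # [(0, 2), (10000, 1), (20000, 1)]
```

Windows that overlap sequence gaps (dark grey in the figure) will show artificially low counts unless they are normalised by the number of sequenced bases, which this sketch does not attempt.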
Fig. 3
Clusters of haplotypes within European haplotypic diversity. Phylogenetic relationships of 180 founder SNP haplotypes from CEPH trios spanning a 214-kb segment of the MHC class II region, including the HLA-DRB1 and HLA-DQB1 genes (54 substitutions from rs2187823 to rs2856691). (a) The sequenced haplotypes are widely distributed in this neighbour-joining (NJ) tree and represent the vast majority of the variation in the population sampled. Four-digit alleles of the corresponding DRB1 and DQB1 genes are given in each haplotype ID label to show how HLA haplotypes are distributed with respect to the underlying nucleotide variation. The NJ tree was constructed from pairwise genetic distances under the Kimura two-parameter model, without correction for rate variation among sites, as implemented in the MEGA2 software (Kumar et al. 2001). (b) Each sequenced haplotype is associated with a single haplotype cluster. This phylogenetic network (Bandelt et al. 1999) also shows that each cluster (shaded area) consists of one central haplotype and its derivatives. Circles represent individual haplotypes, and circle size is proportional to haplotype frequency. The length of the lines connecting nodes reflects the distance between them; for example, distances within shaded areas (clusters) never exceed three mutation steps. Clusters of haplotypes sharing HLA alleles with sequenced cell lines are named accordingly: COX and QBL, DRB1*0301–DQB1*0201; PGF, DRB1*1501–DQB1*0602; APD, DRB1*1301–DQB1*0603; MCF, DRB1*0401–DQB1*0301; DBB, DRB1*0701–DQB1*0303; SSTO, DRB1*0403–DQB1*0302; MANN, DRB1*0701–DQB1*0202. HLA haplotypes DRB1*1103–DQB1*0301 and DRB1*0101–DQB1*0501 indicate the two major haplotype clusters not represented in the MHC Haplotype Project data.
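
The Fig. 3 tree was built from pairwise distances under the Kimura two-parameter (K2P) model, which counts transitions (A↔G, C↔T) and transversions (purine↔pyrimidine) separately. With P and Q the proportions of compared sites differing by a transition and by a transversion, the distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The sketch below applies that formula to two aligned sequences with no correction for rate variation among sites. It illustrates the model only and is not MEGA2's implementation; skipping gaps and ambiguous bases pairwise is an assumption, and `k2p_distance` is a hypothetical name.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID_BASES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    No gamma (rate-variation) correction. Sites where either sequence has a
    gap or an ambiguous base are skipped (pairwise deletion, an assumption).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; purine<->pyrimidine changes are transversions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K2P distance is undefined for this level of divergence")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

Computing this distance for every pair of haplotypes yields the kind of matrix a neighbour-joining routine takes as input; the tree-building step itself is not shown.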

References

Allcock RJ, Atrazhev AM, Beck S, de Jong PJ, Elliott JF, Forbes S, Halls K, Horton R, Osoegawa K, Rogers J, Sawcer S, Todd JA, Trowsdale J, Wang Y, Williams S (2002) The MHC haplotype project: a resource for HLA-linked association studies. Tissue Antigens 59:520–521. doi:10.1034/j.1399-0039.2002.590609.x. PMID: 12445322
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410. PMID: 2231712
Awdeh ZL, Alper CA (1980) Inherited structural polymorphism of the fourth component of human complement. Proc Natl Acad Sci U S A 77:3576–3580. doi:10.1073/pnas.77.6.3576. PMCID: PMC349660. PMID: 6932037
Bandelt HJ, Forster P, Rohl A (1999) Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol 16:37–48. PMID: 10331250
Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27:573–580. doi:10.1093/nar/27.2.573. PMCID: PMC148217. PMID: 9862982

Publication types