Am J Hum Genet. 2008 Jan;82(1):48-56.
doi: 10.1016/j.ajhg.2007.09.001.

A statistical method for predicting classical HLA alleles from SNP data

Stephen Leslie et al. Am J Hum Genet. 2008 Jan.

Abstract

Genetic variation at classical HLA alleles is a crucial determinant of transplant success and susceptibility to a large number of infectious and autoimmune diseases. However, large-scale studies involving classical type I and type II HLA alleles might be limited by the cost of allele-typing technologies. Although recent studies have shown that some common HLA alleles can be tagged with small numbers of markers, SNP-based tagging does not offer a complete solution to predicting HLA alleles. We have developed a new statistical methodology to use SNP variation within the region to predict alleles at key class I (HLA-A, HLA-B, and HLA-C) and class II (HLA-DRB1, HLA-DQA1, and HLA-DQB1) loci. Our results indicate that a single panel of approximately 100 SNPs typed across the region is sufficient for predicting both rare and common HLA alleles with up to 95% accuracy in both African and non-African populations. Furthermore, we show that HLA alleles can be successfully predicted by using previously genotyped SNPs that are within the MHC and that had not been chosen for their ability to predict HLA alleles, such as those included on genome-wide products. These results indicate that our methodology, combined with an extended database of reference haplotypes, will facilitate large-scale experiments, including disease-association studies and vaccine trials, in which detailed information about HLA type is valuable.

Figures

Figure 1
SNP-Based Imputation of Classical HLA Alleles. (A) Schematic representation of IBD-based imputation. In the upper section, two chromosomes carrying the same allele (blue circle) share extended similarity with a recent common ancestor (blue segments) and therefore also with one another. A second, related allele (purple circle; e.g., one that is identical at two-digit resolution) shares more limited and divergent, but nevertheless detectable, similarity. In the lower section, the same allele (red circle) sits on two distinct haplotype backgrounds. A conventional tagging approach will fail both to identify the more distant relatedness between alleles in the upper section and to identify a single tag set in the lower section. (B) SNP haplotypes for HLA-B alleles at four-digit resolution with five or more copies in the CEU training data, shown at the 40 SNPs chosen for allele prediction with the Affymetrix array data for the 1958 birth cohort. Each row represents a unique chromosome, and the alleles at the SNPs are arbitrarily coded as black and white. Note that, unlike in a conventional tagging approach, there is typically no unique haplotype that defines the presence of an allele. rsIDs are indicated above, and the location of the prediction SNPs relative to HLA-B within the ∼4 Mb HLA region (defined here as the region from SNP rs7754054 to SNP rs769051) is shown below.
Figure 2
The Relationship between the Number of Times an Allele Appears in the Database and the Sensitivity and Specificity of Predictions. Results are shown for (A) four-digit and (B) two-digit resolution for the Illumina data predictions only. Sensitivity is the proportion of cases in which a predicted allele is present in an individual. Specificity is the proportion of cases in which an allele present in an individual has been correctly predicted. Each allele is represented, and colors indicate the locus (HLA-A, blue; HLA-B, red; HLA-DRB1, purple; and HLA-DQB1, orange). Note that two four-digit alleles stand out as having many copies in the database and low sensitivity. These alleles appear to have been typed only to two-digit resolution in the 1958 birth cohort data, so their prediction accuracy cannot be reliably determined.
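The two quantities defined in the Figure 2 caption can be illustrated with a short Python sketch. It follows the caption's wording: sensitivity is the share of predictions of an allele that land on an individual who carries it, and specificity is the share of true copies of an allele that were predicted (these correspond to what are often called positive predictive value and recall). The function name, the per-individual allele pairs, and the multiset matching rule are assumptions made for illustration, not the authors' code or data.

```python
from collections import Counter


def per_allele_metrics(true_genotypes, predicted_genotypes):
    """Return {allele: (sensitivity, specificity)} using the caption's definitions.

    Each genotype is a pair of allele names for one individual (hypothetical input
    format). Copies are matched within an individual as a multiset, ignoring phase.
    """
    matched, predicted, present = Counter(), Counter(), Counter()
    for truth, call in zip(true_genotypes, predicted_genotypes):
        truth_counts, call_counts = Counter(truth), Counter(call)
        predicted.update(call_counts)             # every predicted copy
        present.update(truth_counts)              # every true copy
        matched.update(truth_counts & call_counts)  # copies both predicted and present

    metrics = {}
    for allele in set(predicted) | set(present):
        # Sensitivity (caption's sense): predicted copies that are truly present.
        sens = matched[allele] / predicted[allele] if predicted[allele] else None
        # Specificity (caption's sense): true copies that were predicted.
        spec = matched[allele] / present[allele] if present[allele] else None
        metrics[allele] = (sens, spec)
    return metrics


if __name__ == "__main__":
    # Toy genotypes invented for this example; not taken from the study.
    truth = [("B*0702", "B*0801"), ("B*0702", "B*0702"), ("B*4402", "B*0801")]
    calls = [("B*0702", "B*0801"), ("B*0702", "B*4402"), ("B*4402", "B*0801")]
    for allele, (sens, spec) in sorted(per_allele_metrics(truth, calls).items()):
        print(allele, sens, spec)
```

An allele that is never predicted (or never present) gets `None` for the corresponding quantity, since the proportion is undefined in that case.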
Figure 3
Calibration of Call Probabilities in the 1958 Birth Cohort Data at Four-Digit Resolution. Accuracy estimates (±2 SE) are shown for the predictions made with the Affymetrix array (gray) and the Illumina array (black). The slightly higher accuracy of the Illumina data is primarily due to the higher density of SNPs from which to choose accurate prediction sets, particularly within the vicinity of HLA-DQB1.
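A calibration check of the kind shown in Figure 3 can be sketched as follows: calls are grouped by their reported call probability, and the observed accuracy in each group is reported with a ±2 SE interval. The caption does not state how the bins or standard errors were computed, so the equal-width bins, the binomial standard error, and all names below are assumptions for illustration only.

```python
import math


def calibration_table(call_probs, is_correct, n_bins=5):
    """Group calls by call probability and report observed accuracy with +/- 2 SE.

    Assumes equal-width probability bins and a binomial standard error,
    sqrt(acc * (1 - acc) / n); neither choice is specified in the source.
    """
    bins = [[] for _ in range(n_bins)]
    for prob, ok in zip(call_probs, is_correct):
        idx = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes in the last bin
        bins[idx].append((prob, bool(ok)))

    rows = []
    for idx, members in enumerate(bins):
        if not members:
            continue  # skip empty bins rather than report an undefined accuracy
        n = len(members)
        accuracy = sum(ok for _, ok in members) / n
        se = math.sqrt(accuracy * (1 - accuracy) / n)
        rows.append({
            "bin": (idx / n_bins, (idx + 1) / n_bins),
            "n": n,
            "mean_prob": sum(p for p, _ in members) / n,
            "accuracy": accuracy,
            "lower": max(0.0, accuracy - 2 * se),
            "upper": min(1.0, accuracy + 2 * se),
        })
    return rows


if __name__ == "__main__":
    # Toy call probabilities and outcomes invented for this example.
    probs = [0.95, 0.92, 0.98, 0.91, 0.55, 0.50]
    correct = [True, True, True, False, True, False]
    for row in calibration_table(probs, correct):
        print(row)
```

Well-calibrated call probabilities would show observed accuracy close to the mean call probability in each bin, within the interval.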
