Pharmacogenomics J. 2018 May 22;18(3):367-376.
doi: 10.1038/tpj.2017.7. Epub 2017 Apr 25.

Significant variation between SNP-based HLA imputations in diverse populations: the last mile is the hardest

D J Pappas et al. Pharmacogenomics J.

Abstract

Four single nucleotide polymorphism (SNP)-based human leukocyte antigen (HLA) imputation methods (e-HLA, HIBAG, HLA*IMP:02 and MAGPrediction) were trained using 1000 Genomes SNP and HLA genotypes and assessed for their ability to accurately impute molecular HLA-A, -B, -C and -DRB1 genotypes in the Human Genome Diversity Project cell panel. Imputation concordance was high (>89%) across all methods for both HLA-A and HLA-C, but HLA-B and HLA-DRB1 proved generally difficult to impute. Overall, <27.8% of subjects were correctly imputed for all HLA loci by any method. Concordance across all loci was not enhanced via the application of confidence thresholds; reliance on confidence scores across methods only led to noticeable improvement (+3.2%) for HLA-DRB1. As the HLA complex is highly relevant to the study of human health and disease, a standardized assessment of SNP-based HLA imputation methods is crucial for advancing genomic research. Considerable room remains for the improvement of HLA-B and especially HLA-DRB1 imputation methods, and no imputation method is as accurate as molecular genotyping. The application of large, ancestrally diverse HLA and SNP reference data sets and multiple imputation methods has the potential to make SNP-based HLA imputation methods a tractable option for determining HLA genotypes.

Conflict of interest statement

SL is a founder and partner in Peptide Groove LLP. All other authors of this manuscript declare no competing financial interests.

Figures

Figure 1. Statistical Significance of Imputation Accuracy across Loci and Methods
Statistical significance of imputation accuracy (IA) across loci and methods was assessed using a logistic regression model, as detailed in the Supplementary Information. Odds ratios and their confidence intervals are plotted relative to IA for a locus or method predictor, for all HGDP subjects, as informed by the model. HLA-A was selected as the predictor for locus comparisons, and e-HLA for method comparisons.
Figure 2. Locus-level Imputation Performance
Imputation accuracy (IA) was assessed at different call rates by iteratively applying a confidence value threshold and recalculating the IA. Confidence value thresholds were derived from the unique list of confidence values reported for each allele or genotype. Line length is a function of the lowest reported confidence value (only non-zero call rates are graphed). Each panel corresponds to a different locus (HLA-A, -B, -C, and -DRB1). Color corresponds to method: blue, e-HLA; orange, HIBAG; green, HLA*IMP:02; magenta, MAGPrediction. For comparison, the 90% IA (gray dotted line) and the 0.5 confidence thresholds (diamonds) for each imputation are indicated. Only non-zero accuracies are graphed. For e-HLA and HLA*IMP:02, the range of confidence values was narrow compared to HIBAG and MAGPrediction, which results in line termination at higher call rates. Different IA scales are presented for HLA-A and -C than for -B and -DRB1.
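The thresholding procedure in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the input layout (one `(confidence, is_correct)` pair per imputed genotype) and the choice to keep calls with confidence greater than or equal to the threshold are all assumptions made for illustration, and the example data are invented.

```python
def accuracy_by_call_rate(calls):
    """Trace imputation accuracy (IA) against call rate.

    calls: list of (confidence, is_correct) pairs, one per imputed genotype.
    Returns one (threshold, call_rate, accuracy) tuple per unique confidence
    value, keeping only calls with confidence >= threshold (an assumption).
    """
    total = len(calls)
    curve = []
    # Thresholds come from the unique confidence values that were reported.
    for threshold in sorted({conf for conf, _ in calls}):
        kept = [ok for conf, ok in calls if conf >= threshold]
        call_rate = len(kept) / total      # fraction of genotypes still called
        accuracy = sum(kept) / len(kept)   # IA among the retained calls
        curve.append((threshold, call_rate, accuracy))
    return curve


# Hypothetical data: (confidence, correct?) for five imputed genotypes.
example_calls = [(0.95, True), (0.80, True), (0.60, False), (0.40, True), (0.20, False)]

for threshold, call_rate, accuracy in accuracy_by_call_rate(example_calls):
    print(f"threshold={threshold:.2f}  call rate={call_rate:.2f}  IA={accuracy:.2f}")
```

Because every threshold is itself a reported confidence value, at least one call is always retained, which matches the caption's note that only non-zero call rates are graphed.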
Figure 3. Subject-level Imputation Performance
For each method evaluated, two different imputation accuracy (IA) measures are plotted for each call rate (x-axis) at the subject level; only subjects for which the imputations at all four loci are correct are scored as accurate. “Subset Accuracy”, the percentage of correctly imputed subjects among those called at each call rate threshold (dashed lines), as presented for individual loci in Figure 2, is plotted alongside “Global Accuracy”, the percentage of subjects out of the total dataset that are correctly imputed at each call rate threshold (solid lines). Color corresponds to method: blue, e-HLA; orange, HIBAG; green, HLA*IMP:02; magenta, MAGPrediction.
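The difference between the two measures can be shown with a short Python sketch. The caption does not say how a subject's confidence is derived from its four locus-level confidences, so this sketch assumes the minimum across loci; that rule, the function name and the example data are hypothetical.

```python
def subject_level_accuracy(subjects, threshold):
    """Return (call_rate, subset_accuracy, global_accuracy) at one threshold.

    subjects: list of dicts mapping locus name -> (confidence, is_correct).
    A subject is called if its lowest locus confidence is >= threshold
    (an assumed rule) and is accurate only if all loci are correct.
    """
    total = len(subjects)
    called = [s for s in subjects
              if min(conf for conf, _ in s.values()) >= threshold]
    correct = sum(all(ok for _, ok in s.values()) for s in called)
    call_rate = len(called) / total
    # Subset accuracy: correct subjects among those still called.
    subset_accuracy = correct / len(called) if called else None
    # Global accuracy: correct subjects out of the whole dataset.
    global_accuracy = correct / total
    return call_rate, subset_accuracy, global_accuracy


# Hypothetical data for four subjects typed at four loci.
example_subjects = [
    {"A": (0.9, True), "B": (0.9, True), "C": (0.9, True), "DRB1": (0.9, True)},
    {"A": (0.9, True), "B": (0.6, False), "C": (0.9, True), "DRB1": (0.7, True)},
    {"A": (0.8, True), "B": (0.7, True), "C": (0.9, True), "DRB1": (0.75, True)},
    {"A": (0.9, True), "B": (0.3, True), "C": (0.8, True), "DRB1": (0.2, False)},
]

for t in (0.0, 0.65, 0.95):
    print(t, subject_level_accuracy(example_subjects, t))
```

Raising the threshold can raise subset accuracy while global accuracy stays flat or falls, because subjects dropped by the threshold still count in the global denominator.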
Figure 4. Imputation Accuracy when Masking Untrained Alleles
Imputation accuracy (IA) was assessed for each locus and method before and after removing carriers of untrained HLA alleles (i.e., alleles not present in the reference dataset). HGDP subjects carrying one or two untrained HLA alleles were removed (masked) and IA was recalculated on the remaining subjects, for which all alleles were present in the reference dataset. The diagonal represents identical IA between masked and unmasked evaluation datasets; changes in IA appear as a shift from the diagonal. Shape corresponds to locus: circle, HLA-A; square, -B; diamond, -C; triangle, -DRB1. Color corresponds to method: blue, e-HLA; orange, HIBAG; green, HLA*IMP:02; magenta, MAGPrediction.
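The masking step can be sketched in Python as follows. This is an illustration under assumptions, not the published procedure: IA is scored here per genotype (both alleles must match, in either order), and the function name, allele set and subject data are invented for the example.

```python
def masked_accuracy(subjects, trained_alleles):
    """Return (unmasked_IA, masked_IA) for one locus and one method.

    subjects: list of (true_genotype, imputed_genotype); each genotype is a
    pair of allele names. trained_alleles: alleles seen in the reference data.
    """
    def accuracy(rows):
        if not rows:
            return None
        # A genotype is correct when both alleles match, in either order.
        return sum(sorted(true) == sorted(imputed) for true, imputed in rows) / len(rows)

    # Mask (remove) subjects whose true genotype has any untrained allele.
    kept = [(true, imputed) for true, imputed in subjects
            if all(allele in trained_alleles for allele in true)]
    return accuracy(subjects), accuracy(kept)


# Hypothetical reference alleles and evaluation subjects.
example_trained = {"A*01:01", "A*02:01", "A*03:01"}
example_genotypes = [
    (("A*01:01", "A*02:01"), ("A*02:01", "A*01:01")),  # correct
    (("A*01:01", "A*03:01"), ("A*01:01", "A*02:01")),  # wrong
    (("A*02:01", "A*24:02"), ("A*02:01", "A*03:01")),  # carries an untrained allele
    (("A*03:01", "A*03:01"), ("A*03:01", "A*03:01")),  # correct
]

print(masked_accuracy(example_genotypes, example_trained))
```

A subject carrying an untrained allele cannot be imputed correctly by a method that has never seen that allele, so masking can only remove guaranteed misses; the masked IA is therefore never lower than the unmasked IA under this scoring.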
Figure 5. Maximum, Adjudicated and Standardized Imputation Accuracies in Method Combinations
At each locus, and for each combination of two, three and four methods, the difference between the maximum imputation accuracy (IA), the adjudicated IA, and the standardized IA is shown in comparison to the overall IA for each method (Method Baselines). Maximum IA was calculated over all HGDP subjects by scoring the imputation for a given subject as correct if any of the predictions in a given combination of methods was accurate. Adjudicated IA was calculated over all HGDP subjects by choosing the prediction with the highest confidence score from among the predictions in a given combination of methods for each subject, and then comparing that prediction to the evaluation dataset for accuracy. Standardized IA was calculated over all HGDP subjects by normalizing the confidence score distributions for each method and then choosing the highest confidence score as for Adjudicated IA. Ninety percent IA is indicated with the dotted line. The y-axis for HLA-A and -C uses a different scale than the y-axis for -B and -DRB1. Solid shapes correspond to types of IA scores: circle, maximum IA score (Max); triangle, adjudicated IA score (Adj); asterisk, standardized IA (Std). Color corresponds to method: blue, e-HLA; orange, HIBAG; green, HLA*IMP:02; magenta, MAGPrediction. Each panel corresponds to a different locus: HLA-A, -B, -C, and -DRB1. For DRB1, the overall IA values for e-HLA and HLA*IMP:02 overlap.
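The three combination scores can be illustrated with a short Python sketch. The caption does not state how confidence distributions were normalized, so z-scoring within each method is assumed here purely for illustration; the function name, the method labels `m1` and `m2`, the genotype labels and the confidence values are hypothetical, and ties are broken by method order.

```python
from statistics import mean, pstdev


def combined_accuracies(predictions, truth):
    """Return (maximum_IA, adjudicated_IA, standardized_IA).

    predictions: {method: [(genotype, confidence), ...]}, aligned by subject.
    truth: list of true genotypes, one per subject.
    """
    methods = list(predictions)
    n = len(truth)

    # Assumed normalization: z-score each method's confidence values.
    z = {}
    for m in methods:
        confs = [conf for _, conf in predictions[m]]
        mu, sd = mean(confs), pstdev(confs)
        z[m] = [(c - mu) / sd if sd > 0 else 0.0 for c in confs]

    max_hits = adj_hits = std_hits = 0
    for i, true_gt in enumerate(truth):
        # Maximum IA: correct if any method in the combination is right.
        max_hits += any(predictions[m][i][0] == true_gt for m in methods)
        # Adjudicated IA: trust the method with the highest raw confidence.
        adj_best = max(methods, key=lambda m: predictions[m][i][1])
        adj_hits += predictions[adj_best][i][0] == true_gt
        # Standardized IA: same choice, made on normalized confidences.
        std_best = max(methods, key=lambda m: z[m][i])
        std_hits += predictions[std_best][i][0] == true_gt
    return max_hits / n, adj_hits / n, std_hits / n


# Hypothetical predictions from two methods for four subjects.
example_truth = ["g1", "g2", "g3", "g4"]
example_predictions = {
    "m1": [("g1", 0.9), ("gX", 0.8), ("g3", 0.7), ("gX", 0.95)],
    "m2": [("g1", 0.5), ("g2", 0.6), ("gX", 0.4), ("gY", 0.3)],
}

print(combined_accuracies(example_predictions, example_truth))
```

Maximum IA is an upper bound on what any adjudication rule could reach for a given combination, since it assumes the correct prediction is always the one chosen.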
Figure 6. SNP Proximity and Density for the HLA-A, -B, -C and -DRB1 Loci
Primary Panel: The density of SNPs (ranging from 0–12) within 500,000 bases of the HLA-A, -B, -C and -DRB1 loci is shown. A distance of 0 indicates the location of each respective gene. Negative distances are telomeric of the gene in question; positive distances are centromeric. Bold line: Proximal subsets of the 164,876 SNPs present in the 1000G dataset, prior to merging with the HGDP dataset. Light shaded area: SNPs present after the merger of the 1000G and HGDP datasets (merged SNPs), prior to quality control (QC) evaluation. Dark shaded area: merged SNPs remaining after QC evaluation. Inset Panel: The cumulative number of SNPs, out of the 10,268 SNPs included in this study, within 200,000 bases of the HLA-A, -B, -C, and -DRB1 loci is shown. Color corresponds to locus: green, HLA-A; orange, HLA-B; purple, HLA-C; magenta, HLA-DRB1.
