A high-resolution HLA reference panel capturing global population diversity enables multi-ancestry fine-mapping in HIV host response

Yang Luo et al. Nat Genet. 2021 Oct.

Abstract

Fine-mapping to plausible causal variation may be more effective in multi-ancestry cohorts, particularly in the MHC, which has population-specific structure. To enable such studies, we constructed a large (n = 21,546) HLA reference panel spanning five global populations based on whole-genome sequences. Despite population-specific long-range haplotypes, we demonstrated accurate imputation at G-group resolution (94.2%, 93.7%, 97.8% and 93.7% in admixed African (AA), East Asian (EAS), European (EUR) and Latino (LAT) populations). Applying HLA imputation to genome-wide association study data for HIV-1 viral load in three populations (EUR, AA and LAT), we obviated effects of previously reported associations from population-specific HIV studies and discovered a novel association at position 156 in HLA-B. We pinpointed the MHC association to three amino acid positions (97, 67 and 156) marking three consecutive pockets (C, B and D) within the HLA-B peptide-binding groove, explaining 12.9% of trait variance.

Figures

Extended Data Fig. 1. HLA nomenclature
Description of a classical HLA allele using current standard nomenclature. The first field corresponds to the serological antigen. The second field distinguishes HLA alleles that differ by one or more missense variants. The third field distinguishes HLA alleles that differ by one or more synonymous variants. The G-group distinguishes HLA alleles that differ by one or more synonymous variants within the exons that encode the peptide binding groove regions (exon 2 and 3 for HLA class I genes and exon 2 for HLA class II genes).
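As an illustration of the field structure described above, the following minimal Python sketch splits an allele name into its gene, its colon-separated fields and an optional G-group suffix, and truncates a name to a lower resolution. The helpers `parse_hla_allele` and `to_resolution` are hypothetical names introduced here, not part of the study's software, and the pattern is an assumption that covers only the simple forms used in these captions (for example HLA-B*57 and HLA-B*81:01:01G).

```python
import re
from typing import NamedTuple, Tuple

# Assumed, simplified pattern: optional "HLA-" prefix, gene, "*",
# one to four colon-separated numeric fields, optional trailing "G".
_ALLELE_RE = re.compile(r"^(?:HLA-)?([A-Z]+[0-9]*)\*(\d+(?::\d+){0,3})(G?)$")


class HlaAllele(NamedTuple):
    gene: str                 # e.g. "B" or "DRB1"
    fields: Tuple[str, ...]   # e.g. ("81", "01", "01")
    suffix: str               # "G" for a G-group name, otherwise ""


def parse_hla_allele(name: str) -> HlaAllele:
    """Split an allele name into gene, fields and G-group suffix."""
    match = _ALLELE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized HLA allele name: {name!r}")
    gene, fields, suffix = match.groups()
    return HlaAllele(gene, tuple(fields.split(":")), suffix)


def to_resolution(name: str, n_fields: int) -> str:
    """Truncate an allele name to its first n_fields fields (suffix dropped)."""
    allele = parse_hla_allele(name)
    if not 1 <= n_fields <= len(allele.fields):
        raise ValueError("n_fields is outside the range available in the name")
    return f"HLA-{allele.gene}*{':'.join(allele.fields[:n_fields])}"
```

For example, `to_resolution("HLA-B*57:01:01", 1)` gives the one-field name `HLA-B*57`, and `to_resolution("HLA-B*57:01:01", 2)` gives the two-field name `HLA-B*57:01`.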
Extended Data Fig. 2. Correlation between imputed and typed dosage (dosage r²) of classical HLA alleles in 1,067 Admixed African HIV-1 samples
The x-axis shows the minor allele frequency observed in the SBT dataset. Blue points show G-group HLA alleles. Red points show one-field HLA alleles.
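The dosage correlation plotted here can be sketched as a squared Pearson correlation between per-sample imputed and typed dosages of one allele. This is a minimal illustration under that assumption; the caption does not give the exact formula, and `dosage_r2` is a hypothetical helper name.

```python
import math
from typing import Sequence


def dosage_r2(imputed: Sequence[float], typed: Sequence[float]) -> float:
    """Squared Pearson correlation between imputed and typed allele dosages.

    Each sequence holds one dosage per sample (0 to 2 copies of the allele).
    Returns NaN when either sequence has no variance, since the correlation
    is undefined in that case.
    """
    if len(imputed) != len(typed) or len(imputed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    n = len(imputed)
    mean_i = sum(imputed) / n
    mean_t = sum(typed) / n
    cov = sum((a - mean_i) * (b - mean_t) for a, b in zip(imputed, typed))
    var_i = sum((a - mean_i) ** 2 for a in imputed)
    var_t = sum((b - mean_t) ** 2 for b in typed)
    if var_i == 0 or var_t == 0:
        return math.nan
    return (cov * cov) / (var_i * var_t)
```

Rare alleles have little dosage variance, so their r² is estimated from few informative samples, which is one reason to read it against minor allele frequency as the plot does.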
Extended Data Fig. 3. Association tests within the MHC to HIV-1 viral load
The x-axis shows genomic position on chromosome 6 (build 37), and the y-axis shows the -log10(P-value) obtained from two-sided regression analyses for SNPs (gray), classical HLA alleles (blue) and amino acids (red). The dashed black line indicates the genome-wide significance threshold (P = 5 × 10⁻⁸). For biallelic markers, results were calculated by a linear regression model including sex, cohort-specific principal components and an ancestry indicator as covariates (circle). Association at amino acid positions with more than two residues was calculated using a multi-degree-of-freedom omnibus test (one-sided F-test) including the same covariates (diamond). The top associated amino acid, classical HLA allele and SNPs are annotated in the figure. a, Of all variants tested, the top hit maps to amino acid position 97 in HLA-B. b, Subsequent conditional analysis controlling for all residues at position 97 in HLA-B revealed an independent association at position 67 in HLA-B. c, Results conditioned on positions 97 and 67 in HLA-B showed a third signal at position 156 in HLA-B. d, Results conditioned on positions 97, 67 and 156 in HLA-B showed that position 77 in HLA-A has the strongest association signal outside HLA-B among all amino acid positions. e, Results conditioned on all amino acid positions in HLA-B. Notably, amino acid positions were more significant than any single SNP or classical HLA allele in each conditional analysis for the three amino acid positions in HLA-B.
Extended Data Fig. 4. Effect on set point viral load of individual residues at position 97 in HLA-B
Mean set point viral load (spVL, RNA copies per milliliter) and its standard error of all six residues at position 97 in HLA-B in three populations independently. Data are presented as mean values ± standard errors. Residues are ranked from the most protective to the riskiest in the overall population. There are 3,901 Admixed African, 7,455 European, and 677 Latino independent samples included in the analysis.
Extended Data Fig. 5. Global diversity of the MHC region
Principal component analysis of the pairwise IBD distance between 21,546 samples using MHC region markers. The first two principal components show separation of continental groups.
Extended Data Fig. 6. Diversity of eight classical HLA genes in the constructed multi-ancestry MHC reference panel
Each gene is stratified by five populations (AA, Admixed African; EAS, East Asian; EUR, European; LAT, Latino; SAS, South Asian). The two most common alleles within each classical gene of each population are plotted across all panels. Alleles that have frequencies greater than 1% are also labelled in the bar plots. a, Class I genes. b, Class II genes.
Extended Data Fig. 7. Allele diversity of eight classical HLA genes in global populations
For each gene, the five most frequent alleles across all populations are shown (light blue, most frequent; dark blue, second most frequent; light green, third most frequent; dark green, fourth most frequent; red, fifth most frequent; gray, all other alleles).
Extended Data Fig. 8. Pairwise normalized entropy (ε) among all population groups
The normalized entropy (ε) measures the difference between the haplotype frequency distributions under linkage disequilibrium and linkage equilibrium, and takes values between 0 (no LD) and 1 (perfect LD).
Extended Data Fig. 9. Deviation from average genome-wide ancestry in Admixed African and Latino populations
a,b, The x-axis is the genomic position on chromosome 6. The y-axis shows the local African ancestry deviation measure inferred at a given position for Admixed Africans (a) and Latinos (b). The MHC region (chr6:28Mb-34Mb) is highlighted in red shading. Local ancestries were estimated using RFMix (red) and ELAI (blue). The ancestry deviation measure is the difference between the African ancestry at a given genomic position and the genome-wide average estimated by ADMIXTURE with K = 3, normalized by the standard deviation of the ancestry estimate. The dashed lines indicate the genome-wide significance threshold, a deviation of ±4.42 standard deviations of the ancestry estimate from the genome-wide average.
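The deviation measure described in this caption amounts to a per-position z-score compared against ±4.42. The sketch below is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the input values in the usage note are invented for demonstration, and the standard deviation is assumed to be supplied by the caller because the caption does not say how it was computed.

```python
from typing import List, Sequence

# Threshold quoted in the caption, in standard deviations.
SIGNIFICANCE_THRESHOLD_SD = 4.42


def ancestry_deviation(
    local_ancestry: Sequence[float],
    genome_wide_mean: float,
    ancestry_sd: float,
) -> List[float]:
    """Return (local ancestry - genome-wide average) / SD for each position."""
    if ancestry_sd <= 0:
        raise ValueError("ancestry_sd must be positive")
    return [(a - genome_wide_mean) / ancestry_sd for a in local_ancestry]


def exceeds_threshold(
    deviations: Sequence[float],
    threshold: float = SIGNIFICANCE_THRESHOLD_SD,
) -> List[bool]:
    """Flag positions whose absolute deviation is beyond the threshold."""
    return [abs(z) > threshold for z in deviations]
```

With illustrative inputs such as `ancestry_deviation([0.80, 0.78, 0.90], 0.80, 0.02)`, only the last position (about 5 standard deviations above the average) would be flagged by `exceeds_threshold`.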
Extended Data Fig. 10. Conditional analysis of other previously reported independently associated amino acid positions
a,b, Manhattan plots of amino acid positions in the six classical HLA genes. Each point shows a single amino acid position and its omnibus P-value after controlling for independent positions that are associated with spVL in this study (positions 97, 67 and 156 in HLA-B) (a) and independent positions that are reported only in previous studies, and not in the presented work (positions 45, 63 and 116 in HLA-B and positions 77 and 95 in HLA-A) (b). Independently associated amino acid positions that are reported only in the European population are shown in blue. Independently associated amino acid positions that are reported only in the African American population are shown in purple. Independently associated amino acid positions identified in this study are shown in red.
Figure 1. A schematic showing the overall study design.
We used whole-genome sequences of 21,546 individuals from five global populations to construct an HLA imputation reference panel. We then performed HLA imputation and fine-mapping in HIV-1 viral load jointly in three populations.
Figure 2. The multi-ancestry HLA reference panel shows improvement in allele diversity and imputation accuracy.
a, The number of HLA alleles at two-field resolution included in the multi-ancestry HLA reference panel (n = 21,546) compared to the European-only Type 1 Diabetes Genetics Consortium (T1DGC) panel (n = 5,225) as well as a subset of the multi-ancestry HLA panel down-sampled to the same size as T1DGC. b, The correlation between imputed and typed dosages of classical HLA alleles using the multi-ancestry HLA reference panel at one-field (red) and G-group resolution (black) for 955 individuals with SBT HLA typing data from the 1000 Genomes Project. c, The imputation accuracy for five classical HLA genes at one-field, two-field and G-group resolution. d, The imputation accuracy at G-group resolution of the 1000 Genomes individuals stratified by four diverse ancestries when using three different imputation reference panels as described in a.
Figure 3. Stepwise conditional analysis of the allele and amino acid positions of classical HLA genes to HIV-1 viral load.
a-h, Each circle point represents the -log10(Pbinary) from two-sided linear regression for all classical HLA alleles. Each diamond point represents -log10(Pomnibus) from one-sided F-test for the tested amino acid positions in HLA (blue, HLA-A; yellow, HLA-C; red, HLA-B; light blue, HLA-DRB1; green, HLA-DQA1; purple, HLA-DQB1; dark green, HLA-DPA1; light green, HLA-DPB1). Association at amino acid positions with more than two alleles was calculated using a multi-degree-of-freedom omnibus test. The dashed black line represents the significance threshold of P = 5 × 10⁻⁸ to correct for multiple comparisons (Bonferroni correction). Each panel shows the association plot at one step of the stepwise conditional omnibus test. One-field classical allele HLA-B*57 (P = 9.84 × 10⁻¹³⁸) (a) and amino acid position 97 in HLA-B (Pomnibus = 1.86 × 10⁻¹⁸⁴) (b) showed the strongest association signal. Results conditioned on position 97 in HLA-B showed a secondary signal at classical allele HLA-B*81:01:01G (P = 4.53 × 10⁻²³) (c) and position 67 in HLA-B (Pomnibus = 1.08 × 10⁻⁴⁰) (d). Results conditioned on positions 97 and 67 in HLA-B showed the same classical allele HLA-B*81:01:01G (P = 2.70 × 10⁻²³) (e) and a third signal at position 156 in HLA-B (Pomnibus = 1.92 × 10⁻³⁰) (f). Results conditioned on positions 97, 67 and 156 in HLA-B showed a fourth signal at HLA-A*31 (P = 2.45 × 10⁻⁸) (g) and position 77 in HLA-A (Pomnibus = 5.35 × 10⁻⁷) outside HLA-B (h).
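To make the threshold comparison in this caption concrete, the short Python sketch below converts a few of the reported P-values to the -log10 scale used on the y-axis and checks them against P = 5 × 10⁻⁸. It is an illustrative calculation only; the helper names are hypothetical, and treating significance as strictly P below the threshold is an assumption.

```python
import math

# Genome-wide significance threshold stated in the caption.
GENOME_WIDE_P = 5e-8

# P-values copied from the caption (panels a, b, g and h).
REPORTED_P = {
    "HLA-B*57 (one-field allele)": 9.84e-138,
    "HLA-B position 97 (omnibus)": 1.86e-184,
    "HLA-A*31 (one-field allele)": 2.45e-8,
    "HLA-A position 77 (omnibus)": 5.35e-7,
}


def neg_log10(p_value: float) -> float:
    """Convert a P-value to the -log10 scale used on the plot's y-axis."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value)


def is_genome_wide_significant(p_value: float, threshold: float = GENOME_WIDE_P) -> bool:
    """Assumed rule: significant when P is strictly below the threshold."""
    return p_value < threshold


if __name__ == "__main__":
    for label, p in REPORTED_P.items():
        print(f"{label}: -log10(P) = {neg_log10(p):.2f}, "
              f"significant = {is_genome_wide_significant(p)}")
```

Under this rule HLA-A*31 in panel g falls just below the threshold, whereas position 77 in HLA-A in panel h does not reach it.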
Figure 4. Location and effect of three independently associated amino acid positions in HLA-B.
a, Allele frequencies of six residues at position 97 in HLA-B among three populations. b, Effect on set point viral load (spVL) (i.e., change in log10 HIV-1 spVL per allele copy) of individual amino acid residues at position 97 in HLA-B. Results were calculated per allele using linear regression models, including gender and principal components within each ancestry as covariates. There are 3,901 Admixed African (purple), 7,455 European (blue) and 677 Latino (orange) independent samples included in the analysis. Data are presented as mean values (beta) ± standard errors. c, HLA-B (PDB ID code 2bvp) proteins. Omnibus and stepwise conditional analysis identified three independent amino acid positions in HLA-B: 97 (red), 67 (orange) and 156 (green). d, Effect on spVL (i.e., change in log10 HIV-1 spVL per allele copy) of individual amino acid residues at each position reported in this and previous work. Results were calculated per allele using linear regression models. The x-axis shows the effect size and its standard error in the joint analysis, and the y-axis shows the effect sizes ± standard error in individual populations (purple, Admixed African, n = 3,901; blue, European, n = 7,455; orange, Latino, n = 677). e, Variance of spVL explained by the haplotypes formed by different amino acid positions.
Figure 5. Pairwise LD and haplotype structure for eight classical HLA genes in five population groups.
Haplotype structures of the eight classical HLA genes in each population. Each tile in a bar represents an HLA allele, and its height corresponds to the frequency of that allele. The gray lines connecting two alleles represent HLA haplotypes, and the width of each line corresponds to the frequency of the haplotype. The most frequent long-range HLA haplotype within each population is bolded and highlighted in a color described by the key at the bottom.
