Commun Biol. 2023 Apr 21;6(1):442.
doi: 10.1038/s42003-023-04749-7.

Machine learning reveals limited contribution of trans-only encoded variants to the HLA-DQ immunopeptidome

Jonas Birkelund Nilsson et al. Commun Biol.

Abstract

Human leukocyte antigen (HLA) class II antigen presentation is key for controlling and triggering T cell immune responses. HLA-DQ molecules, which are believed to play a major role in autoimmune diseases, are heterodimers that can be formed as both cis and trans variants depending on whether the α- and β-chains are encoded on the same (cis) or opposite (trans) chromosomes. So far, limited progress has been made for predicting HLA-DQ antigen presentation. In addition, the contribution of trans-only variants (i.e. variants not observed in the population as cis) in shaping the HLA-DQ immunopeptidome remains largely unresolved. Here, we seek to address these issues by integrating state-of-the-art immunoinformatics data mining models with large volumes of high-quality HLA-DQ specific mass spectrometry immunopeptidomics data. The analysis demonstrates highly improved predictive power and molecular coverage for models trained including these novel HLA-DQ data. More importantly, investigating the role of trans-only HLA-DQ variants reveals a limited to no contribution to the overall HLA-DQ immunopeptidome. In conclusion, this study furthers our understanding of HLA-DQ specificities and casts light on the relative role of cis versus trans-only HLA-DQ variants in the HLA class II antigen presentation space. The developed method, NetMHCIIpan-4.2, is available at https://services.healthtech.dtu.dk/services/NetMHCIIpan-4.2 .

Conflict of interest statement

S.K. is an employee at Pure MHC, LLC. The remaining authors declare no competing interests.

Figures

Fig. 1. Overview of the novel immunopeptidomics data.
Each row corresponds to a dataset from a given DQ-homozygous cell line. Left panel: Bar plot of overall peptide counts. The numbers on the left correspond to the cell line IDs. Middle panel: DQ HLA types of the cell lines. Right panel: Peptide length distributions.
Fig. 2. AUC, AUC 0.1 and PPV predictive performance for the models trained with (w_Saghar_DQ) and without (wo_Saghar_DQ) the novel data.
Each point is the performance metric for a unique HLA class II molecule. For details on the performance metrics refer to materials and methods. The columns correspond to four different subsets of HLA molecules, namely all non-HLA-DQ molecules (NotDQ, n = 70), all DQ molecules (DQ, n = 44), DQ molecules in the novel data set (DQ_Saghar, n = 14), and DQ molecules not present in the novel data (DQ_NotSaghar, n = 30). Each boxplot shows the median inside the interquartile range (IQR) between the upper and lower quartiles, with whiskers extending to at most 1.5 times the IQR.
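The boxplot convention described in this caption (median, interquartile range, whiskers reaching at most 1.5 times the IQR) can be sketched as follows. This is a minimal illustration and not the authors' plotting code: the function name `boxplot_summary` is hypothetical, and the quartile interpolation rule (`method="inclusive"`) is an assumption, since the caption does not say which convention was used.

```python
from statistics import median, quantiles


def boxplot_summary(values):
    """Summarise values the way a standard boxplot does.

    Assumption: quartiles use inclusive linear interpolation; other
    plotting libraries may use a slightly different rule.
    """
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences,
    # so they extend at most 1.5 * IQR beyond the box.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


if __name__ == "__main__":
    # Illustrative numbers only, not values from the study.
    print(boxplot_summary([1, 2, 3, 4, 5, 20]))
```

Points beyond the whiskers are reported separately as outliers, which is how a single poorly predicted molecule would appear in such a plot.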
Fig. 3. Performance of the model trained including the novel data, evaluated on both the novel data alone restricted to DQ, as well as on non-DQ including the full dataset.
Each point is the performance metric for an HLA class II molecule. Each boxplot shows the median inside the interquartile range (IQR) between the upper and lower quartiles, with whiskers extending to at most 1.5 times the IQR.
Fig. 4. Contribution of cis and trans-only DQ variants in DQ-heterozygous datasets.
a Peptide-count contribution of cis and trans-only molecules in the methods with (w_Saghar_DQ) and without (wo_Saghar_DQ) the novel data. Each point shows the mean per-dataset peptide fraction for a given DQ molecule. For each method, trans-only molecules are shown in one boxplot (n = 12), while cis molecules are shown in three categories, namely all cis molecules (Cis–All, n = 29), cis molecules found in the DQ-SA training data (Cis–SA, n = 11), and cis molecules only found in the DQ-MA training data (Cis–MA, n = 18). Each boxplot shows the median inside the IQR between the upper and lower quartiles, with whiskers extending to at most 1.5 times the IQR. b DQ motif deconvolution for the Racle__TIL1 dataset. The rows correspond to the methods trained with (wSag) and without (woSag) the novel data, respectively. Peptide counts (excluding trash peptides) are displayed in parentheses in the logo plot titles. Trans-only molecules are highlighted in red frames.
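One way to read the "mean per-dataset peptide fraction" plotted in panel a is sketched below. This is an interpretation under stated assumptions, not the authors' code: the fraction is taken as a molecule's annotated peptide count divided by the total annotated peptide count of that dataset, and the mean is taken over the datasets in which the molecule occurs. The function name, the dataset names and the molecule labels are all hypothetical.

```python
from collections import defaultdict


def mean_peptide_fraction(datasets):
    """Average, per molecule, the fraction of peptides assigned to it.

    `datasets` maps a dataset name to {molecule: annotated peptide count}.
    Assumption: the mean runs over datasets that contain the molecule.
    """
    fractions = defaultdict(list)
    for counts in datasets.values():
        total = sum(counts.values())
        if total == 0:
            continue  # nothing annotated in this dataset, skip it
        for molecule, count in counts.items():
            fractions[molecule].append(count / total)
    return {mol: sum(vals) / len(vals) for mol, vals in fractions.items()}


if __name__ == "__main__":
    # Illustrative counts only, not values from the study.
    example = {
        "dataset_1": {"cis_X": 30, "cis_Y": 70},
        "dataset_2": {"cis_X": 10, "trans_Z": 90},
    }
    print(mean_peptide_fraction(example))
```

Under this reading, a trans-only molecule that attracts few peptides in every heterozygous dataset ends up with a mean fraction close to zero.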
Fig. 5. HLA-DQ specificity tree.
The tree is based on 61 DQ molecules including the 14 molecules described by the novel data. Orange molecules are covered by the method including the novel data with at least 100 peptides, and blue molecules are within a distance of 0.025 of an orange molecule. Black molecules are non-covered (i.e. have a peptide count <100 and a distance >0.025 to an orange molecule). Logos in black frames correspond to orange molecules. Logos in red frames correspond to molecules from branches with clusters of non-covered (black) molecules. The specificity tree was calculated from the pairwise similarities between the prediction scores for the DQ molecules for a set of 100,000 random natural 13-17mer peptides. Logos were constructed for the top 1% highest scoring binding cores for these 100,000 peptides.
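The colour rule in this caption amounts to a small classification procedure, sketched below. The thresholds (100 peptides, distance 0.025) come from the caption; everything else is an assumption for illustration: the function name `classify_coverage` is hypothetical, the pairwise distances are taken as a precomputed input (the caption does not state how similarities are converted to distances), and the molecule labels and numbers are placeholders.

```python
def classify_coverage(peptide_counts, distances, min_peptides=100, max_distance=0.025):
    """Label each molecule as orange, blue or black.

    peptide_counts: {molecule: peptide count}
    distances: {frozenset({mol_a, mol_b}): distance}, assumed precomputed
    """
    # Orange: covered directly by enough peptides.
    orange = {m for m, n in peptide_counts.items() if n >= min_peptides}
    labels = {}
    for mol in peptide_counts:
        if mol in orange:
            labels[mol] = "orange"
        elif any(distances[frozenset((mol, o))] <= max_distance for o in orange):
            # Blue: close enough to at least one orange molecule.
            labels[mol] = "blue"
        else:
            # Black: too few peptides and no orange neighbour in range.
            labels[mol] = "black"
    return labels


if __name__ == "__main__":
    # Placeholder molecules and distances, not data from the study.
    counts = {"mol_A": 500, "mol_B": 10, "mol_C": 0}
    dist = {
        frozenset(("mol_A", "mol_B")): 0.01,
        frozenset(("mol_A", "mol_C")): 0.20,
        frozenset(("mol_B", "mol_C")): 0.05,
    }
    print(classify_coverage(counts, dist))
```

Note that blue status depends only on distance to an orange molecule, so a molecule near a blue one but far from every orange one is still labelled black.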
Fig. 6. Benchmark against MixMHC2pred-2.0 in terms of AUC, AUC 0.1 and PPV.
Predictions were made without peptide context encoding in both methods. Each point is the performance metric for a given sample. Each boxplot (n = 15 samples in all cases) shows the median inside the IQR between the upper and lower quartiles, with whiskers extending to at most 1.5 times the IQR. a Performance per sample calculated on the entire data. b Performance per sample calculated on the union of DQ-annotated peptides between the two methods.
