Cancer Immunol Res. 2020 Mar;8(3):396-408.

doi: 10.1158/2326-6066.CIR-19-0464. Epub 2019 Dec 23.

High-Throughput Prediction of MHC Class I and II Neoantigens with MHCnuggets

Xiaoshan M Shao^{#,1,2}, Rohit Bhattacharya^{#,1,3}, Justin Huang^{#,1,3}, I K Ashok Sivakumar^{#,1,3,4}, Collin Tokheim^{1,2}, Lily Zheng^{1,5}, Dylan Hirsch^{1,2}, Benjamin Kaminow^{1,6}, Ashton Omdahl^{1,2}, Maria Bonsack^{7,8,9}, Angelika B Riemer^{7,8}, Victor E Velculescu^{1,5,10}, Valsamo Anagnostou^{10}, Kymberleigh A Pagel^{1,2}, Rachel Karchin^{11,2,10}

Affiliations

¹ Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.
² Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland.
³ Department of Computer Science, Johns Hopkins University, Baltimore, Maryland.
⁴ Applied Physics Laboratory, Johns Hopkins University, Laurel, Maryland.
⁵ McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland.
⁶ Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland.
⁷ Immunotherapy and Immunoprevention, German Cancer Research Center (DKFZ), Heidelberg, Germany.
⁸ Molecular Vaccine Design, German Center for Infection Research (DZIF), partner site Heidelberg, Heidelberg, Germany.
⁹ Faculty of Biosciences, Heidelberg University, Heidelberg, Germany.
¹⁰ The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland.
¹¹ Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland. karchin@jhu.edu.

^# Contributed equally.

PMID: 31871119
PMCID: PMC7056596
DOI: 10.1158/2326-6066.CIR-19-0464

Xiaoshan M Shao et al. Cancer Immunol Res. 2020 Mar.

Abstract

Computational prediction of binding between neoantigen peptides and major histocompatibility complex (MHC) proteins can be used to predict patient response to cancer immunotherapy. Current neoantigen predictors focus on in silico estimation of MHC binding affinity and are limited by low predictive value for actual peptide presentation, inadequate support for rare MHC alleles, and poor scalability to high-throughput data sets. To address these limitations, we developed MHCnuggets, a deep neural network method that predicts peptide-MHC binding. MHCnuggets can predict binding for common or rare alleles of MHC class I or II with a single neural network architecture. Using a long short-term memory network (LSTM), MHCnuggets accepts peptides of variable length and is faster than other methods. When compared with methods that integrate binding affinity and MHC-bound peptide (HLAp) data from mass spectrometry, MHCnuggets yields a 4-fold increase in positive predictive value on independent HLAp data. We applied MHCnuggets to 26 cancer types in The Cancer Genome Atlas, processing 26.3 million allele-peptide comparisons in under 2.3 hours, yielding 101,326 unique predicted immunogenic missense mutations (IMM). Predicted IMM hotspots occurred in 38 genes, including 24 driver genes. Predicted IMM load was significantly associated with increased immune cell infiltration (P < 2 × 10^-16), including CD8⁺ T cells. Only 0.16% of predicted IMMs were observed in more than 2 patients, with 61.7% of these derived from driver mutations. Thus, we describe a method for neoantigen prediction and its performance characteristics and demonstrate its utility in data sets representing multiple human cancers.

Conflict of interest statement

Disclosure of potential conflicts of interest

The terms of these arrangements are managed by Johns Hopkins University in accordance with its conflict of interest policies.


Figures

**Figure 1. A) MHCnuggets’ architecture.**
A network is trained for each MHC allele. Each network has an LSTM layer with 64 hidden units, a fully connected (FC) layer with 64 hidden units, and a final output layer of a single sigmoid unit. **B) Input scheme for peptides with variable lengths.** The MHCnuggets architecture can handle peptides of any length, but in practice a maximum length should be selected. Peptides are extended with padding until they reach the maximum length before being passed to the neural network. The example shows padding for class II peptides with the maximum length set to 30 amino acids. **C) Transfer learning protocol for parameter sharing among alleles.** A base allele-specific network is trained for each MHC class, using the allele with the largest number of training examples. Transfer learning is then applied to train networks for the remaining alleles, with initial network weights set to the final base network weights. A fine-tuning step identifies alleles that can be leveraged for a second round of transfer learning to produce a final network (Methods).
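The padding scheme in panel B can be illustrated with a minimal sketch. The padding symbol, its position at the end of the peptide, the one-hot encoding, and the function names `pad_peptide` and `one_hot` are all assumptions made for illustration; the caption does not specify how MHCnuggets encodes residues or where padding is placed.

```python
# Illustrative only: pad variable-length peptides to a fixed maximum length
# and one-hot encode them. PAD and the encoding are assumptions, not taken
# from the MHCnuggets source.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # hypothetical padding symbol


def pad_peptide(peptide, max_len=30):
    """Extend a peptide with padding symbols until it reaches max_len."""
    if len(peptide) > max_len:
        raise ValueError(f"peptide longer than max_len={max_len}")
    return peptide + PAD * (max_len - len(peptide))


def one_hot(padded):
    """Encode each position as a 20-dimensional vector; padding is all zeros."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for symbol in padded:
        row = [0] * len(AMINO_ACIDS)
        if symbol != PAD:
            row[index[symbol]] = 1
        rows.append(row)
    return rows


padded = pad_peptide("SIINFEKL", max_len=12)
encoded = one_hot(padded)
print(padded)                         # SIINFEKL----
print(len(encoded), len(encoded[0]))  # 12 20
```

Under these assumptions every peptide becomes a `max_len` × 20 matrix, so one recurrent network can consume class I peptides and longer class II peptides without separate architectures per length.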

**Figure 2. MHCnuggets’ features.**
A) Venn diagram representation of the MHC-peptide binding prediction functions of MHCnuggets and similar tools. B) Training and MHC allele model selection scheme for MHCnuggets.

**Figure 3. MHC class I benchmark comparisons.**
A) PPV_n for MHC class I allele-specific prediction on binding affinity test sets from Bonsack *et al.* (7 alleles) and Kim *et al.* (53 alleles) (5,8). B) PPV_n for MHC class I allele-specific prediction on the HLAp BST data set (Bassani-Sternberg *et al.* and Trolle *et al.* (7,22)), stratified by allele (6 alleles). C) PPV_n for MHC class I allele-specific prediction on the HLAp BST data set (from B), stratified by peptide sequence length. D) True and false positives for each method on the top 50 ranked peptides from the HLAp BST data set. PPV_n = positive predictive value on the top n ranked peptides, where n is the number of true binders. TP = true positives. FP = false positives.
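The PPV_n definition in this caption can be made concrete with a short sketch. It assumes that a higher score means stronger predicted binding and that ground-truth labels are 0/1; the function name `ppv_n` and the toy data are illustrative, not taken from the paper.

```python
def ppv_n(scores, is_binder):
    """Positive predictive value on the top n ranked peptides,
    where n is the number of true binders in the test set."""
    n = sum(is_binder)
    if n == 0:
        raise ValueError("PPV_n is undefined when there are no true binders")
    # Rank peptides from highest to lowest predicted score.
    ranked = sorted(zip(scores, is_binder), key=lambda pair: pair[0], reverse=True)
    true_positives = sum(label for _, label in ranked[:n])
    return true_positives / n


# Toy example: 2 true binders among 4 peptides; one of them ranks in the top 2.
scores = [0.9, 0.8, 0.7, 0.1]
labels = [1, 0, 1, 0]
print(ppv_n(scores, labels))  # 0.5
```

Because n is fixed to the number of true binders, a perfect ranking scores 1.0 regardless of class balance, which makes the metric comparable across alleles with different binder fractions.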

**Figure 4. MHC class II benchmark comparisons.**
A) PPV_n for MHC class II allele-specific prediction on the binding affinity test set from Jensen *et al.* (27 alleles, stratified by allele). B) auROC, K-Tau, and Pearson r scores for MHC class II alleles from five-fold cross-validation. NetMHCII2.3 performance is the self-reported auROC. auROC = area under the receiver operating characteristic curve. K-Tau = Kendall’s *tau* correlation. PPV_n = positive predictive value on the top n ranked peptides, where n is the number of true binders.

**Figure 5. MHC class I and II benchmark comparisons to estimate rare allele performance.**
A) Schematic representation of leave-one-molecule-out (LOMO) testing. B) PPV_n for MHC class I rare allele prediction on the IEDB pseudo-rare alleles binding affinity test set (20 alleles, stratified by allele). C) PPV_n for MHC class II rare allele prediction on the binding affinity test set from Jensen *et al.* (27 alleles, stratified by allele) (39). D) auROC for MHC class II rare allele prediction on the LOMO binding affinity test set from Jensen *et al.* (27 alleles, stratified by allele) (39). NetMHCIIpan3.2 results are the self-reported auROC. auROC = area under the receiver operating characteristic curve. PPV_n = positive predictive value on the top n ranked peptides, where n is the number of true binders.
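As a rough illustration of the LOMO idea in panel A, the sketch below holds out all measurements for one MHC molecule at a time and keeps the rest for training. This is a generic leave-one-group-out split, not the authors' exact protocol; the record layout, the function name `lomo_splits`, and the toy affinity values are assumptions.

```python
def lomo_splits(records):
    """Yield (held_out_allele, train, test) for each allele in turn.

    records: list of (allele, peptide, affinity) tuples.
    """
    alleles = sorted({allele for allele, _, _ in records})
    for held_out in alleles:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield held_out, train, test


# Toy data only; the affinity values are made up for illustration.
toy = [
    ("HLA-A*02:01", "SIINFEKL", 120.0),
    ("HLA-A*02:01", "GILGFVFTL", 15.0),
    ("HLA-B*07:02", "TPRVTGGGAM", 30.0),
]
for allele, train, test in lomo_splits(toy):
    print(allele, len(train), len(test))
# HLA-A*02:01 1 2
# HLA-B*07:02 2 1
```

Since the held-out molecule contributes nothing to training, performance on it approximates what to expect for a rare allele with no measurements of its own.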

**Figure 6. Timing and scalability.**
Runtime benchmark of tested methods, using versions available on October 1, 2019, over a range of inputs (up to 1 million peptides). A) MHC class I prediction. B) MHC class II prediction.

**Figure 7. MHC class I IMMs in TCGA patients.**
A) Number of predicted immunogenic missense mutations (IMMs) identified in 6,613 TCGA patients. Dotted line = mean IMMs per patient (15.6). Note: the 123 patients with >100 predicted IMMs are omitted for visual clarity. B) Number of predicted IMMs by cancer type. C) IMMs shared by three or more patients and the cancer types in which they occurred. Each row represents a cancer type, and each column illustrates the overlap of IMMs seen in a single cancer type or in multiple cancer types. For example, the first column shows the number of IMMs shared among patients with colorectal adenocarcinoma (COAD) and uterine corpus endometrial carcinoma (UCEC). Bars to the left show the total number of unique IMMs in each cancer type. Bar heights reflect the count of unique shared IMMs, not the total number of patients in which the IMM was observed. Cancer type abbreviations are in Methods. Image generated with UpSetR. D) Fibroblast growth factor receptor 3 (*FGFR3*) IMM hot region identified by HotMAPs in bladder cancer (BLCA). IMMs shown, with the number of BLCA patients carrying each IMM: p.E216K (1), p.D222N (1), p.G235D (1), p.R248C (3), and p.S249C (24). Except for p.G235D, these IMMs are proximal to the interface of the FGFR3 protein and the light and heavy chains of an antibody fragment designed for therapeutic application in bladder cancer (PDB ID: 3GRW) (61).
ACC, adrenocortical carcinoma; BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; CESC, cervical squamous cell carcinoma and endocervical adenocarcinoma; CHOL, cholangiocarcinoma; COAD, colon adenocarcinoma; GBM, glioblastoma multiforme; HNSC, head and neck squamous cell carcinoma; KICH, kidney chromophobe; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LGG, brain lower grade glioma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; PAAD, pancreatic adenocarcinoma; PCPG, pheochromocytoma and paraganglioma; PRAD, prostate adenocarcinoma; READ, rectum adenocarcinoma; SARC, sarcoma.

References

1. Anagnostou V, Smith KN, Forde PM, Niknafs N, Bhattacharya R, White J, et al. Evolution of Neoantigen Landscape during Immune Checkpoint Blockade in Non–Small Cell Lung Cancer. Cancer Discovery 2017.
2. Yarchoan M, Johnson BA, Lutz ER, Laheru DA, Jaffee EM. Targeting neoantigens to augment antitumour immunity. Nature Reviews Cancer 2017;17:209–22.
3. Lundegaard C, Lund O, Buus S, Nielsen M. Major histocompatibility complex class I binding predictions as a tool in epitope discovery. Immunology 2010;130:309–18.
4. Andreatta M, Nielsen M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics 2016;32:511–7.
5. Kim Y, Sidney J, Buus S, Sette A, Nielsen M, Peters B. Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions. BMC Bioinformatics 2014;15:241.

Grants and funding

U10 CA180950/CA/NCI NIH HHS/United States
