Mol Cell Proteomics. 2021;20:100111.
doi: 10.1016/j.mcpro.2021.100111. Epub 2021 Jun 12.

Withdrawn: Precision Neoantigen Discovery Using Large-scale Immunopeptidomes and Composite Modeling of MHC Peptide Presentation

Rachel Marty Pyke et al. Mol Cell Proteomics. 2021.

Abstract

This article has been withdrawn by the authors. A publication of the manuscript with the correct figures and tables has been approved, and the authors state that the conclusions of the manuscript remain unaffected. Specifically, the errors are in Figure 6A, Supplementary Figure 10B, Supplementary Figure 10C, and Supplementary Table 5. The details of the errors are as follows: the HLA types for one sample were incorrectly assigned because of a tumor/normal mislabeling from the biobank vendor. Because the HLA types differed between the tumor and normal samples, the sequence analysis established that the HLA alleles for this patient had been deleted (HLA LOH). The authors conclude that this was an artifact caused by the normal sample mislabeling. The corrected version can be accessed (Pyke, R.M., Mellacheruvu, D., Dea, S., Abbott, C.W., Zhang, S.V., Philips, N.A., Harris, J., Bartha, G., Desai, S., McClory, R., West, J., Snyder, M.P., Chen, R., Boyle, S.M. (2022) Precision Neoantigen Discovery Using Large-Scale Immunopeptidomics and Composite Modeling of MHC Peptide Presentation. Mol. Cell. Proteomics 22, 100506).

Keywords: MHC; cancer; cancer vaccines; immunology; immunopeptidomics; machine learning; major histocompatibility complex; neoantigen prediction; neoantigens; next generation sequencing.

Conflict of interest statement

R. M. P., D. M., S. D., C. W. A., S. V. Z., N. P., J. H., G. B., Sejal Desai, R. M., J. W., R. C., and S. M. B. are full-time employees of Personalis and owners of Personalis stock. M. P. S. co-founded Personalis and owns Personalis stock.

Figures

Fig. 1
Generation and overview of the monoallelic data. A, a schematic of the experimental procedure to generate the monoallelic training data. An HLA allele and B2M were stably transfected into an HLA-null K562 parental cell line. The MHC-peptide complex was purified using a W6/32 antibody and the peptides were gently eluted off the complexes. The peptides were sequenced with LC-MS/MS and identified with a database search. B, bar plots showing the peptide yields and distribution of peptide lengths for each of the 25 monoallelic cell lines. C, a comparison between motifs of peptides generated from our monoallelic cell line with HLA-B*35:01 and a publicly available dataset for the same allele. Motifs are shown for peptides of length 8, 9, 10, and 11. See supplemental Fig. S1 for motifs from all 25 cell lines. See supplemental Fig. S2 for comparisons with other public datasets. D, a bar plot showing the distribution of the ratio of observed peptides from the monoallelic cell lines compared with random expectation across several TPM ranges. Values are shown with a log10 transformation. E, a heatmap showing the enrichment and depletion of five amino acids upstream and downstream of the peptides identified from the monoallelic cell lines compared with a random expectation. Red denotes the enrichment of amino acids and blue denotes their depletion. The C- and N-termini of the protein are denoted with “-.”
Fig. 2
Binding pocket diversity and population frequencies of novel alleles. A, heatmaps for HLA-A and -B that represent the binding pocket similarity between alleles with monoallelic immunopeptidomics data. Dark blue squares represent alleles that have very similar binding pockets while white squares represent alleles with divergent binding pockets. The 25 alleles profiled with our monoallelic system are denoted in orange. The five alleles that have not previously been profiled are denoted in green. Motifs for these novel alleles are shown alongside motifs for related alleles in gray. Black boxes denote the cluster of alleles containing the newly profiled alleles. B, a heatmap showing the frequencies of the five novel alleles in several populations of diverse world ethnicities. Dark purple denotes high population frequencies of the alleles and light purple denotes low population frequencies.
Fig. 3
Systematic expansion of HLA ligandome through the incorporation of publicly available data. A, box plots representing the number of unique peptides per sample from monoallelic and multiallelic immunopeptidomics samples that were reprocessed through our pipeline. Bar plot showing the number of samples for each project. Samples are colored according to their project. Peptide yields are log10 transformed. See supplemental Table S2 for additional details. B, a heatmap of expression values (TPM) of highly differentiated genes between tissue and tumor types of publicly available multiallelic immunopeptidomics data. Low expression is shown with red, and high expression is shown with blue. C, a volcano plot denoting the differential gene expression between the monoallelic parental cell lines, B721.221 and K562. Gene transcripts with significant upregulation in B721.221 compared with K562 are shown in green while gene transcripts with significant upregulation in K562 compared with B721.221 are shown in red. Gene transcripts with no significant up- or downregulation are shown in gray. D, a bar plot denoting the weighted fraction of alleles in 18 ethnicity populations from the National Marrow Donor Program within the expanded training dataset, including monoallelic cell lines profiled in house, public monoallelic data, public multiallelic data, and binding assay data from IEDB. E, two stacked bar plots showing the frequencies of amino acids at each position in the pseudo binding pocket for all annotated alleles in IMGT (top) and all alleles from the expanded training dataset (bottom), including monoallelic cell lines profiled in house, public monoallelic data, public multiallelic data, and binding assay data from IEDB.
Fig. 4
Modeling binding and presentation. A, a schematic showing the difference between MHC binding and MHC presentation. MHC binding involves the ability of an MHC allele to bind to a paired peptide and is modeled with the peptide (P), allele-binding pocket (B), and peptide length (L). MHC presentation involves all steps in the antigen processing pathway in addition to MHC binding and is modeled with the peptide (P), allele-binding pocket (B), peptide length (L), gene expression (T), flanking regions around the peptide (F), propensity of the gene to engender peptides (G), and propensity of the region within the gene to engender peptides (H). B, boxplots representing the distribution of peptides per transcript observed in the reprocessed multiallelic immunopeptidomics data across transcript deciles. The peptides observed are normalized by transcript length. Red boxes denote the transcripts that generate many observed peptides despite low expression levels and transcripts that generate few observed peptides despite high expression levels. C, the distributions of expected and observed peptides from across the ACTB protein. Expected peptides, shown in gray, are generated by summing the number of frequent alleles predicted to bind each peptide (rank <2 by NetMHCpan4.0). The 30 most frequent alleles in the reprocessed multiallelic immunopeptidomics dataset were used for the analysis. Observed peptides are measured from the reprocessed multiallelic immunopeptidomics data and are shown in green.
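The "expected peptides" count in panel C can be illustrated with a minimal sketch. It assumes that binding predictions are already available as tuples of (start position, peptide length, allele, percentile rank); the tuple layout and the function name `expected_allele_counts` are hypothetical and are not taken from the article or from any NetMHCpan interface.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical record layout: (start, length, allele, percentile_rank).
Prediction = Tuple[int, int, str, float]


def expected_allele_counts(
    predictions: Iterable[Prediction], rank_cutoff: float = 2.0
) -> Counter:
    """Count, for each candidate peptide, how many alleles are predicted to bind it.

    A peptide is identified by its (start, length) window within the protein.
    An allele counts as a predicted binder when its rank is below rank_cutoff.
    """
    counts: Counter = Counter()
    for start, length, _allele, rank in predictions:
        if rank < rank_cutoff:
            counts[(start, length)] += 1
    return counts
```

Peptides with no predicted binder are simply absent from the result, and a `Counter` returns 0 for them on lookup.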
Fig. 5
Overview of composite modeling approach and model performance. A, a schematic of the composite modeling approach. In-house monoallelic immunopeptidomics data, public monoallelic immunopeptidomics data, and IEDB data are used to train MONO-binding. MONO-binding is used to deconvolute the multiallelic immunopeptidomics data to create pseudo monoallelic data. All monoallelic and pseudo monoallelic data are combined to train the SHERPA-binding model. The SHERPA-binding model is used as a feature along with other presentation features to train the SHERPA-presentation model on monoallelic immunopeptidomics data. B, a precision–recall curve demonstrating the predicted pan-performance on unseen alleles (MONO-binding-LOO) compared with MONO-binding, NetMHCpan-4.1-BA, NetMHCpan-4.1-EL, and MHCflurry-2.0-BA. A model was trained for each allele with the data for that allele excluded from the training dataset. The MONO-binding-LOO curve represents the predictions from each of the models on the test data of the allele excluded from the training data. C and D, boxplots denoting the distributions of positive predictive values (top 0.1%) across alleles within the monoallelic immunopeptidomics held-out test data. Distributions are shown for (C) NetMHCpan-4.1-BA, NetMHCpan-4.1-EL, MHCflurry-2.0-BA, MONO-binding, SHERPA-binding, and SHERPA-presentation, and (D) SHERPA-binding, SHERPA-binding+F, SHERPA-binding+FT, SHERPA-binding+TTG, and SHERPA-presentation. E, boxplots showing the distribution of precision and recall values across alleles in the monoallelic immunopeptidomics data for SHERPA-presentation across several percentile rank thresholds. A percentile rank of 0.1 is selected as the optimal threshold.
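The positive predictive value at the top 0.1% (panels C and D) can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: each candidate peptide is assumed to have a model score (higher means more likely presented) and a binary label (1 if observed by immunopeptidomics, 0 for a decoy), and the function name `ppv_at_top_fraction` is hypothetical.

```python
from typing import Sequence


def ppv_at_top_fraction(
    scores: Sequence[float], labels: Sequence[int], fraction: float = 0.001
) -> float:
    """Fraction of true positives among the top-scoring `fraction` of peptides."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one scored peptide is required")
    # Always evaluate at least one peptide, even for very small inputs.
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _score, label in ranked[:n_top])
    return hits / n_top
```

Computing this per allele on held-out data would give one value per allele, which is the kind of distribution a boxplot summarizes.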
Fig. 6
Performance of SHERPA on tissue samples and immunogenic epitopes. Boxplots showing the distribution of prediction performance across (A) tumors profiled with immunopeptidomics in-house (lung and colorectal, left), (B) tumors profiled by Schuster et al. (ovarian, middle), and (C) tumors profiled by Loffler et al. (colorectal, right). Performance is defined as the fraction of peptides observed with immunopeptidomics that are predicted to bind in the top 0.1% of all peptides (percentile rank ≤0.1). Performance is shown for the following models: NetMHCpan-4.1-BA, NetMHCpan-4.1-EL, MHCflurry-2.0-BA, MONO-binding, SHERPA-binding, and SHERPA-presentation. D and E, bar plots showing the sensitivity of NetMHCpan-4.1-BA, NetMHCpan-4.1-EL, MHCflurry-2.0-BA, MONO-binding, and SHERPA-binding on the Chowell et al. immunogenicity dataset: (D) performance across all epitopes and (E) performance across high frequency alleles.
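The performance measure used in this figure, the fraction of observed peptides with percentile rank ≤0.1, can be sketched as below. The caption does not say how percentile ranks are derived, so this sketch assumes a common convention: the percentage of a background score distribution that scores at least as high as the peptide. The function names and the background-based definition are assumptions for illustration only.

```python
from bisect import bisect_left
from typing import Sequence


def percentile_rank(score: float, sorted_background: Sequence[float]) -> float:
    """Percentage of background scores that are >= score (lower is better).

    `sorted_background` must be sorted in ascending order.
    """
    n_at_least = len(sorted_background) - bisect_left(sorted_background, score)
    return 100 * n_at_least / len(sorted_background)


def fraction_recovered(
    observed_scores: Sequence[float],
    background_scores: Sequence[float],
    threshold: float = 0.1,
) -> float:
    """Fraction of observed peptides whose percentile rank is <= threshold."""
    background = sorted(background_scores)
    recovered = sum(
        1 for s in observed_scores if percentile_rank(s, background) <= threshold
    )
    return recovered / len(observed_scores)
```

With a background of 1,000 scores, only a peptide that ties or beats the single best background score reaches a percentile rank of 0.1.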
