BMC Bioinformatics. 2020 Dec 30;21(Suppl 18):498.
doi: 10.1186/s12859-020-03813-x.

Semi-supervised learning for somatic variant calling and peptide identification in personalized cancer immunotherapy

Elham Sherafat et al. BMC Bioinformatics.

Abstract

Background: Personalized cancer vaccines are emerging as one of the most promising approaches to immunotherapy of advanced cancers. However, only a small proportion of the neoepitopes generated by somatic DNA mutations in cancer cells lead to tumor rejection. Since it is impractical to experimentally assess all candidate neoepitopes prior to vaccination, developing accurate methods for predicting tumor-rejection mediating neoepitopes (TRMNs) is critical for enabling routine clinical use of cancer vaccines.

Results: In this paper we introduce Positive-unlabeled Learning using AuTOml (PLATO), a general semi-supervised approach to improving the accuracy of model-based classifiers. PLATO generates a set of high-confidence positive calls by applying a stringent filter to model-based predictions, then rescores the remaining candidates using positive-unlabeled (PU) learning. To achieve robust performance on clinical samples with large patient-to-patient variation, PLATO further integrates AutoML hyper-parameter tuning, classification threshold selection based on spies (high-confidence positives deliberately hidden among the unlabeled candidates), and support for bootstrapping.
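The steps named above (stringent filter, PU rescoring, spy-based threshold) can be illustrated with a minimal pure-Python sketch. It is not PLATO's implementation: a hand-written logistic regression stands in for the AutoML classifier, bootstrapping is omitted, and the names `pu_rescore`, `stringent_cutoff`, `spy_fraction` and `spy_recall` are hypothetical parameters chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs=500, lr=0.1):
    """Fit logistic regression by batch gradient descent.

    Hypothetical stand-in for PLATO's AutoML-tuned classifier.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n

    def predict(x):
        return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

    return predict


def pu_rescore(features, model_scores, stringent_cutoff,
               spy_fraction=0.15, spy_recall=0.95, seed=0):
    """Return (sorted indices called positive, spy-based threshold)."""
    rng = random.Random(seed)
    # Stringent filter on the model-based score gives high-confidence positives.
    positives = [i for i, s in enumerate(model_scores) if s >= stringent_cutoff]
    unlabeled = [i for i, s in enumerate(model_scores) if s < stringent_cutoff]
    if len(positives) < 2:
        raise ValueError("need at least two high-confidence positives")
    # Hide a random subset of positives ("spies") among the unlabeled set.
    n_spies = max(1, int(spy_fraction * len(positives)))
    spies = set(rng.sample(positives, n_spies))
    labeled_pos = [i for i in positives if i not in spies]
    train = labeled_pos + unlabeled + sorted(spies)
    labels = [1] * len(labeled_pos) + [0] * (len(unlabeled) + len(spies))
    # Train a positive-vs-unlabeled classifier (spies are labeled 0 here).
    clf = train_logistic([features[i] for i in train], labels)
    # Threshold: the spy score that keeps about `spy_recall` of spies at or above it.
    spy_scores = sorted(clf(features[i]) for i in spies)
    threshold = spy_scores[int((1.0 - spy_recall) * len(spy_scores))]
    # Rescore: unlabeled candidates scoring like the spies are called positive.
    rescued = [i for i in unlabeled if clf(features[i]) >= threshold]
    return sorted(positives + rescued), threshold
```

The spy threshold adapts to each sample: because spies are known positives scored by the same classifier, the cutoff follows the score range positives actually reach in that dataset instead of a fixed 0.5.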

Conclusions: Experimental results on real datasets demonstrate that PLATO improves on the performance of model-based approaches for two key steps in TRMN prediction: somatic variant calling from exome sequencing data and peptide identification from MS/MS data.

Keywords: Exome sequencing; Machine learning; Peptide identification; Positive-unlabeled learning; Somatic variant calling; Tandem mass-spectrometry.

Conflict of interest statement

ES and JF declare that they have no competing interests. IIM declares that he has a significant financial interest in Truvax Inc., a company developing personalized cancer vaccines.

Figures

Fig. 1
Schematic representation of supervised classification (a) versus PLATO’s PU learning approach (b). Supervised classification requires training data and can perform poorly when the distributions of training and test data do not match. PU learning uses an existing model-based classifier with stringent thresholds and informed undersampling to train a classifier from the data itself
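Fig. 1 mentions undersampling when building the PU training set. The exact "informed" sampling rule is not described on this page, so the sketch below substitutes plain random undersampling as an assumption: the unlabeled pool is sampled down to a multiple of the positive count so that the two classes are roughly balanced. The function name and the `ratio` parameter are hypothetical.

```python
import random


def build_pu_training_set(positives, unlabeled, ratio=1.0, seed=0):
    """Pair high-confidence positives with a random subset of unlabeled candidates.

    Assumption: plain random undersampling stands in for PLATO's informed
    undersampling. Positives get label 1, sampled unlabeled candidates label 0.
    """
    rng = random.Random(seed)
    # Never request more unlabeled items than exist.
    k = min(len(unlabeled), round(ratio * len(positives)))
    sampled = rng.sample(list(unlabeled), k)
    return [(c, 1) for c in positives] + [(c, 0) for c in sampled]
```

Repeating this with different seeds yields several balanced training sets, which is one simple way to use more of the unlabeled pool without letting it swamp the positives.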
Fig. 2
PLATO flowchart
Fig. 3
F1 scores obtained by running PLATO with N=20 bootstraps. a Random forest classification with spies-based classification threshold versus 0.5 default, b AutoML classification with spies versus 0.5 default, and c AutoML with spies versus random forest with spies. P1–P4 denote the sequencing datasets generated for four different ovarian cancer patients
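Fig. 3 compares F1 scores obtained with a spies-based classification threshold and with the 0.5 default. As a reminder of how the metric reacts to the cutoff, the sketch below computes F1 = 2PR/(P + R) from sets of calls. The scores and ground truth are invented toy values, not data from the paper, and the helper names are hypothetical.

```python
def calls_at(scores, threshold):
    """Candidates whose classifier score reaches the threshold."""
    return {c for c, s in scores.items() if s >= threshold}


def f1_score(predicted, truth):
    """F1 = 2PR / (P + R) for sets of predicted and true positive calls."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical scores and ground truth, not data from the paper.
scores = {"a": 0.9, "b": 0.6, "c": 0.4, "d": 0.35, "e": 0.45, "f": 0.1}
truth = {"a", "b", "c", "d"}
print(round(f1_score(calls_at(scores, 0.5), truth), 3))  # → 0.667 (default cutoff)
print(round(f1_score(calls_at(scores, 0.3), truth), 3))  # → 0.889 (lower cutoff)
```

In this toy case the lower cutoff trades one false positive for two recovered true positives; with other score distributions the same change could lower F1, which is why a data-driven threshold is worth comparing against the default.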
Fig. 4
Expected TP count at different multiplexing rates for SNVQ, Strelka, 2CP, and PLATO run using AutoML, spies-based classification threshold selection, and 50% bootstrap support. The dots represent TP counts from the actual AccessArray resequencing experiment reported in Table 1. P1–P4 denote the sequencing datasets generated for four different ovarian cancer patients
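Fig. 4 refers to PLATO run with 50% bootstrap support. One plausible reading, assumed here rather than taken from the paper, is that a candidate is kept when it is called positive in at least half of the bootstrap runs. The function name and the variant identifiers below are hypothetical.

```python
from collections import Counter


def bootstrap_consensus(runs, min_support=0.5):
    """Keep candidates called positive in at least `min_support` of the runs."""
    # Count each candidate at most once per run.
    counts = Counter(c for calls in runs for c in set(calls))
    return {c for c, k in counts.items() if k / len(runs) >= min_support}


# Hypothetical calls from four bootstrap runs.
runs = [{"chr1:100A>T", "chr2:200G>C"}, {"chr1:100A>T", "chr3:300C>T"},
        {"chr1:100A>T", "chr2:200G>C"}, {"chr4:400T>G"}]
print(sorted(bootstrap_consensus(runs)))  # → ['chr1:100A>T', 'chr2:200G>C']
```

Raising `min_support` trades recall for precision: at 0.75 only the variant seen in three of four runs survives.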
Fig. 5
Average feature importance for SNV calling (a), and boxplots of the classification cutoffs selected using the spy approach (b) over the 20 bootstrap runs performed for the P1–P4 ovarian cancer datasets
Fig. 6
Percentage increase in the number of identified peptides over MaxQuant results reported in [27] using 1% FDR on the 20 MS/MS datasets from Table 2
Fig. 7
Boxplots of feature importance values (displayed on a logarithmic scale) for PLATO peptide identification experiments on the 20 MS/MS datasets from Table 2

References

    1. Schumacher TN, Schreiber RD. Neoantigens in cancer immunotherapy. Science. 2015;348(6230):69–74. doi: 10.1126/science.aaa4971.
    2. Srivastava PK. Neoepitopes of cancers: looking back, looking ahead. Cancer Immunol Res. 2015;3(9):969–977. doi: 10.1158/2326-6066.CIR-15-0134.
    3. Castle JC, Kreiter S, Diekmann J, Löwer M, van de Roemer N, de Graaf J, Selmi A, Diken M, Boegel S, Paret C, Koslowski M, Kuhn AN, Britten CM, Huber C, Türeci Ö, Sahin U. Exploiting the mutanome for tumor vaccination. Cancer Res. 2012;72(5):1081–1091. doi: 10.1158/0008-5472.CAN-11-3722.
    4. Duan F, Duitama J, Seesi SA, Ayres C, Corcelli S, Pawashe A, Blanchard T, McMahon D, Sidney J, Sette A, Baker B, Mandoiu II, Srivastava PK. Genomic and bioinformatic profiling of mutational neo-epitopes reveals new rules to predict anti-cancer immunogenicity. J Exp Med. 2014;211(11):2231–2248. doi: 10.1084/jem.20141308.
    5. Gubin MM, Zhang X, Schuster H, Caron E, Ward JP, Noguchi T, Ivanova Y, Hundal J, Arthur CD, Krebber W-J, Mulder GE, Toebes M, Vesely MD, Lam SSK, Korman AJ, Allison JP, Freeman GJ, Sharpe AH, Pearce EL, Schumacher TN, Aebersold R, Rammensee H-G, Melief CJM, Mardis ER, Gillanders WE, Artyomov MN, Schreiber RD. Checkpoint blockade cancer immunotherapy targets tumour-specific mutant antigens. Nature. 2014;515(7528):577–581. doi: 10.1038/nature13988.