Serum biomarker-based early detection of pancreatic ductal adenocarcinomas with ensemble learning

Nuno R Nené¹,², Alexander Ney³, Tatiana Nazarenko⁴,⁵, Oleg Blyuss⁴,⁶, Harvey E Johnston⁴,⁷, Harry J Whitwell⁴,⁸,⁹, Eva Sedlak⁴, Aleksandra Gentry-Maharaj¹⁰, Sophia Apostolidou¹⁰, Eithne Costello¹¹, William Greenhalf¹², Ian Jacobs⁴,¹³, Usha Menon¹⁰, Justin Hsuan³, Stephen P Pereira³, Alexey Zaikin⁴,⁵, John F Timms⁴

Affiliations

¹ Department of Women's Cancer, EGA Institute for Women's Health, University College London, 84-86 Chenies Mews, London, WC1E 6HU, UK. nuno.nene.10@ucl.ac.uk.
² Institute for Women's Health, University College London, Cruciform Building 1.1, Gower Street, London, WC1E 6BT, UK. nuno.nene.10@ucl.ac.uk.
³ Institute for Liver and Digestive Health, University College London, Upper 3rd Floor, Royal Free Campus, Rowland Hill Street, London, NW3 2PF, UK.
⁴ Department of Women's Cancer, EGA Institute for Women's Health, University College London, 84-86 Chenies Mews, London, WC1E 6HU, UK.
⁵ Department of Mathematics, University College London, London, WC1H 0AY, UK.
⁶ Wolfson Institute of Population Health, Queen Mary University of London, Charterhouse Square, EC1M 6BQ, London, UK.
⁷ Babraham Institute, Babraham Research Campus, Cambridge, CB22 3AT, UK.
⁸ National Phenome Centre and Imperial Clinical Phenotyping Centre, Department of Metabolism, Digestion and Reproduction, IRDB Building, Imperial College London, Hammersmith Campus, London, W12 0NN, UK.
⁹ Section of Bioanalytical Chemistry, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK.
¹⁰ MRC Clinical Trials Unit at UCL, Institute of Clinical Trials and Methodology, UCL, 90 High Holborn, 2nd Floor, London, WC1V 6LJ, UK.
¹¹ Department of Molecular and Clinical Cancer Medicine, University of Liverpool, Liverpool, UK.
¹² Liverpool Experimental Cancer Medicine Centre, University of Liverpool, Liverpool, L69 3GL, UK.
¹³ University of New South Wales, Sydney, NSW, 2052, Australia.

PMID: 36670203
PMCID: PMC9860022
DOI: 10.1038/s43856-023-00237-5


Nuno R Nené et al. Commun Med (Lond). 2023 Jan 20;3(1):10. doi: 10.1038/s43856-023-00237-5.


Abstract

Background: Earlier detection of pancreatic ductal adenocarcinoma (PDAC) is key to improving patient outcomes, as it is mostly detected at advanced stages which are associated with poor survival. Developing non-invasive blood tests for early detection would be an important breakthrough.

Methods: The primary objective of this work is to use a prospectively collected dataset to quantify a set of cancer-associated proteins and to construct multi-marker models capable of predicting PDAC years before diagnosis. The data come from a nested case-control study within the UK Collaborative Trial of Ovarian Cancer Screening and comprise 218 samples collected from a total of 143 post-menopausal women who were diagnosed with pancreatic cancer within 70 months after sample collection, and 249 matched non-cancer controls. We develop a stacked ensemble modelling technique to make predictions more robust and thereby improve performance in newly collected datasets.
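As a rough illustration of what a stacked ensemble does, the sketch below trains toy base learners, collects their out-of-fold predictions, and lets a meta-learner weight each base learner by its out-of-fold likelihood. Everything here is an assumption made for illustration: the single-feature centroid scorer, the interleaved three-fold split, the likelihood-based weights (a crude stand-in for model averaging) and the toy data are hypothetical and are not the base learners, folds or meta-learner used in the study.

```python
import math

def fit_centroid_scorer(rows, labels, feature):
    """Toy base learner (hypothetical): score one feature by where it sits
    between the case mean and the control mean."""
    cases = [r[feature] for r, y in zip(rows, labels) if y == 1]
    controls = [r[feature] for r, y in zip(rows, labels) if y == 0]
    mu_case = sum(cases) / len(cases)
    mu_control = sum(controls) / len(controls)
    mid = (mu_case + mu_control) / 2
    scale = (mu_case - mu_control) or 1.0  # fall back to 1.0 if the means coincide

    def predict(row):
        z = 4.0 * (row[feature] - mid) / scale
        z = max(-20.0, min(20.0, z))  # clamp so exp() cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    return predict

def out_of_fold(rows, labels, feature, n_folds=3):
    """Predict every sample with a model that never saw it during fitting."""
    preds = [0.0] * len(rows)
    for fold in range(n_folds):
        train = [i for i in range(len(rows)) if i % n_folds != fold]
        model = fit_centroid_scorer([rows[i] for i in train],
                                    [labels[i] for i in train], feature)
        for i in range(fold, len(rows), n_folds):
            preds[i] = model(rows[i])
    return preds

def likelihood_weights(oof_by_model, labels):
    """Meta-learner (assumed): weights proportional to out-of-fold likelihood."""
    logliks = [sum(math.log(p if y == 1 else 1.0 - p) for p, y in zip(preds, labels))
               for preds in oof_by_model]
    best = max(logliks)
    raw = [math.exp(ll - best) for ll in logliks]
    total = sum(raw)
    return [w / total for w in raw]

def fit_stack(rows, labels, features):
    """Return the meta-learner weights and the stacked prediction function."""
    oof = [out_of_fold(rows, labels, f) for f in features]
    weights = likelihood_weights(oof, labels)
    models = [fit_centroid_scorer(rows, labels, f) for f in features]

    def predict(row):
        return sum(w * m(row) for w, m in zip(weights, models))

    return weights, predict

# Made-up data: feature 0 separates the classes, feature 1 does not.
rows = [(0.1, 0.25), (0.7, 0.25), (0.2, 0.75), (0.8, 0.75),
        (0.3, 0.25), (0.9, 0.25), (0.1, 0.75), (0.7, 0.75),
        (0.2, 0.25), (0.8, 0.25), (0.3, 0.75), (0.9, 0.75)]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
weights, predict = fit_stack(rows, labels, features=[0, 1])
```

Because the weights are learned from out-of-fold predictions, the uninformative feature receives almost no weight in this toy run, which is the robustness argument for stacking in miniature.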

Results: Here we show that with ensemble learning we can predict PDAC status with an AUC of 0.91 (95% CI 0.75-1.0) and a sensitivity of 92% (95% CI 0.54-1.0) at 90% specificity up to 1 year prior to diagnosis, and with an AUC of 0.85 (95% CI 0.74-0.93) and a sensitivity of 61% (95% CI 0.17-0.83) at 90% specificity up to 2 years prior to diagnosis.
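For readers less familiar with these metrics, the following minimal sketch shows one common way to compute an AUC and a sensitivity at a fixed 90% specificity from model scores. The function names, the thresholding convention (a sample is called positive when its score is strictly above the cut-off) and the scores themselves are illustrative assumptions, not the study's data or code.

```python
import math

def roc_auc(case_scores, control_scores):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def sensitivity_at_specificity(case_scores, control_scores, specificity=0.90):
    """Pick the lowest cut-off that keeps at least `specificity` of controls
    at or below it, then report the fraction of cases strictly above it."""
    ranked = sorted(control_scores)
    # Small epsilon guards against floating-point noise in specificity * n.
    k = max(1, math.ceil(specificity * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    called_positive = sum(1 for c in case_scores if c > threshold)
    return called_positive / len(case_scores), threshold

# Made-up scores for 10 controls and 5 cases.
controls = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.90]
cases = [0.30, 0.50, 0.60, 0.80, 0.95]

auc_value = roc_auc(cases, controls)
sens, cut = sensitivity_at_specificity(cases, controls)
```

With these made-up scores, 9 of 10 controls fall at or below the cut-off of 0.45 (90% specificity) and 4 of 5 cases lie above it.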

Conclusions: The ensemble modelling strategy explored here considerably outperforms biomarker combinations cited in the literature. Further developments in the selection of classifiers, balancing performance and heterogeneity, should enhance the predictive capacity of the method.

Plain language summary

Pancreatic cancers are most frequently detected at an advanced stage. This limits treatment options and contributes to the dismal survival rates currently recorded. The development of new tests that could improve detection of early-stage disease is fundamental to improving outcomes. Here, we use advanced data analysis techniques to devise an early detection test for pancreatic cancer. We use data on markers in the blood from people enrolled in a screening trial. Our test correctly identifies people as positive for pancreatic cancer 91% of the time up to 1 year prior to diagnosis, and 78% of the time up to 2 years prior to diagnosis. These results surpass previously reported tests and should encourage further evaluation of the test in different populations, to see whether it should be adopted in the clinic.


Conflict of interest statement

The authors declare the following competing interests: U.M. reports stock ownership in Abcodia UK between 2011 and 2021; U.M. has received grants from the Medical Research Council (MRC), Cancer Research UK, the National Institute for Health Research (NIHR), the India Alliance, NIHR Biomedical Research Centre at University College London Hospital, and The Eve Appeal; U.M. currently has research collaborations with iLoF, RNA Guardian and Micronoma, with funding paid to UCL; U.M. holds patent number EP10178345.4 for Breast Cancer Diagnostics; A.G. currently has research collaborations with Micronoma and iLoF, with the research funding awarded to UCL. No other potential conflicts of interest were disclosed by any of the authors.

Figures

**Fig. 1. Ensemble model performance per joined/combined time-group.**
a Distribution of receiver operating characteristic (ROC) area under the curve (AUC) across training folds for each of the base-learners and the Bayesian Model Averaging (BMA) stack meta-learner (Joined Time Group 2 Layer (JTG2L) model, see ‘Methods’ section on statistical analysis). See also Supplementary Figs. 24, 25 to 28 for alternative stacking methods. b ROC curves in the test set for the BMA stack per joined time-group. AUC 95% Confidence Intervals (CI) were determined by stratified bootstrapping. c Cross-time group performance of the BMA stack developed in the training set and evaluated in specific time-groups in the test set. 95% CI for AUCs are not shown but the predictions were all significant. d Sensitivity (Sens), e Positive predictive value (PPV) and f Negative predictive value (NPV) at 90% Specificity (Spec) corresponding to b. g–i Cross time-group performances for the ensemble trained in 0-4+ samples (last column in c). See also Supplementary Fig. 29 for other stacking methods. For the Matthews correlation coefficients corresponding to d–i, see Supplementary Fig. 30. In a, b, d–i, shades of blue from dark to light correspond to results obtained in 0-1, 0-2, 0-3, 0-4 and 0-4+ years to diagnosis samples, respectively. The number of independent training samples was n = 107 (0-1), n = 180 (0-2), n = 252 (0-3), n = 309 (0-4) and n = 363 (0-4+). The number of independent test set samples was n = 26 (0-1), n = 60 (0-2), n = 82 (0-3), n = 98 (0-4) and n = 114 (0-4+). See Supplementary Table 12 for further details on case and control samples. See ‘Statistical analysis’ in Methods for further details and Supplementary Data 1–3.
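The caption states that AUC confidence intervals were obtained by stratified bootstrapping. A minimal sketch of that idea follows: cases and controls are resampled separately, with replacement, so every replicate keeps the original class sizes, and the interval is read off the percentiles of the replicate AUCs. The function names, the percentile method, the number of replicates and the seed are assumptions for illustration; the abstract does not specify these details.

```python
import random

def roc_auc(case_scores, control_scores):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def stratified_bootstrap_auc_ci(case_scores, control_scores,
                                n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for the AUC, resampling within each class separately."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    stats = []
    for _ in range(n_boot):
        # Stratification: class sizes are preserved in every replicate.
        cases = rng.choices(case_scores, k=len(case_scores))
        controls = rng.choices(control_scores, k=len(control_scores))
        stats.append(roc_auc(cases, controls))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Made-up scores; not data from the study.
controls = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.90]
cases = [0.30, 0.50, 0.60, 0.80, 0.95]
ci_low, ci_high = stratified_bootstrap_auc_ci(cases, controls)
```

Resampling within each class avoids replicates that contain no cases or no controls, which matters for small test sets such as the n = 26 group reported here.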

**Fig. 2. Feature importance across pancreatic ductal adenocarcinoma base-learner signatures.**
a Odds-ratios (represented proportionally by the size of the circles) and P-values for the ranking procedure according to a logistic regression model using Firth’s bias reduction method in the training set. b Feature importance across all base learners and joined time-groups. All the features (biomarkers and clinical covariates) presented in this figure were selected when training/optimizing the ensemble approach with 0-4+ samples. The importance plotted for the remaining joined time-groups is the importance of each feature in their respective models. See also Supplementary Fig. 33 for the full plots and additionally Supplementary Fig. 34 for models developed with single time-groups. In a and b shades of blue from dark to light correspond to results obtained in 0-1, 0-2, 0-3, 0-4 and 0-4+ years to diagnosis samples, respectively. See ‘Statistical analysis’ in Methods for further details and Supplementary Data 4, 5. OCP oral contraceptive pill use. HRT hormone replacement therapy.

**Fig. 3. Enrichment analysis.**
g:Profiler terms for the set of features selected by the optimal classifier trained in 0-4+ samples. a Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathways. c Reactome Pathway Database (REAC). e WikiPathways (WP). g Gene ontology terms biological process (GO: BP). The respective adjusted p-values associated with each enrichment term or pathway are plotted in (b), (d), (f) and (h). See also Fig. 2. See ‘Statistical analysis’ in Methods for further details and Supplementary Data 6.

**Fig. 4. Performance in an external validation set.**
a Receiver operating characteristic (ROC) area under the curve (AUC) in the Accelerated Diagnosis of neuro Endocrine and Pancreatic TumourS (ADEPTS) external validation set for the Joined Time Group 2 Layer (JTG2L) Bayesian Model Averaging (BMA) stack models developed and selected in the UKCTOCS training set in the respective joined time-group samples (see Fig. 1), coloured in shades of green from dark to light for 0-1, 0-2, 0-3, 0-4, 0-4+ years to diagnosis (YTD) samples. b Sensitivity (Sens), c Positive predictive value (PPV) and d Negative predictive value (NPV) at 90% specificity (Spec) (see also Supplementary Fig. 39 for the corresponding Matthews correlation coefficient value). The performances correspond to 1000 datasets that differ from the original ADEPTS subset selected for this study only in the random allocation of the missing features hormone replacement therapy (HRT) and oral contraceptive pill use (OCP) to female participants. The red dots and respective numbers correspond to estimates of the mean performance in ADEPTS (by bootstrapping with the *boot* R package (version 1.3–25)) for the respective model developed in UKCTOCS time-grouped samples. The number of independent ADEPTS samples was n = 34. See ‘Study design’ and ‘Statistical analysis’ sections in Methods for further details, and Supplementary Data 7.


