EBioMedicine. 2023 Jul:93:104686.
doi: 10.1016/j.ebiom.2023.104686. Epub 2023 Jun 26.

Plasma protein biomarkers for early prediction of lung cancer

Michael P A Davies et al. EBioMedicine. 2023 Jul.

Abstract

Background: Individual plasma proteins have been identified as minimally invasive biomarkers for lung cancer diagnosis with potential utility in early detection. Plasma proteomes provide insight into contributing biological factors; we investigated their potential for future lung cancer prediction.

Methods: The Olink® Explore-3072 platform quantitated 2941 proteins in 496 Liverpool Lung Project plasma samples, including 131 cases taken 1-10 years prior to diagnosis, 237 controls, and 90 subjects at multiple times. 1112 proteins significantly associated with haemolysis were excluded. Feature selection with bootstrapping identified differentially expressed proteins, subsequently modelled for lung cancer prediction and validated in UK Biobank data.
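The abstract does not spell out how feature selection with bootstrapping was carried out. As a rough illustration of the general idea only, the sketch below resamples cases and controls with replacement and records how often each protein passes a selection rule. The function name, the toy data, and the simple mean-difference threshold (a stand-in for whatever statistical test the study applied) are all assumptions, not the authors' procedure.

```python
import random
from statistics import mean


def bootstrap_selection_frequency(cases, controls, proteins,
                                  n_boot=200, min_diff=1.0, seed=0):
    """Fraction of bootstrap resamples in which each protein is selected.

    cases / controls: lists of dicts mapping protein name -> measured level.
    A protein counts as selected in a resample when the absolute difference
    in mean level between resampled cases and controls exceeds min_diff.
    This threshold rule is an illustrative assumption.
    """
    rng = random.Random(seed)
    counts = {p: 0 for p in proteins}
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        boot_cases = rng.choices(cases, k=len(cases))
        boot_controls = rng.choices(controls, k=len(controls))
        for p in proteins:
            case_mean = mean([s[p] for s in boot_cases])
            control_mean = mean([s[p] for s in boot_controls])
            if abs(case_mean - control_mean) > min_diff:
                counts[p] += 1
    return {p: counts[p] / n_boot for p in proteins}


# Hypothetical toy data: protein "A" separates the groups, protein "B" does not.
toy_cases = [{"A": 10.0, "B": 5.0}, {"A": 11.0, "B": 5.0},
             {"A": 12.0, "B": 5.0}, {"A": 13.0, "B": 5.0}]
toy_controls = [{"A": 1.0, "B": 5.0}, {"A": 2.0, "B": 5.0},
                {"A": 3.0, "B": 5.0}, {"A": 4.0, "B": 5.0}]
toy_freq = bootstrap_selection_frequency(toy_cases, toy_controls, ["A", "B"])
```

Proteins selected in a high fraction of resamples are less likely to owe their selection to a few influential samples, which is the usual motivation for combining feature selection with bootstrapping.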

Findings: For samples 1-3 years pre-diagnosis, 240 proteins were significantly different in cases; for 1-5 year samples, 117 of these and 150 further proteins were identified, mapping to significantly different pathways. Four machine learning algorithms gave median AUCs of 0.76-0.90 and 0.73-0.83 for the 1-3 year and 1-5 year proteins respectively. External validation gave AUCs of 0.75 (1-3 year) and 0.69 (1-5 year), with AUC 0.7 up to 12 years prior to diagnosis. The models were independent of age, smoking duration, cancer histology and the presence of COPD.
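An AUC can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. A minimal sketch of that pairwise calculation follows; the function name and the toy scores are illustrative and are not taken from the study.

```python
def auc_from_scores(case_scores, control_scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case correctly ranked above control
            elif case == control:
                wins += 0.5   # ties count as half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical model scores for three cases and four controls.
toy_case_scores = [0.9, 0.7, 0.4]
toy_control_scores = [0.6, 0.3, 0.2, 0.1]
toy_auc = auc_from_scores(toy_case_scores, toy_control_scores)
```

On this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.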

Interpretation: The plasma proteome provides biomarkers which may be used to identify those at greatest risk of lung cancer. The proteins and the pathways are different when lung cancer is more imminent, indicating that both biomarkers of inherent risk and biomarkers associated with presence of early lung cancer may be identified.

Funding: Janssen Pharmaceuticals Research Collaboration Award; Roy Castle Lung Cancer Foundation.

Keywords: Early-detection; Lung cancer prediction; Plasma; Proteins; Proteomics.

Conflict of interest statement

Declaration of interests: This work was funded through a Research Collaboration Agreement between Janssen Pharmaceuticals and the University of Liverpool. MD and JKF received research funding from Janssen Pharmaceuticals (a Johnson & Johnson company). TS, HA, LH and RY are employees of Johnson & Johnson; the company has filed a patent on the use of plasma protein biomarkers in lung cancer interception. TL declares no conflict of interest. Both parties shared responsibility for study design; collection, analysis and interpretation of experimental data; writing the report; and the decision to publish.

Figures

Fig. 1
Circulating plasma proteins prediction of future lung cancer. (a) A boxplot of training AUC values from four different machine learning models (Elastic Net, Random Forest, Support Vector Machine, XGBoost, 5-fold CV repeated 5 times) trained on the LLP cohort to predict lung cancer in patients 1–3 years before diagnosis (53 cancer and 109 control samples). (b) Protein levels in LLP subjects were transformed using the z-score method and combined to generate one score. Combined z-scores were plotted over time in the LLP cohort for 1-3Y proteins. (c) AUROC of 1-3Y SVM model trained in Liverpool tested in UK Biobank samples 1–3 years before lung cancer diagnosis (62 cancer and 5500 control samples). (d) Performance of the 1-3Y SVM model in the UK Biobank across different years of diagnosis of lung cancer. Samples taken at different times prior to lung cancer were segregated by year (2–12 years) and the SVM model for 1-3Y was tested by ROC analysis. (e) Barplot for AUROC values for SVM model predicting future development of cancer for several cancer types from UK Biobank 1–3 years before diagnosis. The same approach as taken for lung cancer (see methods) was taken to identify plasma samples at least 2 years prior to other first cancer diagnosis (number of cases labelled on bar chart) and the AUC for ROC analysis shown.
Fig. 2
Combined z-score from 1-3Y in relation to cancer stage and pack years of smoking. Protein levels in LLP subjects were transformed using the z-score method and combined to generate one score. (a) Combined z-scores were plotted in time-frame categories (5–10 years, 3–5 years, 1–3 years prior to diagnosis or at diagnosis) for healthy subjects and cases of different lung cancer stage for 1-3Y proteins with P-values generated using Wilcoxon signed-rank test. (b) The z-scores were also correlated with pack years of smoking at time of sample in the same time frame categories; correlation was measured using Pearson correlation coefficient.
Fig. 3
Circulating plasma proteins prediction of long-term future lung cancer. (a) A boxplot of training AUC values from four different machine learning models (Elastic Net, Random Forest, Support Vector Machine, XGBoost, 5-fold CV repeated 5 times) trained on the LLP cohort to predict lung cancer in patients 1–5 years before diagnosis (110 Cancer, 215 control samples). (b) Protein levels in LLP subjects were transformed using the z-score method and combined to generate one score. Combined z-scores were plotted over time in the LLP cohort for 1-5Y proteins. (c) The z-scores were also correlated with age at time of sample in the same time frame categories; correlation was measured using Pearson correlation coefficient.
Fig. 4
Gene Enrichment Analysis. Top 20 pathways over- or under-represented in plasma samples from 1-3Y or 1-5Y models, demonstrating largely different pathways for different predictive panels (blue) with three shared over-represented (green) and three shared under-represented (red) pathways.
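The captions for Figs. 1 to 3 describe transforming protein levels with the z-score method, combining them into one score, and correlating that score with pack years or age using the Pearson coefficient. The sketch below shows one way this could be done. It assumes that each protein is standardised across all samples using the population standard deviation and that the per-protein z-scores are averaged; the captions do not state the reference group or the combination rule, so both choices, along with the function names and toy values, are assumptions.

```python
from math import sqrt
from statistics import mean, pstdev


def combined_z_scores(samples, proteins):
    """Return one combined score per sample.

    samples: list of dicts mapping protein name -> measured level.
    Each protein is z-transformed across all samples, then the z-scores
    are averaged per sample (averaging is an assumed combination rule).
    """
    stats = {}
    for p in proteins:
        values = [s[p] for s in samples]
        sd = pstdev(values)
        if sd == 0:
            raise ValueError(f"protein {p!r} has zero variance")
        stats[p] = (mean(values), sd)
    return [
        mean([(s[p] - stats[p][0]) / stats[p][1] for p in proteins])
        for s in samples
    ]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: two proteins measured in three subjects.
toy_samples = [
    {"P1": 1.0, "P2": 10.0},
    {"P1": 2.0, "P2": 20.0},
    {"P1": 3.0, "P2": 30.0},
]
toy_pack_years = [10.0, 20.0, 30.0]
toy_scores = combined_z_scores(toy_samples, ["P1", "P2"])
toy_r = pearson_r(toy_scores, toy_pack_years)
```

Standardising each protein first puts proteins measured on different scales on an equal footing before they are combined, so no single high-abundance protein dominates the score.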
