Biomolecules. 2024 Jun 17;14(6):716.
doi: 10.3390/biom14060716.

Accurate Early Detection and EGFR Mutation Status Prediction of Lung Cancer Using Plasma cfDNA Coverage Patterns: A Proof-of-Concept Study

Zhixin Bie et al. Biomolecules. 2024.

Abstract

Lung cancer is a major global health concern with a low survival rate, often due to late-stage diagnosis. Liquid biopsy offers a non-invasive approach to cancer detection and monitoring, utilizing various features of circulating cell-free DNA (cfDNA). In this study, we established two models based on cfDNA coverage patterns at the transcription start sites (TSSs) from 6X whole-genome sequencing: an Early Cancer Screening Model and an EGFR mutation status prediction model. The Early Cancer Screening Model showed encouraging prediction ability, especially for early-stage lung cancer. The EGFR mutation status prediction model exhibited high accuracy in distinguishing between EGFR-positive and wild-type cases. Additionally, cfDNA coverage patterns at TSSs also reflect gene expression patterns at the pathway level in lung cancer patients. These findings demonstrate the potential applications of cfDNA coverage patterns at TSSs in early cancer screening and in cancer subtyping.

Keywords: EGFR mutation status prediction; cfDNA; coverage patterns at the transcription start sites; early cancer screening; machine learning.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Plasma cfDNA obtained through non-invasive testing can be subjected to whole-genome sequencing (WGS), and coverage at transcription start sites (TSSs) can then be used for early cancer screening and EGFR mutation status prediction in lung cancer. For model building, this study involved 196 participants: 96 lung cancer patients and 100 healthy controls. Models were random forest (RF) classifiers trained with 10-fold cross-validation (CV) repeated three times to reduce overfitting. In addition, 142 participants were included in an external independent test cohort for further model validation. To build the Early Cancer Screening Model, the 196 samples were randomly assigned to a training cohort (n = 138) and a validation dataset (n = 58). The external independent test cohort was used to test the generalization ability of the model. Additionally, the feasibility of using cfDNA TSS coverage to predict EGFR mutations was explored in a subset of patients with clinical information on EGFR. To build the EGFR mutation status prediction model, the 65 samples in this subset were randomly assigned to a training cohort (n = 47) and a validation dataset (n = 18).
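The data partitioning described in this caption can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the seeds, and the choice of plain (unstratified) shuffling are assumptions, since the caption does not say how the random assignment or the folds were generated.

```python
import random

def train_validation_split(n_samples, n_train, seed=0):
    """Randomly assign sample indices to a training and a validation set."""
    rng = random.Random(seed)  # seed is an arbitrary, hypothetical choice
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

def repeated_kfold(indices, n_folds=10, n_repeats=3, seed=0):
    """Yield (repeat, fold, fit_indices, holdout_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)  # a fresh shuffle per repeat gives different folds
        # Deal samples round-robin so fold sizes differ by at most one.
        folds = [shuffled[i::n_folds] for i in range(n_folds)]
        for fold, holdout in enumerate(folds):
            holdout_set = set(holdout)
            fit = [i for i in shuffled if i not in holdout_set]
            yield repeat, fold, fit, holdout

# Cohort sizes taken from the caption: 196 samples, 138 for training.
train_idx, valid_idx = train_validation_split(196, 138)
splits = list(repeated_kfold(train_idx))
```

With these sizes the sketch produces 30 fit/holdout partitions of the 138 training samples, each holding out 13 or 14 samples. The 58 validation samples are never touched during cross-validation.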
Figure 2
TSS coverage was distinct between cancer and non-cancer samples. (A) Principal Component Analysis (PCA) demonstrated clear group separation between lung cancer and healthy individuals based on TSS coverage values. (B) A volcano plot of differentially expressed genes between lung cancer and healthy individuals, derived from TCGA RNA-seq data, highlighted 1241 genes identified using TSS values, indicating their potential role in lung cancer development. (C) Gene enrichment analysis identified the top 25 pathways associated with differentially expressed TSSs between cancer and non-cancer samples, including pathways specifically related to lung cancer (e.g., “Non-small cell lung cancer” and “ErbB signaling pathway”) and other pathways related to cancer development. (D) Differences in TSS between healthy individuals and lung cancer patients were observed among the filtered 200 features selected for dimensionality reduction in the model, suggesting their potential utility in constructing a lung cancer screening model.
Figure 3
Performance of the Early Cancer Screening Model. (A) Receiver operating characteristic (ROC) curves demonstrating high performance of the Early Cancer Screening Model in the training and validation sets. (B) Area under the curve (AUC) values for early-stage (stage I/II) and advanced-stage (stage III/IV) differentiation in the training set. (C) AUC values for early-stage and advanced-stage differentiation in the validation set. (D) Model performance in the external independent test cohort, showing good generalization ability with an AUC of 0.891.
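For readers unfamiliar with the AUC values reported in this caption, the metric can be computed as the fraction of (cancer, control) pairs in which the cancer sample receives the higher model score, with ties counted as half. The sketch below is a generic illustration of that definition, not the authors' evaluation code. The function name and the toy labels and scores are invented for the example.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy labels and scores, invented for illustration only (not data from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In the toy data, 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9. A stage-specific AUC, as in panels (B) and (C), would presumably apply the same calculation to the cancer samples of one stage group against the controls, although the caption does not spell this out.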
Figure 4
TSS coverage can differentiate patients with and without EGFR mutations. (A) 26 pathways related to EGFR from the KEGG database were selected. For each of the 65 patients with EGFR mutation detection, the average normalized TSS coverage of genes involved in these 26 pathways was calculated. Boxplots of path scores based on the normalized TSS coverage mean in these 26 pathways were plotted for patients with and without EGFR mutations, with differences between groups indicated by p-values (*** represents p < 0.001, ** represents p < 0.01, * represents p < 0.05, ns represents not significant). (B) Out of the total 619 KEGG pathways analyzed, 146 (23.6%) demonstrated variances in path scores, with a striking 57.7% (15/26) of EGFR-related pathways showing significant intergroup differences (p < 0.01). (C) Among the non-EGFR mutation pathways, only 29 out of 143 (20.3%) displayed significant differences (p < 0.01). (D) The TSS coverage of all transcripts corresponding to genes in the 26 pathways was calculated, and the TSS with differential expression between groups was selected for heatmap analysis, demonstrating significant differences between patients with and without EGFR mutations. (E) An EGFR mutation status prediction model was built and showed good performance in both the training and validation cohorts.
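The path score in panel (A) is described as the average normalized TSS coverage of the genes in a pathway. A minimal sketch of that calculation follows, together with a check of the percentages quoted in panels (B) and (C). The gene names, coverage values, and the decision to skip pathway genes missing from a sample are assumptions made for illustration; the caption does not specify how missing genes were handled.

```python
from statistics import mean

def path_score(coverage_by_gene, pathway_genes):
    """Average normalized TSS coverage over the pathway genes present in a sample."""
    # Assumption: pathway genes absent from the sample are skipped.
    values = [coverage_by_gene[g] for g in pathway_genes if g in coverage_by_gene]
    if not values:
        raise ValueError("no pathway genes found in sample")
    return mean(values)

def percent(part, total):
    """Percentage rounded to one decimal place, as reported in the caption."""
    return round(100.0 * part / total, 1)

# Hypothetical sample and pathway; placeholder gene names, invented values.
toy_sample = {"GENE_A": 0.8, "GENE_B": 1.2, "GENE_C": 1.0, "GENE_D": 0.4}
toy_pathway = ["GENE_A", "GENE_B", "GENE_C", "GENE_X"]
toy_score = path_score(toy_sample, toy_pathway)

# Counts taken from the caption: 15/26, 146/619, and 29/143.
caption_percentages = [percent(15, 26), percent(146, 619), percent(29, 143)]
```

The three ratios reproduce the 57.7%, 23.6%, and 20.3% stated in the caption. Per-pathway scores computed this way for each patient are what the boxplots then compare between the EGFR-mutant and wild-type groups.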

