Observational Study
Nat Commun. 2021 Aug 20;12(1):5060.
doi: 10.1038/s41467-021-24994-w.

Detection and characterization of lung cancer using cell-free DNA fragmentomes

Dimitrios Mathios et al. Nat Commun. 2021.

Abstract

Non-invasive approaches for cell-free DNA (cfDNA) assessment provide an opportunity for cancer detection and intervention. Here, we use a machine learning model for detecting tumor-derived cfDNA through genome-wide analyses of cfDNA fragmentation in a prospective study of 365 individuals at risk for lung cancer. We validate the cancer detection model using an independent cohort of 385 non-cancer individuals and 46 lung cancer patients. Combining fragmentation features, clinical risk factors, and CEA levels, followed by CT imaging, detected 94% of patients with cancer across stages and subtypes, including 91% of stage I/II and 96% of stage III/IV, at 80% specificity. Genome-wide fragmentation profiles across ~13,000 ASCL1 transcription factor binding sites distinguished individuals with small cell lung cancer from those with non-small cell lung cancer with high accuracy (AUC = 0.98). A higher fragmentation score represented an independent prognostic indicator of survival. This approach provides a facile avenue for non-invasive detection of lung cancer.
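Reporting sensitivity "at 80% specificity" can be read as follows: choose a score cutoff so that about 80% of non-cancer individuals score at or below it, then report the fraction of cancer patients scoring above it. The Python sketch below illustrates that reading only. The function names and the rule for picking the cutoff are assumptions made for illustration and are not taken from the study.

```python
import math


def cutoff_at_specificity(noncancer_scores, target_specificity=0.80):
    """Return the smallest observed score such that at least
    `target_specificity` of the non-cancer scores are <= that score."""
    ordered = sorted(noncancer_scores)
    # Number of non-cancer individuals that must be called negative.
    k = max(math.ceil(target_specificity * len(ordered)), 1)
    return ordered[k - 1]


def specificity(noncancer_scores, cutoff):
    """Fraction of non-cancer individuals called negative (score <= cutoff)."""
    negatives = sum(1 for s in noncancer_scores if s <= cutoff)
    return negatives / len(noncancer_scores)


def sensitivity(cancer_scores, cutoff):
    """Fraction of cancer patients called positive (score > cutoff)."""
    positives = sum(1 for s in cancer_scores if s > cutoff)
    return positives / len(cancer_scores)
```

With hypothetical scores, one would first call `cutoff_at_specificity` on the non-cancer group and then pass the resulting cutoff to `sensitivity` for the cancer group. Fixing the cutoff on one cohort and applying it unchanged to another mirrors the idea of a fixed model, although the study's actual procedure may differ.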

Conflict of interest statement

D.M., S.C., J.P., A.L., V.A., R.B.S., and V.E.V. are inventors on patent applications submitted by Johns Hopkins University related to cell-free DNA for cancer detection. S.C., J.P., A.L., V.A., and R.B.S. are founders of Delfi Diagnostics, and V.A. and R.B.S. are consultants for this organization. V.E.V. is a founder of Delfi Diagnostics and Personal Genome Diagnostics, serves on the Board of Directors and as a consultant for both organizations, and owns Delfi Diagnostics and Personal Genome Diagnostics stock, which are subject to certain restrictions under university policy. In addition, Johns Hopkins University owns equity in Delfi Diagnostics and Personal Genome Diagnostics. The technology used in the study described in this publication has been licensed to one or more entities. Under the terms of these license agreements, the University and inventors are entitled to fees and royalty distributions. V.E.V. is an advisor to Bristol-Myers Squibb, Genentech, and Takeda Pharmaceuticals. Within the last five years, V.E.V. has been an advisor to Merck and Ignyta. These arrangements have been reviewed and approved by the Johns Hopkins University in accordance with its conflict of interest policies.

Figures

Fig. 1. Schematic of overall approach.
a Schematic representation of DNA fragmentation and release from apoptotic lung cancer cells and WBCs. Nucleosomal DNA with variable length of linker DNA is preserved in the circulation with cancer cell cfDNA fragments having a more aberrant profile compared to the cfDNA fragments arising from the WBCs. Mapping of the cfDNA fragments along the genome reveals distinct patterns in cancer patients compared to non-cancer individuals. b Outline of the DELFI approach for early detection of lung cancer. 365 patients from the LUCAS diagnostic cohort were analyzed to derive genome-wide fragmentation profiles that were used to train and evaluate the diagnostic performance in this cohort using a cross-validated machine learning model. A fixed model was used to validate the performance in an independent cohort of 46 lung cancer patients and 385 non-cancer individuals. QC, quality control.
Fig. 2. Cell-free DNA fragmentation profiles of lung cancer patients and non-cancer individuals.
a The ratio of short to long cfDNA fragments in 5 Mb bins across the genome was evaluated in plasma samples of lung cancer and non-cancer individuals from the LUCAS cohort. The non-cancer individuals had similar fragmentation profiles while lung cancer patients exhibited significant variation. b Heatmap representation of the deviation of cfDNA fragmentation features across the genome for patients with lung cancer or non-cancer individuals compared to the mean of non-cancer individuals. Overall DELFI score and clinical characteristics are indicated to the left of the fragmentation deviation heatmap. c Heatmap representation of principal component eigenvalues of the fragmentation profile features. The relative importance of the features is shown at the top (fragmentation changes) and right (chromosomal arm changes) of the heatmap, with colors indicating increases (red) or decreases (blue) of the coefficient of cancer risk. TCGA-derived observations of chromosomal arm gains (red) and losses (blue) in lung adenocarcinoma (LUAD) (n = 518) and squamous cell cancers (LUSC) (n = 501) are indicated at the right margin. Agreement between the color of the variable importance bar in LUCAS and the TCGA copy number data indicates a correspondence between higher cancer risk due to decreases (blue) or increases (red) in chromosomal arm level representation in LUCAS and copy number amplifications (red) and copy number deletions (blue) in TCGA, respectively.
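The per-bin short-to-long ratio described in panel a can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: the 5 Mb bin size comes from the caption, but the length ranges defining "short" and "long" fragments, the input tuple layout, and the function name are assumptions made for the example.

```python
from collections import defaultdict

BIN_SIZE = 5_000_000  # 5 Mb bins, as stated in the caption

# ASSUMPTION: these length ranges (in bp) are placeholders for illustration;
# the caption does not define "short" and "long".
SHORT_RANGE = (100, 150)
LONG_RANGE = (151, 220)


def short_long_ratio_by_bin(fragments, bin_size=BIN_SIZE):
    """Compute the short/long fragment count ratio per genomic bin.

    fragments: iterable of (chrom, start, length) tuples (hypothetical layout).
    Returns {(chrom, bin_index): ratio}; bins with no long fragments are skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # [short_count, long_count]
    for chrom, start, length in fragments:
        key = (chrom, start // bin_size)
        if SHORT_RANGE[0] <= length <= SHORT_RANGE[1]:
            counts[key][0] += 1
        elif LONG_RANGE[0] <= length <= LONG_RANGE[1]:
            counts[key][1] += 1
    return {
        key: short / long
        for key, (short, long) in counts.items()
        if long > 0  # avoid division by zero
    }
```

A real analysis would also need steps this sketch leaves out, such as read filtering and any bias correction.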
Fig. 3. Performance of DELFI analyses for lung cancer patients and non-cancer individuals.
a DELFI score distribution across non-cancer individuals and cancer patients, stratified by stage and histology groups in the LUCAS cohort. The box-plot shows the median DELFI score and the inter-quartile range with the individual sample values overlaid as dots. The non-cancer cases with or without benign lesions have a lower DELFI score compared to cancer cases and there is a stepwise increase in DELFI score by stage. The highest median DELFI score is observed in SCLC cases. Green curves indicate all individuals in the LUCAS cohort, orange represents patients without prior history of cancer, and blue indicates patients without prior history of cancer, age 50–80, and with ≥20 pack-year smoking history. The center line in the boxplots represents the median, the upper limit of the boxplots represents the third quartile (75th percentile), the lower limit of the boxplots represents the first quartile (25th percentile), the upper whisker is the maximum value of the data that is within 1.5 times the interquartile range over the 75th percentile, and the lower whisker is the minimum value of the data that is within 1.5 times the interquartile range under the 25th percentile. b ROC analyses of the overall LUCAS cohort as well as by stage and histology. The dotted vertical line in the ROC figures represents an 80% specificity as a decision boundary. c A fixed DELFI model with a score cutoff of 0.344, determined from the LUCAS cohort, was applied to the validation cohort. The performance of this classifier in the independent cohort was similar to LUCAS in both specificity (left) and sensitivity (right) across all tumor stages. The numbers of samples in the training and validation sets are indicated in the labels of the horizontal axis. The intervals presented reflect a 90% confidence interval. Additional analyses at other specificities are indicated in Supplementary Fig. 6.
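The boxplot convention described in panel a (median, quartiles, and whiskers limited to 1.5 times the interquartile range) can be made concrete with a short Python sketch. The quantile rule used here (linear interpolation between order statistics) is an assumption; plotting libraries use several slightly different conventions, and the caption does not say which one was applied.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of already sorted data (assumed convention)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def boxplot_stats(values):
    """Return the median, quartiles, and whisker ends as described in the caption."""
    data = sorted(values)
    q1 = quantile(data, 0.25)
    median = quantile(data, 0.50)
    q3 = quantile(data, 0.75)
    iqr = q3 - q1
    # Upper whisker: largest data value within 1.5 * IQR above the 75th percentile.
    upper_whisker = max(v for v in data if v <= q3 + 1.5 * iqr)
    # Lower whisker: smallest data value within 1.5 * IQR below the 25th percentile.
    lower_whisker = min(v for v in data if v >= q1 - 1.5 * iqr)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
    }
```

Values beyond the whiskers are the points usually drawn individually as outliers.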
Fig. 4. Relationship of size and invasiveness of lung cancer with DELFI score.
a DELFI scores of non-metastatic patients with lung cancer categorized by T stage or N stage in the LUCAS cohort. We observe an incremental increase of the DELFI score by T stage from T1 to T4 (p < 0.01, Kruskal–Wallis, df = 3, two-sided) (n: T1 = 14, T2 = 12, T3 = 4, T4 = 26). Lung cancer patients without involvement of lymph nodes had significantly lower DELFI scores compared to patients with nodal spread (Wilcoxon rank sum test, p < 0.001, two-sided) (n: N0 = 27, N1–3 = 29). b The stepwise increase in DELFI score by T and N stage was maintained when considering both T and N stages in each patient (Kruskal–Wallis, df = 6, p < 0.01, two-sided) (n: T1N0 = 10, T1N(1–3) = 4, T2N0 = 6, T2N(1–3) = 6, T3N(0–3) = 4, T4N0 = 9, T4N(1–3) = 17). c Patients with primary lung cancer were stratified in two groups based on a DELFI cutoff of 0.5 (n = 93). Patients with a DELFI score > 0.5 (red) had a significantly worse cancer-specific survival compared to patients with DELFI score < 0.5 (blue) (P = 0.003, Log-rank test, two-sided). The center line in the boxplots represents the median, the upper limit of the boxplots represents the third quartile (75th percentile), the lower limit of the boxplots represents the first quartile (25th percentile), the upper whisker is the maximum value of the data that is within 1.5 times the interquartile range over the 75th percentile, and the lower whisker is the minimum value of the data that is within 1.5 times the interquartile range under the 25th percentile.
Fig. 5. Genome-wide fragmentation profiles can distinguish SCLC from NSCLC.
a Expression of ASCL1 transcription factor in TCGA RNA-seq analyses of SCLCs (n = 79) is high compared to NSCLC (n = 1046) or WBC (n = 755) samples. TPM, transcripts per million. The center line in the boxplots represents the median, the upper limit of the boxplots represents the third quartile (75th percentile), the lower limit of the boxplots represents the first quartile (25th percentile), the upper whisker is the maximum value of the data that is within 1.5 times the interquartile range over the 75th percentile, and the lower whisker is the minimum value of the data that is within 1.5 times the interquartile range under the 25th percentile. b Unsupervised clustering analyses of gene expression in TCGA lung cancer cohorts show that genes with ASCL1 binding sites are differentially expressed between SCLCs and NSCLCs. Genome-wide cfDNA fragmentation analyses at ASCL1 binding sites in LUCAS cohort patients reveal a decrease in coverage near transcription factor binding sites of SCLC patients compared to non-cancer individuals (c) or DELFI positive patients with SCLC compared to other individuals (e). These molecular features can distinguish SCLC patients (n = 11) from non-cancer individuals (n = 158) (d, AUC = 0.92) and DELFI positive SCLC patients (n = 10) from NSCLC patients and others (n = 115) (f, AUC = 0.98), with high accuracy.
Fig. 6. Modeling the implementation of DELFI in lung cancer screening.
a Schematic representation of current clinical practice for lung cancer screening (top) and the proposed approach in combination with the DELFI test (bottom). In the combined approach, individuals at high risk for lung cancer would undergo an annual blood draw that would be assessed using the DELFI test, and individuals with a positive result would subsequently undergo an LDCT scan for detection of lung cancer, while individuals with a DELFI negative result would repeat their screening annually. b The sensitivities of DELFI alone and of DELFI followed by LDCT for lung cancer detection were compared, holding specificity for the single analysis or the joint analysis at 80%. For these analyses, we considered individuals with lung cancer as those detected at baseline with LDCT, although three individuals were identified with lung cancer at a repeat LDCT within a year. The numbers of individuals in the LUCAS cohort are as follows: stage I n = 15, II n = 7, III n = 35, IV n = 72; and individuals in the cohort with lung adenocarcinoma comprised stage I n = 8, II n = 3, III n = 14, IV n = 37. The points colored green refer to analyses of all patients in the LUCAS cohort, whereas orange points indicate analyses of individuals without a prior history of cancer. The numbers of individuals are indicated schematically by the size of the dots and in Supplementary Table 1. The error bars represent the 90% confidence interval. c We modeled the uncertainty of sensitivity and specificity of LDCT alone as well as DELFI followed by LDCT for screening in a theoretical population of 100,000 high-risk individuals. Predictive distributions for the number of lung cancers detected (d), accuracy (e), rate of unnecessary procedures (f), and positive predictive values (g) among these individuals incorporated variation in both the prevalence of lung cancer and adherence to image- and blood-based screening. The center line in the boxplots represents the median, the upper limit of the boxplots represents the third quartile (75th percentile), the lower limit of the boxplots represents the first quartile (25th percentile), the upper whisker is the maximum value of the data that is within 1.5 times the interquartile range over the 75th percentile, and the lower whisker is the minimum value of the data that is within 1.5 times the interquartile range under the 25th percentile.
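The two-step pathway in panel a (a blood test first, then LDCT only for blood-test positives) lends itself to a simple expected-count calculation in Python. The sketch below is deterministic and does not reproduce the predictive distributions of panels d to g. It also assumes that the two tests err independently given true disease status. The function name and all parameters are hypothetical.

```python
def two_step_screening(population, prevalence, adherence,
                       blood_sens, blood_spec, ct_sens, ct_spec):
    """Expected outcomes when only blood-test positives go on to CT.

    ASSUMPTION: blood-test and CT errors are independent given disease status.
    All inputs are illustrative parameters, not values from the study.
    """
    screened = population * adherence
    with_cancer = screened * prevalence
    without_cancer = screened - with_cancer

    # Step 1: blood test; only positives proceed to imaging.
    blood_true_pos = with_cancer * blood_sens
    blood_false_pos = without_cancer * (1 - blood_spec)

    # Step 2: CT applied to blood-test positives.
    detected = blood_true_pos * ct_sens
    false_pos = blood_false_pos * (1 - ct_spec)

    called_positive = detected + false_pos
    ppv = detected / called_positive if called_positive else float("nan")
    return {
        "ct_scans": blood_true_pos + blood_false_pos,
        "detected": detected,
        "false_positives": false_pos,
        "ppv": ppv,
    }
```

Replacing the fixed inputs with draws from distributions over prevalence, adherence, sensitivity, and specificity, and repeating the calculation many times, would be one way to approximate predictive distributions like those in the figure.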
