Nat Genet. 2022 Apr;54(4):492-498.
doi: 10.1038/s41588-022-01035-w. Epub 2022 Apr 11.

Single-cell analysis of somatic mutations in human bronchial epithelial cells in relation to aging and smoking

Zhenqiu Huang et al. Nat Genet. 2022 Apr.

Abstract

Although lung cancer risk among smokers is dependent on smoking dose, it remains unknown if this increased risk reflects an increased rate of somatic mutation accumulation in normal lung cells. Here, we applied single-cell whole-genome sequencing of proximal bronchial basal cells from 33 participants aged between 11 and 86 years with smoking histories varying from never-smoking to 116 pack-years. We found an increase in the frequency of single-nucleotide variants and small insertions and deletions with chronological age in never-smokers, with mutation frequencies significantly elevated among smokers. When plotted against smoking pack-years, mutations followed the linear increase in cancer risk until about 23 pack-years, after which no further increase in mutation frequency was observed, pointing toward individual selection for mutation avoidance. Known lung cancer-defined mutation signatures tracked with both age and smoking. No significant enrichment for somatic mutations in lung cancer driver genes was observed.
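
The plateau described in the abstract (a rise with pack-years up to a change point, then no further increase) can be illustrated with a simple "hinge" model, y = a + b * min(x, c). The following is a minimal sketch under stated assumptions, not the authors' analysis: it uses plain least squares over a grid of candidate change points rather than a mixed-effects model, and the function name `fit_hinge`, the data values, and the baseline and slope are hypothetical and synthetic.

```python
def fit_hinge(xs, ys, candidates):
    """Fit y = a + b * min(x, c) by least squares, scanning change points c."""
    best = None
    n = len(xs)
    my = sum(ys) / n
    for c in candidates:
        # Cap the exposure at the candidate change point.
        zs = [min(x, c) for x in xs]
        mz = sum(zs) / n
        szz = sum((z - mz) ** 2 for z in zs)
        if szz == 0:
            continue  # no variation left in the capped exposure
        # Ordinary least-squares slope and intercept for this candidate.
        b = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / szz
        a = my - b * mz
        sse = sum((y - (a + b * z)) ** 2 for z, y in zip(zs, ys))
        if best is None or sse < best[0]:
            best = (sse, c, a, b)
    sse, c, a, b = best
    return {"change_point": c, "intercept": a, "slope": b, "sse": sse}


# Synthetic, noise-free illustration (NOT data from the study):
# a hypothetical baseline of 1000 mutations plus 100 per pack-year, capped at 23.
pack_years = [0, 5, 10, 15, 20, 23, 30, 50, 80, 116]
mutation_freq = [1000 + 100 * min(x, 23) for x in pack_years]

result = fit_hinge(pack_years, mutation_freq, range(1, 117))
print(result["change_point"], round(result["slope"], 3))
```

On real, noisy per-nucleus data the estimated change point would carry uncertainty, and repeated nuclei per subject would call for a model with random effects.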

Conflict of interest statement

Competing interests

A.Y.M., X.D., and J.V. are cofounders of SingulOmics Corp. The remaining authors declare no other competing interests.

Figures

Extended Data Fig. 1. Mutation frequency and correction deviation error
SNV frequency of never-smokers versus age. Each data point indicates the mutation frequency per nucleus from each individual, with color intensity indicating relative standard error value (see Methods). The four cells of highest mutational burden were plotted separately with each data point representing median value with standard deviation errors.
Extended Data Fig. 2. Distribution of shared mutations in subject 1320
a, Stacked bar plot showing the proportional contribution of SNVs shared among the 3–8 sequenced nuclei per subject. b, Upset plot showing the distribution of shared SNVs in six nuclei from subject 1320 (lower part). The bar chart (upper part) represents the number of SNVs shared by each nucleus combination.
Extended Data Fig. 3. An a priori semi-parametric B-spline model to test the non-linearity between mutation frequency and smoking pack-years
Each data point indicates the SNV frequency of nuclei of individuals. The spline fit, evaluated at the average age and the average of the random effects, is shown in gray with its 95% confidence interval, and the piece-wise linear model fit is shown as the blue line. The P value for the spline model is 0.0043 compared with the linear model and 0.0034 compared with the null model (see Methods).
Extended Data Fig. 4. INDEL frequency and smoking dose
a, INDEL frequency versus smoking pack-years across all individuals (n=33). Each dot indicates the median value and the minimal and maximal range of INDEL frequency per individual. b, INDEL frequency of different groups of individuals according to smoking pack-years, with boxes indicating median values and interquartile ranges of the never (n=14), light (n=6), moderate (n=6), and heavy (n=7) smoking groups, respectively.
Extended Data Fig. 5. Effects of smoking cessation on mutation frequency
Median SNV and INDEL frequency among former smokers (n=7) and current smokers (n=12). a, Each data point indicates the median value and the minimal and maximal range of SNV frequency of 3–8 nuclei per subject. b, Each data point indicates the median value and the minimal and maximal range of INDEL frequency of 3–8 nuclei per subject. P values were obtained by likelihood ratio tests using negative binomial mixed effect models.
Extended Data Fig. 6. SNV frequency in the lung functional genome using scRNA-seq human lung data instead of GTEX
Each data point represents the number of mutations per nucleus in the functional genome (x axis) and the whole genome (y axis) for all subjects, colored by smoking status.
Extended Data Fig. 7. Cancer driver mutations
a, Distribution of driver gene mutations in single nuclei of subjects, with number of mutations and smoking status indicated by colors. b, Total number of single nuclei with unique mutations found in pan-cancer driver genes and number of unique mutations in pan-cancer driver genes across the sample set (n = 134), 22 of 85 driver genes shown (Supplementary Table 5).
Extended Data Fig. 8. Mutational signatures and smoking
a, Mutation spectra of four novel signatures identified among never-smokers and smokers. The six substitution types are shown across the top. Within each substitution type, the trinucleotide context is shown as four sets of four bars, grouped by whether an A, C, G or T, respectively, is 5′ or 3′ to the mutated base. b-f, Absolute number of major signatures discovered from never-smokers (n=14) and smokers (n=19). Each dot indicates the median number of SNV frequency of each individual. Boxes indicate median values and interquartile ranges among each group. The quoted P values were obtained by likelihood ratio tests using linear mixed effect models. g, APOBEC signatures relative contribution versus SNV frequency of nuclei of never-smokers. Each data point represents a nucleus.
Extended Data Fig. 9. The INDEL mutation signature analysis
a, Mutation spectra of INDELs in single nuclei from never-smokers (n=14) and smokers (n=19). The contributions of different types of INDELs are shown, grouped by whether variants are deletions or insertions; the size of the event; whether they occur at repeat units; and the sequence content of the INDEL. b, Stacked bar plot showing the proportional contribution of mutational signatures to INDELs across all nuclei (n=134) measured from never-smokers and smokers. Four signatures (N1, ID1, ID3, ID4) were extracted by HDP.
Extended Data Fig. 10. Germline genetic variants associated with solid cancers
A heat map showing 6 germline variants associated with solid cancers found in each subject (one subject per column), with presence and absence indicated by color. Variant IDs at the left of each row of the heat map represent 6 different solid cancer-associated single nucleotide polymorphisms found through ClinVar (Supplementary Table S7).
Fig. 1. Mutation accumulation in PBBCs with age in never-smokers.
a, Schematic representation of the isolation, processing and analysis of PBBCs from human lung. b-c, SNV and INDEL frequency of never-smokers versus age. Each data point indicates the mutation frequency per nucleus from each individual. d-e, SNV and INDEL frequency from never-smokers (n=14) versus age. Each data point indicates the median value and the minimal and maximal range of mutation frequency of 3–5 nuclei from each individual. P values were obtained by likelihood ratio tests using negative binomial mixed effect models.
Fig. 2. Mutation accumulation in PBBCs of smokers.
a-b, Elevated SNV and INDEL frequency in relation to age among smokers (n=19). Each data point indicates the median value and the minimal and maximal range of mutation frequency of 3–8 nuclei per subject. P values were obtained by likelihood ratio tests using negative binomial mixed effect models. c, SNV frequency versus smoking pack-years across all individuals (n=33). Each data point indicates the median value and the minimal and maximal range of mutation frequency of 3–8 nuclei per subject. Subject 1320, with its high clonality of mutations, is marked, and the gray dashed line indicates the change point of the piece-wise linear regression fitted without subject 1320. d, SNV frequency of different groups of individuals according to smoking pack-years, with boxes indicating median values and interquartile ranges of the never (n=14), light (n=6), moderate (n=6), and heavy (n=7) smoking groups, respectively. e, SNV frequency in the functional genome and in the genome overall, based on the transcribed lung exome from the human lung dataset of GTEx. Each data point represents the median number of SNVs per subject in the functional genome (x axis) and whole genome (y axis) of never-smokers (orange) and smokers (black).
Fig. 3. Cancer driver mutations in normal PBBC nuclei.
a, Total number of nuclei with mutations and number of unique mutations in lung cancer driver genes across the sample set (n = 134). b, Mutation frequency in cancer driver genes and randomly chosen genes of the same size. Red diamonds indicate the genome coverage normalized mutation frequency of driver genes with the cancer set indicated above. Boxes indicate median values and interquartile ranges of mutation rate results of a randomly selected identical number of genes for corresponding cancer driver gene set for 200 repeats (n=200).
Fig. 4. Mutational signatures and smoking.
a, (Left) Mutation spectra of single nuclei from never-smokers and smokers and (Right) stacked bar plot showing the proportional contribution of mutational signatures to SNVs across all nuclei measured from never-smokers and smokers, extracted using a hierarchical Dirichlet process (HDP). b, Median number of SNVs attributed to SBS4 signatures versus the age of individuals. Each data point indicates the median value and the minimal and maximal range of 3–8 nuclei from each individual attributed to SBS4 colored by smoking status. P values were obtained by likelihood ratio tests using linear mixed effect models (see Methods). c, Median number of SNVs attributed to SBS5 signatures versus the age of the individuals. Each data point indicates the median value and the minimal and maximal range of 3–8 nuclei from each individual attributed to SBS5 colored by smoking status. P values were obtained by likelihood ratio tests using linear mixed effect models (see Methods).
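
Several captions above quote P values from likelihood ratio tests between mixed effect models. As a rough illustration of how such a P value is derived once two nested models have been fitted, the sketch below converts a pair of log-likelihoods into a test statistic and P value. It assumes the models differ by exactly one parameter (one degree of freedom), which is an assumption made here and not a detail stated in the captions; the function name and the log-likelihood values are hypothetical, and the model fitting itself is not shown.

```python
import math


def likelihood_ratio_test_1df(loglik_null, loglik_full):
    """Return (statistic, p_value) for nested models differing by one parameter."""
    stat = 2.0 * (loglik_full - loglik_null)
    if stat < 0:
        raise ValueError("the full model should not fit worse than the nested null model")
    # Chi-square survival function with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods (NOT values from the study): a null model
# without the effect of interest and a full model that includes it.
stat, p_value = likelihood_ratio_test_1df(loglik_null=-520.0, loglik_full=-515.0)
print(f"LR statistic = {stat:.2f}, P = {p_value:.4g}")
```

For models that differ by more than one parameter, the chi-square reference distribution would need the corresponding degrees of freedom, which the closed form above does not cover.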
