Eur J Epidemiol. 2019 Nov;34(11):1055-1074.
doi: 10.1007/s10654-019-00555-w. Epub 2019 Sep 7.

Validated inference of smoking habits from blood with a finite DNA methylation marker set

Silvana C E Maas et al. Eur J Epidemiol. 2019 Nov.

Abstract

Inferring a person's smoking habit and history from blood is relevant for complementing or replacing self-reports in epidemiological and public health research, and for forensic applications. However, a finite DNA methylation marker set and a validated statistical model based on a large dataset are not yet available. Employing 14 epigenome-wide association studies for marker discovery, and using data from six population-based cohorts (N = 3764) for model building, we identified 13 CpGs most suitable for inferring smoking versus non-smoking status from blood with a cumulative Area Under the Curve (AUC) of 0.901. Internal fivefold cross-validation yielded an average AUC of 0.897 ± 0.137, while external model validation in an independent population-based cohort (N = 1608) achieved an AUC of 0.911. These 13 CpGs also provided accurate inference of current (average cross-validation AUC 0.925 ± 0.021, external-validation AUC 0.914), former (0.766 ± 0.023, 0.699) and never smoking (0.830 ± 0.019, 0.781) status, and allowed inference of pack-years in current smokers (10 pack-years 0.800 ± 0.068, 0.796; 15 pack-years 0.767 ± 0.102, 0.752) and of smoking cessation time in former smokers (5 years 0.774 ± 0.024, 0.760; 10 years 0.766 ± 0.033, 0.764; 15 years 0.767 ± 0.020, 0.754). Model application to children revealed highly accurate inference of the true non-smoking status (6 years of age: accuracy 0.994, N = 355; 10 years: 0.994, N = 309), suggesting that prenatal and passive smoking exposure have no impact on model applications in adults. The finite set of DNA methylation markers allows accurate inference of smoking habit, with accuracy comparable to that of plasma cotinine use, and of smoking history from blood, which we envision becoming useful in epidemiology and public health research, and in medical and forensic applications.
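The AUC values reported in the abstract can be read as the probability that a randomly chosen smoker receives a higher model score than a randomly chosen non-smoker. The following is a minimal Python sketch of that pairwise definition. The function name and the four toy scores are illustrative assumptions; they are not data or code from the study.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (smoker, non-smoker) pairs ranked correctly.

    labels: 1 = smoker, 0 = non-smoker; scores: model output per person.
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute an AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (invented values): two non-smokers, then two smokers.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = pairwise_auc(toy_labels, toy_scores)
print(toy_auc)  # 3 of 4 pairs are ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of roughly 0.70 to 0.93 should be read.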

Keywords: DNA methylation; Epidemiology; Epigenetics; Forensics; Smoking inference.

Conflict of interest statement

H.J. Grabe has received funding from Fresenius Medical Care and speaker’s honoraria as well as travel funds from Fresenius Medical Care, Neuraxpharm and Janssen-Cilag. Other than that, the authors declared no conflict of interest.

Figures

Fig. 1
DNA methylation β-value differences between smokers and never-smokers for the top 20 smoking-associated CpGs. Previously reported differences in β-values in mean or median (depending on availability per EWAS) between smokers and never-smokers (¤ or non-smokers, when non-smoking data was available) for the selected 20 top-associated CpGs obtained from the 14 reviewed EWASs on smoking habits that did not include samples used here for model building
Fig. 2
Cumulative AUC profile for smoking habit inference from blood based on the top 20 CpGs. The 20 CpGs were selected from previous EWASs on smoking habits (see Fig. 1) and were tested in the model-building set (N = 3764). Presented is the cumulative contribution of each of the selected 20 CpGs to the model-based smoking habit inference, shown as the AUC plotted against the number of CpGs included in the binary logistic regression model. In the model selection process, first all CpGs were included, and using backward elimination procedures, those with the lowest z-statistic per model were removed one by one. After 13 CpGs, the AUC plateaus; therefore, and by considering the results from Chi squared testing, these 13 CpGs were used for further analyses
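The backward elimination described in the Fig. 2 caption can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' code: the data are simulated, the marker names (cpg_a to cpg_d) are hypothetical, the logistic model is fitted with a plain Newton-Raphson loop, and "lowest z-statistic" is interpreted here as the smallest absolute z.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def auc(labels, scores):
    # Pairwise (Mann-Whitney) AUC; tied scores count as half.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def invert(m):
    # Gauss-Jordan inversion with partial pivoting for a small square matrix.
    n = len(m)
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_logistic(rows, y, iters=15):
    # Newton-Raphson fit; returns coefficients and z-statistics, intercept first.
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [sigmoid(sum(b * v for b, v in zip(beta, x))) for x in X]
        grad = [sum((yi - pi) * x[j] for x, yi, pi in zip(X, y, p)) for j in range(k)]
        hess = [[sum(pi * (1.0 - pi) * x[i] * x[j] for x, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        cov = invert(hess)  # approximates the coefficient covariance at convergence
        beta = [b + sum(cov[j][i] * grad[i] for i in range(k)) for j, b in enumerate(beta)]
    z = [b / math.sqrt(cov[j][j]) for j, b in enumerate(beta)]
    return beta, z


def backward_elimination(rows, y, names):
    # Start with all markers; drop the one with the smallest |z| each round,
    # recording the in-sample AUC of every intermediate model.
    active = list(range(len(names)))
    path = []
    while active:
        sub = [[r[j] for j in active] for r in rows]
        beta, z = fit_logistic(sub, y)
        scores = [sum(b * v for b, v in zip(beta, [1.0] + x)) for x in sub]
        path.append(([names[j] for j in active], auc(y, scores)))
        weakest = min(range(len(active)), key=lambda i: abs(z[i + 1]))  # skip intercept
        active.pop(weakest)
    return path


# Simulated data (assumption): cpg_a is strongly informative, cpg_b weakly,
# cpg_c and cpg_d are pure noise.
rng = random.Random(0)
names = ["cpg_a", "cpg_b", "cpg_c", "cpg_d"]
rows = [[rng.gauss(0.0, 1.0) for _ in names] for _ in range(400)]
labels = [1 if rng.random() < sigmoid(2.5 * r[0] + 0.5 * r[1]) else 0 for r in rows]

path = backward_elimination(rows, labels, names)
for kept, value in path:
    print(len(kept), kept, round(value, 3))
```

Reading the printed path from the smallest model upward gives a cumulative AUC profile of the kind plotted in Fig. 2: the AUC rises as informative markers are added and flattens once only uninformative ones remain.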
Fig. 3
Inferred probability of being a smoker versus the percentage of correctly inferred smoking habits. Histogram of predicted probabilities in our model building dataset (N = 3764), probabilities determined using the 13 CpGs included in the final prediction model. The y-axis presents the number of individuals for whom the predicted probability of being a smoker was within the given probability range (x-axis). The red dots present the percentage of individuals in each probability bin that were accurately inferred using a > 0.5 probability threshold for being a smoker
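The per-bin summary described in the Fig. 3 caption (a count of individuals per probability range, plus the percentage correctly classified at a > 0.5 threshold) could be computed along these lines. This is a minimal sketch: the function name, the ten equal-width bins, and the toy probabilities are assumptions made for illustration, not values from the study.

```python
def accuracy_by_bin(probs, labels, n_bins=10, threshold=0.5):
    """Return (bin_start, bin_end, count, percent_correct) for each bin.

    probs: predicted probability of being a smoker; labels: 1 = smoker, 0 = not.
    percent_correct is None for empty bins.
    """
    counts = [0] * n_bins
    correct = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        counts[b] += 1
        predicted = 1 if p > threshold else 0
        correct[b] += int(predicted == y)
    summary = []
    for b in range(n_bins):
        pct = 100.0 * correct[b] / counts[b] if counts[b] else None
        summary.append((b / n_bins, (b + 1) / n_bins, counts[b], pct))
    return summary


# Toy example (invented values).
toy_probs = [0.05, 0.08, 0.45, 0.55, 0.62, 0.91, 0.97]
toy_labels = [0, 0, 1, 1, 0, 1, 1]
summary = accuracy_by_bin(toy_probs, toy_labels)
for start, end, count, pct in summary:
    print(f"{start:.1f}-{end:.1f}: n={count}, correct={pct}")
```

In a summary like this, bins near 0 and 1 would be expected to show high percentages of correct calls, while bins close to the 0.5 threshold are where misclassification concentrates.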

