Sci Rep. 2019 Jan 15;9(1):142.
doi: 10.1038/s41598-018-35704-w.

Blood Biochemistry Analysis to Detect Smoking Status and Quantify Accelerated Aging in Smokers

Polina Mamoshina et al. Sci Rep.

Abstract

There is an association between smoking and cancer, cardiovascular disease and all-cause mortality. However, currently, there are no affordable and informative tests for assessing the effects of smoking on the rate of biological aging. In this study we demonstrate for the first time that smoking status can be predicted using blood biochemistry and cell count results and the recent advances in artificial intelligence (AI). By employing age-prediction models developed using supervised deep learning techniques, we found that smokers exhibited higher aging rates than nonsmokers, regardless of their cholesterol ratios and fasting glucose levels. We further used those models to quantify the acceleration of biological aging due to tobacco use. Female smokers were predicted to be twice as old as their chronological age compared to nonsmokers, whereas male smokers were predicted to be one and a half times as old as their chronological age compared to nonsmokers. Our findings suggest that deep learning analysis of routine blood tests could complement or even replace the current error-prone method of self-reporting of smoking status and could be expanded to assess the effect of other lifestyle and environmental factors on aging.

Conflict of interest statement

Until July 2018, and during work on this project, Insilico Medicine (Insilico) was a shareholder in the Canada Cancer and Aging Research Laboratories (CCARL), hence the joint affiliation of A.Z., A.A., E.P., K.K. and P.M. As of July 2018, Insilico and CCARL are independent companies engaged in aging and disease research. The new affiliations are as follows: A.Z., P.M., E.P., K.K. and A.A. are affiliated with Insilico, and O.K., A.K. and N.M.S. are affiliated with CCARL.

Figures

Figure 1
Deep learning-based blood-biochemistry clocks accurately predict chronological age. (A) Prediction accuracy of the best-performing model. The model trained on 24 parameters achieved an R² of 0.57 and an MAE of 5.7 years. (B) The design of the deep learning study that used blood-biochemistry data to predict an individual’s age. Blood samples of nonsmokers were first preprocessed and normalized as previously described. Next, arbitrage ranking based on 320 RF models was applied to facilitate the selection of the most appropriate feature space with maximum samples available. Afterward, missing values were reconstructed using an autoregressive model with a view towards increasing the training sets, and the resulting feature sets were used to train and test DNNs for predicting patient age and smoking status. (C) Feature importance plot. Fasting glucose, sex, and RDW exhibited higher relative importance scores than other features used in model training. Note: HDL for high-density lipoprotein cholesterol, LDL for low-density lipoprotein cholesterol, RDW for red blood cell distribution width, RBC for red blood cell counts, MCV for mean corpuscular volume, ALT for alanine transaminase, MCHC for mean corpuscular hemoglobin concentration.
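The two regression metrics quoted in panel (A) can be illustrated with a minimal sketch. It assumes the standard definitions of MAE and the coefficient of determination; the function names and the four sample ages are invented for illustration and are not data from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted ages, in years."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented toy ages (not study data).
actual_ages = [30.0, 40.0, 50.0, 60.0]
predicted_ages = [35.0, 38.0, 55.0, 58.0]

mae = mean_absolute_error(actual_ages, predicted_ages)
r2 = r_squared(actual_ages, predicted_ages)
print(f"MAE = {mae:.2f} years, R2 = {r2:.3f}")
```

An MAE of 5.7 years therefore means that, on average, the predicted age differed from the chronological age by 5.7 years in either direction.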
Figure 2
Deep learning-based hematological clocks demonstrated accelerated aging rates in smokers and revealed patient smoking status. (A) The prediction accuracy of the best-performing model trained on feature space extended with smoking status. The model, trained on 24 parameters, achieved an R² of 0.60 and an MAE of 5.42 years. (B) The log2 aging ratio of smokers to nonsmokers by age and sex groups for the best-performing model. Smokers demonstrated a higher aging rate regardless of sex. However, these differences plateaued after 55 years of age. A log2 aging ratio of 1 means the sample was predicted to be twice as old as its chronological age, and a log2 aging ratio of −1 means the sample was predicted to be half as old as its chronological age. (C) The most important features in the classification of smoking status selected by the PFI method. HDL cholesterol, sex, and hemoglobin exhibited higher relative importance scores than other features used in model training. (D) The model trained on 23 parameters achieved an F1 score of 0.67 and an accuracy of 0.84. Note: HDL for high-density lipoprotein cholesterol, LDL for low-density lipoprotein cholesterol, RDW for red blood cell distribution width, RBC for red blood cell counts, MCV for mean corpuscular volume, ALT for alanine transaminase, MCHC for mean corpuscular hemoglobin concentration.
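The log2 aging ratio described in panel (B) can be sketched as follows. This assumes the per-sample ratio is predicted age divided by chronological age, which is what the caption's interpretation implies; the function name is hypothetical and how the authors aggregated ratios across groups is not shown here.

```python
import math


def log2_aging_ratio(predicted_age, chronological_age):
    """log2(predicted / chronological); 0 means the two ages agree."""
    if predicted_age <= 0 or chronological_age <= 0:
        raise ValueError("ages must be positive")
    return math.log2(predicted_age / chronological_age)


# A 30-year-old predicted to be 60 is "twice as old": ratio of 1.
twice_as_old = log2_aging_ratio(60.0, 30.0)
# A 60-year-old predicted to be 30 is "half as old": ratio of -1.
half_as_old = log2_aging_ratio(30.0, 60.0)
print(twice_as_old, half_as_old)
```

Working on the log2 scale makes over- and under-prediction symmetric around zero, so a doubling and a halving have equal magnitude.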
Figure 3
Confusion matrices. (A) Confusion matrices for the best-performing smoking status classifier, trained on 23 features, in number of samples (left) and percentage (right). Row values show predicted smoking status, and columns show actual smoking status. Most of the erroneous smoking status predictions occurred in individuals older than 55 years. (B) Confusion matrices for age prediction by age groups for the best model, trained on 24 parameters, in number of samples (left) and percentage (right). Row values show actual chronological age group, and columns show predicted age group. Smokers of age groups < 30 and 30–40 were mostly predicted to be older.
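As a minimal sketch of how accuracy and F1 follow from a two-class confusion matrix laid out as in panel (A) (rows are predicted status, columns are actual status), consider the code below. The counts are invented for illustration and are not the study's results; the function name is hypothetical.

```python
def accuracy_and_f1(matrix):
    """matrix[row][col]: rows = predicted (smoker, nonsmoker),
    columns = actual (smoker, nonsmoker)."""
    tp, fp = matrix[0]  # predicted smoker: actually smoker / actually nonsmoker
    fn, tn = matrix[1]  # predicted nonsmoker: actually smoker / actually nonsmoker
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Invented toy counts (not study data).
toy_matrix = [[40, 10],
              [10, 40]]
accuracy, f1 = accuracy_and_f1(toy_matrix)
print(f"accuracy = {accuracy:.2f}, F1 = {f1:.2f}")
```

When one class is much rarer than the other, accuracy can stay high while F1 is noticeably lower, which is why both are usually reported together.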
Figure 4
Log2 aging ratios for the four groups: Cholesterol ratio > 4 and Fasting Glucose > 5 mmol/L, Cholesterol ratio > 4 and Fasting Glucose <= 5 mmol/L, Cholesterol ratio <= 4 and Fasting Glucose > 5 mmol/L, and Cholesterol ratio <= 4 and Fasting Glucose <= 5 mmol/L. Smokers of age groups < 30 and 31–40 are predicted to be older regardless of their Cholesterol ratio and Fasting Glucose level. A log2 aging ratio of 1 means that the sample is predicted to be twice as old as its chronological age, and a log2 aging ratio of −1 means the sample is predicted to be half as old. Bars indicate standard deviation.
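The stratification in this figure uses two thresholds (cholesterol ratio of 4, fasting glucose of 5 mmol/L), which yield four groups. A minimal sketch of that assignment is shown below; the function name and label format are hypothetical, and the cholesterol ratio is taken as a precomputed input because its exact definition is not given here.

```python
def metabolic_group(cholesterol_ratio, fasting_glucose_mmol_l):
    """Assign a sample to one of four groups using the caption's thresholds."""
    chol = "> 4" if cholesterol_ratio > 4 else "<= 4"
    gluc = "> 5" if fasting_glucose_mmol_l > 5 else "<= 5"
    return f"Cholesterol ratio {chol}, Fasting Glucose {gluc} mmol/L"


# Invented toy samples: (cholesterol ratio, fasting glucose in mmol/L).
samples = [(4.5, 5.5), (4.5, 4.8), (3.2, 5.5), (4.0, 5.0)]
groups = [metabolic_group(c, g) for c, g in samples]
for group in groups:
    print(group)
```

Values exactly at a threshold fall into the "<=" group, matching the inequality signs used in the caption.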
