Br J Nutr. 2024 Jun 28;131(12):2058-2067.

doi: 10.1017/S0007114524000795. Epub 2024 Apr 12.

Enhancing selection of alcohol consumption-associated genes by random forest

Chenglin Lyu¹ ², Roby Joehanes³, Tianxiao Huan³, Daniel Levy³, Yi Li¹, Mengyao Wang¹, Xue Liu¹, Chunyu Liu^#¹, Jiantao Ma^#⁴

Affiliations

¹ Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA.
² Department of Anatomy and Neurobiology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA 02118, USA.
³ Framingham Heart Study and Population Sciences Branch, NHLBI, Framingham, MA 01702, USA.
⁴ Nutrition Epidemiology and Data Science, Friedman School of Nutrition Science and Policy, Tufts University, Boston, MA 02111, USA.

^# Contributed equally.

PMID: 38606596
PMCID: PMC11216877
DOI: 10.1017/S0007114524000795

Chenglin Lyu et al. Br J Nutr. 2024.

Abstract

Machine learning methods have been used to identify omics markers for a variety of phenotypes. We aimed to examine whether a supervised machine learning algorithm can improve the identification of alcohol-associated transcriptomic markers. In this study, we analysed array-based, whole-blood-derived expression data for 17 873 gene transcripts in 5508 Framingham Heart Study participants. Using the Boruta algorithm, a supervised random forest (RF)-based feature selection method, we selected twenty-five alcohol-associated transcripts. In a testing set (30 % of all study participants), the AUC (area under the receiver operating characteristic curve) values for these twenty-five transcripts were 0·73, 0·69 and 0·66 for non-drinkers v. moderate drinkers, non-drinkers v. heavy drinkers and moderate drinkers v. heavy drinkers, respectively. The AUC values for the transcripts selected by the Boruta method were comparable to those for transcripts identified using conventional linear regression models. For example, the AUC values for 1958 transcripts identified by conventional linear regression models (false discovery rate < 0·2) were 0·74, 0·66 and 0·65, respectively. With Bonferroni correction for the twenty-five Boruta method-selected transcripts and three CVD risk factors (i.e. at P < 6·7e-4), we observed that thirteen transcripts were associated with obesity, three transcripts with type 2 diabetes and one transcript with hypertension. For example, alcohol consumption was inversely associated with the expression of DOCK4, IL4R and SORT1; DOCK4 and SORT1 were positively associated with obesity, and IL4R was inversely associated with hypertension. In conclusion, using a supervised machine learning method, the RF-based Boruta algorithm, we identified novel alcohol-associated gene transcripts.

Keywords: Alcohol consumption; Boruta; CVD; Gene expression; Machine learning; Random forest.
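
The Bonferroni threshold quoted in the abstract (P < 6·7e-4) can be reproduced with a short calculation. The sketch below is illustrative only: the family-wise significance level of 0.05 is an assumption (the abstract does not state it), and the function name `bonferroni_threshold` is hypothetical rather than taken from the authors' code.

```python
def bonferroni_threshold(alpha: float, n_transcripts: int, n_traits: int) -> float:
    """Per-test P-value threshold when every transcript is tested against every trait."""
    n_tests = n_transcripts * n_traits  # 25 transcripts x 3 CVD risk factors = 75 tests
    return alpha / n_tests


# Assumption: a family-wise alpha of 0.05 (not stated in the abstract).
threshold = bonferroni_threshold(alpha=0.05, n_transcripts=25, n_traits=3)
print(f"{threshold:.1e}")  # 6.7e-04
```

Under that assumption, 0.05 / 75 rounds to the 6·7e-4 threshold reported for the twenty-five transcripts and three CVD risk factors.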

Conflict of interest statement

Conflict of Interest: The authors declare no conflicts of interest.

Figures

**Figure 2. ROC of selected predictors.**
1) The Boruta method set was based on the 25 Boruta method-selected transcripts; 2) 1,985 transcripts and 3) 25 transcripts were from alcohol-gene expression analyses using conventional linear regression (reference 9); 4) 144 CpGs were from a meta-analysis of alcohol-associated DNA methylation markers (reference 21); 5) combined predictors from sets 1, 3 and 4.
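
As a reminder of what the AUC summarises for ROC curves like these, the following minimal sketch computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is a generic rank-based definition, not the authors' pipeline; the function name `auc_from_scores` and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def auc_from_scores(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = heavy drinker, 0 = non-drinker.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_from_scores(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the values reported for each predictor set.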

References

1. Emanuele NV, Swade TF, Emanuele MA. Consequences of alcohol use in diabetics. Alcohol Health Res World. 1998;22(3):211–9.
2. Chait A, Mancini M, February AW, et al. Clinical and metabolic study of alcoholic hyperlipidaemia. Lancet. 1972;2(7767):62–4.
3. GBD 2016 Alcohol Collaborators. Alcohol use and burden for 195 countries and territories, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet. 2018;392(10152):1015–35.
4. Chikritzhs TN, Naimi TS, Stockwell TR, et al. Mendelian randomisation meta-analysis sheds doubt on protective associations between ‘moderate’ alcohol consumption and coronary heart disease. Evid Based Med. 2015;20(1):38.
5. Stockwell T, Zhao J, Panwar S, et al. Do “Moderate” Drinkers Have Reduced Mortality Risk? A Systematic Review and Meta-Analysis of Alcohol Consumption and All-Cause Mortality. J Stud Alcohol Drugs. 2016;77(2):185–98.
