A novel high-dimensional model for identifying regional DNA methylation QTLs
- PMID: 41139289
- PMCID: PMC12554007
- DOI: 10.1093/biostatistics/kxaf032
Abstract
Varying coefficient models offer the flexibility to learn how regression coefficients change dynamically. Despite their good interpretability and diverse applications, existing estimation methods for such models have important limitations in high-dimensional settings. For example, variable selection is routinely needed when a large collection of covariates has nonlinear or varying effects on outcomes, and no ideal solutions exist. One illustration is identifying a subset of genetic variants with local influence on methylation levels in a regulatory region. To address this problem, we propose a composite sparse penalty that encourages both sparsity and smoothness for the varying coefficients. We present an efficient proximal gradient descent algorithm that scales to high-dimensional predictor spaces and provides sparse solutions for the varying coefficients. We conducted a comprehensive simulation study to evaluate the performance of our approach in terms of estimation, prediction, and selection accuracy. We show that including smoothness control yields much better results than sparsity-only approaches, and that an adaptive version of the penalty offers additional performance gains. We further demonstrate the utility of our method in identifying regional methylation QTLs (mQTLs) from asymptomatic samples in the CARTaGENE cohort. The methodology is implemented in the R package sparseSOMNiBUS, available on GitHub.
Keywords: methylation QTLs; proximal gradient descent; smoothness control; variable selection; varying coefficient model.
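To make the idea of a sparsity-plus-smoothness penalty fitted by proximal gradient descent concrete, here is a minimal Python sketch. It is not the authors' penalty and not the sparseSOMNiBUS implementation. It assumes a simplified Gaussian least-squares loss, a group-lasso term for sparsity, and a second-difference roughness term for smoothness. All function names, parameter names, and the simulated data are hypothetical.

```python
import numpy as np


def group_soft_threshold(theta, threshold):
    """Shrink each row (one coefficient group) toward zero in L2 norm.

    Rows whose norm is at most `threshold` are set exactly to zero.
    """
    norms = np.linalg.norm(theta, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return scale * theta


def second_difference_penalty(k):
    """Roughness matrix D'D, where D takes second differences of k coefficients."""
    D = np.diff(np.eye(k), n=2, axis=0)  # shape (k - 2, k)
    return D.T @ D


def proximal_gradient(X, y, n_groups, k, lam_sparse, lam_smooth, n_iter=500):
    """Minimize (assumed objective, for illustration only)

        (1 / 2n) * ||y - X theta||^2
        + (lam_smooth / 2) * sum_j theta_j' Omega theta_j
        + lam_sparse * sum_j ||theta_j||_2

    Columns j*k .. (j+1)*k - 1 of X are assumed to belong to group j.
    """
    n = X.shape[0]
    omega = second_difference_penalty(k)
    # Upper bound on the Lipschitz constant of the smooth part's gradient.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n + lam_smooth * np.linalg.norm(omega, 2)
    step = 1.0 / lipschitz
    theta = np.zeros((n_groups, k))
    for _ in range(n_iter):
        resid = X @ theta.ravel() - y
        # Gradient of the least-squares loss plus the roughness term.
        grad = (X.T @ resid / n).reshape(n_groups, k) + lam_smooth * theta @ omega
        # Gradient step on the smooth part, then the group-lasso proximal step.
        theta = group_soft_threshold(theta - step * grad, step * lam_sparse)
    return theta


# Hypothetical simulated data: 10 covariate groups, only groups 0 and 3 active.
rng = np.random.default_rng(0)
n, n_groups, k = 200, 10, 5
X = rng.standard_normal((n, n_groups * k))
theta_true = np.zeros((n_groups, k))
theta_true[0] = [1.0, 1.2, 1.4, 1.6, 1.8]
theta_true[3] = [-2.0, -1.8, -1.6, -1.4, -1.2]
y = X @ theta_true.ravel() + 0.1 * rng.standard_normal(n)

theta_hat = proximal_gradient(X, y, n_groups, k, lam_sparse=0.3, lam_smooth=0.1)
selected = np.flatnonzero(np.linalg.norm(theta_hat, axis=1) > 0)
print("selected groups:", selected)
```

In this sketch, the group soft-thresholding step is what produces exact zeros for whole coefficient groups, while the roughness term only enters through the gradient. An adaptive variant would typically reweight `lam_sparse` per group. How the published method combines or weights the two components may differ.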
© The Author(s) 2025. Published by Oxford University Press.
Conflict of interest statement
The authors declare no conflicts of interest.
