Quantile Regression for Analyzing Heterogeneity in Ultra-high Dimension
- PMID: 23082036
- PMCID: PMC3471246
- DOI: 10.1080/01621459.2012.656014
Abstract
Ultra-high dimensional data often display heterogeneity due to either heteroscedastic variance or other forms of non-location-scale covariate effects. To accommodate heterogeneity, we advocate a more general interpretation of sparsity, which assumes that only a small number of covariates influence the conditional distribution of the response variable given all candidate covariates; however, the sets of relevant covariates may differ when we consider different segments of the conditional distribution. In this framework, we investigate the methodology and theory of nonconvex penalized quantile regression in ultra-high dimension. The proposed approach has two distinctive features: (1) it enables us to explore the entire conditional distribution of the response variable given the ultra-high dimensional covariates and provides a more realistic picture of the sparsity pattern; (2) it requires substantially weaker conditions compared with alternative methods in the literature; thus, it greatly alleviates the difficulty of model checking in the ultra-high dimension. In the theoretical development, it is challenging to deal with both the nonsmooth loss function and the nonconvex penalty function in an ultra-high dimensional parameter space. We introduce a novel sufficient optimality condition that relies on a convex differencing representation of the penalized loss function and the subdifferential calculus. Exploring this optimality condition enables us to establish the oracle property for sparse quantile regression in the ultra-high dimension under relaxed conditions. The proposed method greatly enhances existing tools for ultra-high dimensional data analysis. Monte Carlo simulations demonstrate the usefulness of the proposed procedure. The real data example we analyzed demonstrates that the new approach reveals substantially more information compared with alternative methods.
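As a rough illustration of the "convex differencing representation" mentioned in the abstract, the sketch below writes a penalized quantile loss as the difference of two convex functions. The abstract does not name a specific penalty, so the SCAD penalty with a = 3.7 is an assumption made here for illustration, as are all function names and the simplification that every coefficient (including any intercept) is penalized. It is not the authors' implementation.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * np.where(u < 0, tau - 1.0, tau)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (assumed choice of nonconvex penalty), applied elementwise."""
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b                                        # |beta| <= lam
    quad = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))  # lam < |beta| <= a*lam
    const = (a + 1) * lam**2 / 2                            # |beta| > a*lam
    return np.where(b <= lam, linear, np.where(b <= a * lam, quad, const))

def scad_convex_remainder(beta, lam, a=3.7):
    """Convex function H with SCAD(beta) = lam * |beta| - H(beta)."""
    b = np.abs(np.asarray(beta, dtype=float))
    mid = (b - lam) ** 2 / (2 * (a - 1))      # lam < |beta| <= a*lam
    tail = lam * b - (a + 1) * lam**2 / 2     # |beta| > a*lam
    return np.where(b <= lam, 0.0, np.where(b <= a * lam, mid, tail))

def penalized_objective(beta, X, y, tau, lam):
    """Average check loss plus SCAD penalty on every coefficient."""
    return check_loss(y - X @ beta, tau).mean() + scad_penalty(beta, lam).sum()

def dc_parts(beta, X, y, tau, lam):
    """Return (g, h), both convex in beta, with objective = g - h."""
    g = check_loss(y - X @ beta, tau).mean() + lam * np.abs(beta).sum()
    h = scad_convex_remainder(beta, lam).sum()
    return g, h

# Small synthetic check that the two representations agree.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = rng.normal(size=20)
beta = np.array([0.0, 0.5, 2.0, -5.0, 1.0])
g, h = dc_parts(beta, X, y, tau=0.25, lam=1.0)
print(np.isclose(penalized_objective(beta, X, y, 0.25, 1.0), g - h))
```

Splitting the objective this way is what lets convex tools such as subdifferentials be applied to a nonconvex, nonsmooth problem: both g and h are convex, even though their difference is not.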