J Am Stat Assoc. 2024;119(545):297-307.
doi: 10.1080/01621459.2022.2115375. Epub 2022 Oct 5.

Optimal Nonparametric Inference with Two-Scale Distributional Nearest Neighbors

Emre Demirkaya et al. J Am Stat Assoc. 2024.

Abstract

The weighted nearest neighbors (WNN) estimator is widely used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically assigned to the nearest neighbors (Steele, 2009; Biau et al., 2010); we refer to the resulting estimator as the distributional nearest neighbors (DNN) estimator for easy reference. Yet there is a lack of distributional results for such an estimator, limiting its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, the DNN estimator does not achieve the optimal nonparametric convergence rate, mainly because of its bias. In this work, we provide an in-depth technical analysis of the DNN, based on which we suggest a bias-reduction approach for the DNN estimator by linearly combining two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator. The two-scale DNN estimator has an equivalent representation as a WNN estimator with weights admitting explicit forms, some of which are negative. We prove that, thanks to the use of negative weights, the two-scale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under the fourth-order smoothness condition. We further go beyond estimation and establish that the DNN and two-scale DNN estimators are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For practical implementation, we also provide variance estimators and a distribution estimator for the two-scale DNN using the jackknife and bootstrap techniques. These estimators can be exploited for constructing valid confidence intervals for nonparametric inference of the regression function. The theoretical results and appealing finite-sample performance of the suggested two-scale DNN method are illustrated with several simulation examples and a real data application.
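
To make the construction concrete, here is a minimal Python sketch (not the authors' code) of the DNN and two-scale DNN point estimators. It assumes the bagged 1-NN weight representation of Steele (2009) and Biau et al. (2010) and a leading DNN bias of order s^(-2/d); the function names and the choice of combination weights that cancel that leading term are illustrative assumptions, not the paper's exact tuning procedure.

    import numpy as np
    from scipy.special import comb

    def dnn_weights(n, s):
        # DNN with subsampling scale s is a WNN estimator: the i-th nearest
        # neighbor receives weight C(n-i, s-1) / C(n, s), i.e. the probability
        # that it is the 1-NN within a random size-s subsample drawn without
        # replacement (Steele, 2009; Biau et al., 2010).
        i = np.arange(1, n + 1)
        return comb(n - i, s - 1) / comb(n, s)  # zero for i > n - s + 1

    def dnn_estimate(X, y, x0, s):
        # DNN estimate of the regression function at the point x0.
        order = np.argsort(np.linalg.norm(X - x0, axis=1))  # nearest first
        return np.dot(dnn_weights(len(y), s), y[order])

    def tdnn_estimate(X, y, x0, s1, s2):
        # Two-scale DNN: w1 * DNN(s1) + w2 * DNN(s2) with w1 + w2 = 1, chosen so
        # that the leading biases, assumed proportional to s^(-2/d), cancel. One
        # of the two weights is necessarily negative (here the one on the smaller
        # scale s1 < s2), which is what permits the bias reduction.
        d = X.shape[1]
        r = (s1 / s2) ** (2.0 / d)
        w2 = 1.0 / (1.0 - r)   # weight on the larger (less biased) scale s2
        w1 = 1.0 - w2          # negative weight on the smaller scale s1
        return w1 * dnn_estimate(X, y, x0, s1) + w2 * dnn_estimate(X, y, x0, s2)

The paper's jackknife and bootstrap variance estimators would then be applied on top of such point estimates to form confidence intervals; they are not reproduced in this sketch.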

Keywords: Bagging; Bootstrap and jackknife; Nonparametric estimation and inference; Two-scale distributional nearest neighbors; Weighted nearest neighbors; k-nearest neighbors.


Figures

Figure 1: The results of simulation setting 1 described in Section 5.1 for DNN and TDNN. The rows show the bias and MSE as functions of the subsampling scale s for DNN and TDNN, respectively. The top right panel also depicts a zoomed-in plot where the U-shaped pattern is more apparent. The dashed lines in the MSE plots are labeled with the minimum MSE value for each of the methods. The tuned TDNN MSE minimum corresponds to the weighted LOOCV tuning method described at the beginning of Section 5.

References

    1. Arvesen JN (1969). Jackknifing U-statistics. Annals of Mathematical Statistics 40, 2076–2100.
    2. Athey S, Tibshirani J, and Wager S (2019). Generalized random forests. Annals of Statistics 47(2), 1148–1178.
    3. Berrett TB, Samworth RJ, and Yuan M (2019). Efficient multivariate entropy estimation via k-nearest neighbour distances. Annals of Statistics 47, 288–318.
    4. Berry AC (1941). The accuracy of the Gaussian approximation to the sum of independent variates. Transactions of the American Mathematical Society 49, 122–136.
    5. Biau G, Cérou F, and Guyader A (2010). On the rate of convergence of the bagged nearest neighbor estimate. Journal of Machine Learning Research 11, 687–712.
