J Am Stat Assoc. 2024;119(545):297-307.
doi: 10.1080/01621459.2022.2115375. Epub 2022 Oct 5.

Optimal Nonparametric Inference with Two-Scale Distributional Nearest Neighbors

Emre Demirkaya et al. J Am Stat Assoc. 2024.

Abstract

The weighted nearest neighbors (WNN) estimator is widely used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically assigned to the nearest neighbors (Steele, 2009; Biau et al., 2010); we refer to the resulting estimator as the distributional nearest neighbors (DNN) estimator for easy reference. Yet there is a lack of distributional results for such an estimator, limiting its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, the DNN estimator does not achieve the optimal nonparametric convergence rate, mainly because of its bias. In this work, we provide an in-depth technical analysis of the DNN, based on which we suggest a bias-reduction approach for the DNN estimator by linearly combining two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator. The two-scale DNN estimator has an equivalent representation as a WNN estimator with weights admitting explicit forms, some of which are negative. We prove that, thanks to the use of negative weights, the two-scale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under the fourth-order smoothness condition. We further go beyond estimation and establish that the DNN and two-scale DNN estimators are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For practical implementation, we also provide variance estimators and a distribution estimator for the two-scale DNN using the jackknife and bootstrap techniques. These estimators can be exploited for constructing valid confidence intervals for nonparametric inference of the regression function. The theoretical results and appealing finite-sample performance of the suggested two-scale DNN method are illustrated with several simulation examples and a real data application.
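
To make the construction concrete, here is a minimal Python sketch (not the authors' code) of the DNN and two-scale DNN point estimators. It assumes the bagged 1-NN weight representation of Steele (2009) and Biau et al. (2010) and a leading DNN bias of order s^(-2/d); the function names and the choice of combination weights that cancel that leading term are illustrative assumptions, not the paper's exact tuning procedure.

    import numpy as np
    from scipy.special import comb

    def dnn_weights(n, s):
        # DNN with subsampling scale s is a WNN estimator: the i-th nearest
        # neighbor receives weight C(n-i, s-1) / C(n, s), i.e. the probability
        # that it is the 1-NN within a random size-s subsample drawn without
        # replacement (Steele, 2009; Biau et al., 2010).
        i = np.arange(1, n + 1)
        return comb(n - i, s - 1) / comb(n, s)  # zero for i > n - s + 1

    def dnn_estimate(X, y, x0, s):
        # DNN estimate of the regression function at the point x0.
        order = np.argsort(np.linalg.norm(X - x0, axis=1))  # nearest first
        return np.dot(dnn_weights(len(y), s), y[order])

    def tdnn_estimate(X, y, x0, s1, s2):
        # Two-scale DNN: w1 * DNN(s1) + w2 * DNN(s2) with w1 + w2 = 1, chosen so
        # that the leading biases, assumed proportional to s^(-2/d), cancel. One
        # of the two weights is necessarily negative (here the one on the smaller
        # scale s1 < s2), which is what permits the bias reduction.
        d = X.shape[1]
        r = (s1 / s2) ** (2.0 / d)
        w2 = 1.0 / (1.0 - r)   # weight on the larger (less biased) scale s2
        w1 = 1.0 - w2          # negative weight on the smaller scale s1
        return w1 * dnn_estimate(X, y, x0, s1) + w2 * dnn_estimate(X, y, x0, s2)

The paper's jackknife and bootstrap variance estimators would then be applied on top of such point estimates to form confidence intervals; they are not reproduced in this sketch.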

Keywords: Bagging; Bootstrap and jackknife; Nonparametric estimation and inference; Two-scale distributional nearest neighbors; Weighted nearest neighbors; k-nearest neighbors.


Figures

Figure 1: The results of simulation setting 1 described in Section 5.1 for DNN and TDNN. The rows show the bias and MSE as functions of the subsampling scale s for DNN and TDNN, respectively. The top right panel also depicts a zoomed-in plot where the U-shaped pattern is more apparent. The dashed lines in the MSE plots are labeled with the minimum MSE value for each of the methods. The tuned TDNN MSE minimum corresponds to the weighted LOOCV tuning method described at the beginning of Section 5.

References

    1. Arvesen JN (1969). Jackknifing U-statistics. Annals of Mathematical Statistics 40, 2076–2100.
    2. Athey S, Tibshirani J, and Wager S (2019). Generalized random forests. Annals of Statistics 47(2), 1148–1178.
    3. Berrett TB, Samworth RJ, and Yuan M (2019). Efficient multivariate entropy estimation via k-nearest neighbour distances. Annals of Statistics 47, 288–318.
    4. Berry AC (1941). The accuracy of the Gaussian approximation to the sum of independent variates. Transactions of the American Mathematical Society 49, 122–136.
    5. Biau G, Cérou F, and Guyader A (2010). On the rate of convergence of the bagged nearest neighbor estimate. Journal of Machine Learning Research 11, 687–712.
