THE DISTRIBUTION OF COOK'S D STATISTIC
- PMID: 24363487
- PMCID: PMC3867306
- DOI: 10.1080/03610927708831932
Abstract
Cook (1977) proposed a diagnostic to quantify the impact of deleting an observation on the estimated regression coefficients of a General Linear Univariate Model (GLUM). Simulations of models with Gaussian response and predictors demonstrate that his suggestion of comparing the diagnostic to the median of the F for overall regression captures an erratically varying proportion of the values. We describe the exact distribution of Cook's statistic for a GLUM with Gaussian predictors and response. We also present computational forms, simple approximations, and asymptotic results. A simulation supports the accuracy of the results. The methods allow accurate evaluation of a single value or the maximum value from a regression analysis. The approximations work well for a single value, but less well for the maximum. In contrast, the cut-point suggested by Cook provides widely varying tail probabilities. As with all diagnostics, the data analyst must use scientific judgment in deciding how to treat highlighted observations.
Keywords: influence; regression diagnostics; residual analysis.
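As an illustration of the quantities the abstract refers to, the following minimal sketch computes Cook's statistic for each observation and compares it with the median of the F distribution for the overall regression. It is not the authors' method for the exact distribution. It assumes the simplest special case, a simple linear regression with an intercept (p = 2 coefficients), so that the leverages and the F(2, n - 2) median have closed forms. The function names `cooks_d_simple` and `median_f_2` and the simulated data are hypothetical and chosen only for this example.

```python
import random


def cooks_d_simple(x, y):
    """Cook's D per observation for the fit y = b0 + b1*x (assumed p = 2)."""
    n = len(x)
    p = 2  # intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual mean square
    # Leverage (diagonal of the hat matrix) in the simple-regression case.
    lev = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
    return [e * e / (p * s2) * h / (1.0 - h) ** 2 for e, h in zip(resid, lev)]


def median_f_2(d2):
    """Median of F(2, d2); this closed form holds only for numerator df = 2."""
    return (d2 / 2.0) * (2.0 ** (2.0 / d2) - 1.0)


# Simulated Gaussian predictor and response (illustrative values only).
random.seed(0)
n = 30
x = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

d = cooks_d_simple(x, y)
cut = median_f_2(n - 2)  # cut-point: median of F(p, n - p) with p = 2
flagged = [i for i, di in enumerate(d) if di > cut]
print(max(d), cut, flagged)
```

Repeating the simulation over different sample sizes and counting how often a value exceeds `cut` is one way to examine how the tail probability of this cut-point behaves. For models with more than two coefficients, the leverages would come from the full hat matrix and the F median from a numerical quantile routine.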
References
- Atkinson AC. Plots, Transformations, and Regression. Clarendon Press; Oxford: 1985.
- Belsley DA, Kuh E, Welsch RE. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley; New York: 1980.
- Chatterjee S, Hadi AS. Influential Observations, High Leverage Points, and Outliers in Linear Regression. Statistical Science. 1986;1:379–416.
- Chen Mok M. Evaluating Cook’s D Statistic in Theory and Practice: A Simulation Study. Department of Biostatistics, University of North Carolina; Chapel Hill: 1993. Unpublished Master’s Paper.
- Cook RD. Detection of Influential Observations in Linear Regression. Technometrics. 1977;19:15–18.