What is a Good Calibration Question?
- PMID: 33864272
- DOI: 10.1111/risa.13725
Abstract
Weighted aggregation of expert judgments based on their performance on calibration questions may improve mathematically aggregated judgments relative to equal weights. However, obtaining validated, relevant calibration questions can be difficult. If so, should analysts settle for equal weights? Or should they use calibration questions that are easier to obtain but less relevant? In this article, we examine what happens to the out-of-sample performance of performance-weighted aggregations under the classical model (CM), compared with equally weighted aggregations, when the set of calibration questions includes many so-called "irrelevant" questions: those that might ordinarily be considered outside the domain of the questions of interest. We find that performance-weighted aggregations outperform equal weights on the combined CM score, but not on statistical accuracy (i.e., calibration). Importantly, there was no appreciable difference in performance when weights were developed on relevant versus irrelevant questions. Experts were unable to adapt their knowledge across vastly different domains, and in-sample validation did not accurately predict out-of-sample performance on irrelevant questions. We suggest that if relevant calibration questions cannot be found, analysts should use equal weights and draw on alternative techniques to improve judgments. Our study also indicates limits to the predictive accuracy of performance-weighted aggregation and to the degree to which expertise can be adapted across domains. We note limitations of our study and urge further research into the effect of question type on the reliability of performance-weighted aggregations.
Keywords: Aggregation; calibration; equal weights; expert judgment; performance weights.
© 2021 Society for Risk Analysis.
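
For readers unfamiliar with how the classical model turns calibration-question performance into weights, the following is a minimal Python sketch of the scoring logic the abstract refers to: statistical accuracy (calibration) from a chi-squared test on interquantile hit rates, informativeness relative to a background measure, and their product as the combined CM score. This is not the authors' implementation; the three-quantile (5%, 50%, 95%) elicitation format, the uniform background measure, the calibration cutoff alpha, and all function and variable names (calibration_score, information_score, cm_weights) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2

# Probability mass an expert implicitly assigns to each interquantile
# bin when eliciting 5%, 50%, and 95% quantiles (classical model).
EXPECTED = np.array([0.05, 0.45, 0.45, 0.05])


def calibration_score(hit_counts, n_questions):
    """Statistical accuracy: p-value of an asymptotic chi-squared test.

    hit_counts[j] counts the calibration questions whose realization
    fell in bin j. Under the hypothesis that the expert is well
    calibrated, 2 * N * KL(observed || expected) is approximately
    chi-squared with (number of bins - 1) degrees of freedom.
    """
    observed = hit_counts / n_questions
    nonzero = observed > 0  # empty bins contribute nothing to the KL sum
    kl = np.sum(observed[nonzero] * np.log(observed[nonzero] / EXPECTED[nonzero]))
    return float(1.0 - chi2.cdf(2.0 * n_questions * kl, df=len(EXPECTED) - 1))


def information_score(quantiles, lo, hi):
    """Informativeness: KL divergence of the expert's bin probabilities
    from a uniform background measure on the intrinsic range [lo, hi].
    Assumes lo < quantiles[0] < quantiles[1] < quantiles[2] < hi."""
    edges = np.concatenate(([lo], np.asarray(quantiles, dtype=float), [hi]))
    background = np.diff(edges) / (hi - lo)  # background mass per bin
    return float(np.sum(EXPECTED * np.log(EXPECTED / background)))


def cm_weights(experts, alpha=0.0):
    """Normalized classical-model weight per expert: calibration times
    mean information, zeroed below the calibration cutoff alpha."""
    raw = []
    for e in experts:
        cal = calibration_score(e["hits"], e["n_questions"])
        info = np.mean([information_score(q, lo, hi)
                        for q, (lo, hi) in zip(e["quantiles"], e["ranges"])])
        raw.append(cal * info if cal >= alpha else 0.0)
    raw = np.array(raw)
    # If no expert clears the cutoff, fall back to equal weights.
    return raw / raw.sum() if raw.sum() > 0 else np.full(len(raw), 1.0 / len(raw))
```

Normalizing the calibration-times-information products gives the linear-pool weights; note that when every expert falls below the cutoff, the sketch falls back to equal weights, which echoes the article's recommendation to prefer equal weights when usable calibration questions are unavailable.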