Nat Hum Behav. 2023 Apr;7(4):484-501.
doi: 10.1038/s41562-022-01517-1. Epub 2023 Feb 9.

Insights into the accuracy of social scientists' forecasts of societal change

Forecasting Collaborative. Nat Hum Behav. 2023 Apr.

Abstract

How well can social scientists predict societal change, and what processes underlie their predictions? To answer these questions, we ran two forecasting tournaments testing the accuracy of predictions of societal change in domains commonly studied in the social sciences: ideological preferences, political polarization, life satisfaction, sentiment on social media, and gender-career and racial bias. After we provided them with historical trend data on the relevant domain, social scientists submitted pre-registered monthly forecasts for a year (Tournament 1; N = 86 teams and 359 forecasts), with an opportunity to update forecasts on the basis of new data six months later (Tournament 2; N = 120 teams and 546 forecasts). Benchmarking forecasting accuracy revealed that social scientists' forecasts were on average no more accurate than those of simple statistical models (historical means, random walks or linear regressions) or the aggregate forecasts of a sample from the general public (N = 802). However, scientists were more accurate if they had scientific expertise in a prediction domain, were interdisciplinary, used simpler models and based predictions on prior data.


Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Fig. 1 | Social scientists’ average forecasting errors, compared against different benchmarks.
We ranked the domains from least to most error in Tournament 1, assessing forecasting errors via the mean absolute scaled error (MASE). The estimated means for the scientists and the naive crowd indicate the fixed-effect coefficients of a linear mixed model with domain (k = 12) and group (in Tournament 1: Nscientists = 86, Nnaive crowd = 802; only scientists in Tournament 2: N = 120) as predictors of forecasting error (MASE) scores nested in teams (Tournament 1 observations: Nscientists = 359, Nnaive crowd = 1,467; Tournament 2 observations: N = 546), using restricted maximum likelihood estimation. To correct for right skew, we used log-transformed MASE scores, which were subsequently back-transformed when calculating estimated means and 95% CIs. In each tournament, the CIs were adjusted for simultaneous inference of estimates for 12 domains by simulating a multivariate t distribution. The benchmarks represent the naive crowd and the best-performing naive statistical benchmark (historical mean, average random walk with an autoregressive lag of one, or linear regression). Statistical benchmarks were obtained via simulations (k = 10,000) with resampling (Supplementary Information). Scores to the left of the dotted vertical line show better performance than a naive in-sample random walk. Scores to the left of the dashed vertical line show better performance than the median performance in M4 tournaments.
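For readers unfamiliar with the error metric and the naive benchmarks named in the caption, the following is a minimal illustrative sketch (not the authors' actual analysis pipeline) of MASE and the three statistical benchmarks, assuming the historical series is a NumPy array of monthly values:

```python
import numpy as np

def mase(y_true, y_pred, y_train):
    """Mean absolute scaled error: forecast MAE divided by the
    in-sample MAE of a naive one-step random walk on y_train."""
    mae = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    naive_mae = np.mean(np.abs(np.diff(y_train)))
    return mae / naive_mae

def historical_mean_forecast(y_train, horizon):
    # Predict the historical mean for every future month.
    return np.full(horizon, np.mean(y_train))

def random_walk_forecast(y_train, horizon):
    # Predict the last observed value for every future month.
    return np.full(horizon, y_train[-1])

def linear_trend_forecast(y_train, horizon):
    # Extrapolate an ordinary least squares line fitted to the
    # historical series.
    t = np.arange(len(y_train))
    slope, intercept = np.polyfit(t, y_train, 1)
    future_t = np.arange(len(y_train), len(y_train) + horizon)
    return intercept + slope * future_t
```

A MASE below 1 means the forecast beat the in-sample one-step random walk on average, which is why the dotted line in Fig. 1 sits at that threshold.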
Fig. 2 | Forecasts and ground truth—are forecasts anchoring on the last few historical data points?
Historical time series (40 months before Tournament 1) and ground truth series (12 months over Tournament 1), along with forecasts of individual teams (light blue), lowess curves and 95% CIs across social scientists’ forecasts (blue), and lowess curves and 95% CIs across the naive crowd’s forecasts (salmon). For most domains, Tournament 1 forecasts of both scientists and the naive crowd start near the last few historical data points they received prior to the tournament (January–March 2020). Note that the April 2020 forecast was not provided to the participants. IAT, implicit association test.
Fig. 3 | Ratios of forecasting errors among benchmarks compared to scientific forecasts.
Scores greater than 1 indicate greater accuracy of scientific forecasts. Scores less than 1 indicate greater accuracy of naive benchmarks. The domains are ranked from least to most error among scientific teams in Tournament 1. The estimated means indicate the fixed-effect coefficients of linear mixed models with domain (k = 12) in each tournament (NTournament 1 = 86; NTournament 2 = 120) as a predictor of benchmark-specific ratio scores nested in teams (observations: NTournament 1 = 359, NTournament 2 = 546), using restricted maximum likelihood estimation. To correct for right skew, we used square-root or log-transformed MASE scores, which were subsequently back-transformed when calculating estimated means and 95% CIs. The CIs were adjusted for simultaneous inference of estimates for 12 domains in each tournament by simulating a multivariate t distribution.
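The ratio scores in Fig. 3 can be sketched as follows (an illustrative formula, not the authors' analysis code; the argument names are hypothetical):

```python
import numpy as np

def benchmark_ratio(benchmark_mase, scientist_mase):
    # Ratio of a naive benchmark's MASE to a scientific team's MASE.
    # > 1: the scientific forecast was more accurate;
    # < 1: the naive benchmark was more accurate.
    return np.asarray(benchmark_mase, dtype=float) / np.asarray(scientist_mase, dtype=float)
```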
Fig. 4 | Cross-tournament consistency in the ranking of domains in terms of forecasting inaccuracy.
The left part of the graph shows the ranking of domains in terms of the estimated mean forecasting error, assessed via MASE, across all teams in the first tournament (May 2020), from most to least inaccurate. The right part of the graph shows the corresponding ranking of domains for the second tournament (November 2020). A solid line of the slope graph indicates that the change in accuracy between tournaments is statistically significant (P < 0.05); a dashed line indicates a non-significant change. Significance was determined via pairwise comparisons of log(MASE) scores for each domain, drawing on the restricted information maximum likelihood model with tournament (first or second), domain and their interaction as predictors of the log(MASE) scores, with responses nested in scientific teams (Nteams = 120, Nobservations = 905).
Fig. 5 | Forecasting errors by prediction approach.
The estimated means and 95% CIs are based on a restricted information maximum likelihood linear mixed-effects model with model type (data-driven, hybrid or intuition/theory-based) as a fixed-effects predictor of the log(MASE) scores, domain as a fixed-effects covariate and responses nested in participants. We ran separate models for each tournament (first: Ngroups = 86, Nobservations = 359; second: Ngroups = 120, Nobservations = 546). Scores below the dotted horizontal line show better performance than a naive in-sample random walk. Scores below the dashed horizontal line show better performance than the median performance in M4 tournaments.
Fig. 6 | Contributions of specific forecasting strategies and team characteristics to forecasting accuracy.
Contributions of specific forecasting strategies (number of parameters, statistical model complexity, consideration of exogenous events and counterfactuals) and team characteristics to forecasting accuracy (reversed MASE scores), ranked in terms of magnitude. Scores to the right of the dashed vertical line contribute positively to accuracy, whereas estimates to the left of the dashed vertical line contribute negatively. The analyses control for domain type. All continuous predictors are mean-centred and scaled by two standard deviations to afford comparability. The reported standard errors are heteroskedasticity robust. The thicker bands show the 90% CIs, and the thinner lines show the 95% CIs. The effects are statistically significant if the 95% CI does not include zero (dashed vertical line).

Comment in

  • Salganik MJ. Predicting the future of society. Nat Hum Behav. 2023 Apr;7(4):478-479. doi: 10.1038/s41562-023-01535-7. PMID: 36759587. No abstract available.

