Task-specific information outperforms surveillance-style big data in predictive analytics
- PMID: 33790010
- PMCID: PMC8040817
- DOI: 10.1073/pnas.2020258118
Abstract
Increasingly, human behavior can be monitored through data collected from digital devices, which reveal information on behaviors and locations. In higher education, a growing number of schools and universities collect data on their students in order to assess or predict behaviors and academic performance. The COVID-19-induced move to online education dramatically increases what can be accumulated in this way, raising concerns about students' privacy. We focus on academic performance and ask whether the same predictive performance can be achieved for a given dataset with less privacy-invasive, but more task-specific, data. We draw on a unique dataset on a large student population containing both highly detailed measures of behavior and personality and high-quality third-party reported individual-level administrative data. We find that models estimated using the big behavioral data are indeed able to accurately predict academic performance out of sample. However, models using only low-dimensional and arguably less privacy-invasive administrative data perform considerably better and, importantly, do not improve when we add the high-resolution, privacy-invasive behavioral data. We argue that combining big behavioral data with "ground truth" administrative registry data can ideally allow the identification of privacy-preserving task-specific features that can be employed instead of current indiscriminate troves of behavioral data, resulting in both better privacy and better prediction.
Keywords: academic performance; big data; prediction; privacy.
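The comparison described in the abstract (behavioral features only, administrative features only, and both combined, each scored out of sample) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the signal strengths, feature counts, and ridge penalty are arbitrary, and ridge regression is a stand-in rather than the authors' model. It does not reproduce the paper's data or results; it only shows how a few strong task-specific features can beat many weak ones on held-out data.

```python
import random

def simulate(n, n_behav, rng):
    """Draw synthetic students: (administrative features, behavioral features, outcome)."""
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hypothetical latent academic ability
        admin = [z + 0.3 * rng.gauss(0, 1) for _ in range(3)]        # few, task-specific
        behav = [0.2 * z + rng.gauss(0, 1) for _ in range(n_behav)]  # many, weakly informative
        y = z + 0.5 * rng.gauss(0, 1)                                # outcome, e.g. a grade
        rows.append((admin, behav, y))
    return rows

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ridge_fit(X, y, lam=1.0):
    """Ridge weights from (X'X + lam*I) w = X'y; no intercept, data are mean-zero by design."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve(A, b)

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - q) ** 2 for t, q in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

rng = random.Random(0)
data = simulate(1000, 40, rng)
train, test = data[:700], data[700:]  # fit on the first 700, score on the held-out 300

feature_sets = {
    "behavioral": lambda admin, behav: behav,
    "administrative": lambda admin, behav: admin,
    "combined": lambda admin, behav: admin + behav,
}

scores = {}
for name, pick in feature_sets.items():
    X_train = [pick(a, b) for a, b, _ in train]
    X_test = [pick(a, b) for a, b, _ in test]
    w = ridge_fit(X_train, [y for _, _, y in train])
    preds = [sum(wi * xi for wi, xi in zip(w, x)) for x in X_test]
    scores[name] = r_squared([y for _, _, y in test], preds)
    print(f"{name:>14}: out-of-sample R^2 = {scores[name]:.3f}")
```

In this toy setup the behavioral features do carry predictive signal, but the three administrative features measure the latent ability far more directly, so adding the 40 behavioral columns contributes mostly estimation noise.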
Copyright © 2021 the Author(s). Published by PNAS.
Conflict of interest statement
The authors declare no competing interest.