Proc Natl Acad Sci U S A. 2021 Apr 6;118(14):e2020258118.
doi: 10.1073/pnas.2020258118.

Task-specific information outperforms surveillance-style big data in predictive analytics


Andreas Bjerre-Nielsen et al. Proc Natl Acad Sci U S A. 2021.

Abstract

Increasingly, human behavior can be monitored through the collection of data from digital devices revealing information on behaviors and locations. In the context of higher education, a growing number of schools and universities collect data on their students with the purpose of assessing or predicting behaviors and academic performance, and the COVID-19-induced move to online education dramatically increases what can be accumulated in this way, raising concerns about students' privacy. We focus on academic performance and ask whether predictive performance for a given dataset can be achieved with less privacy-invasive, but more task-specific, data. We draw on a unique dataset on a large student population containing both highly detailed measures of behavior and personality and high-quality third-party reported individual-level administrative data. We find that models estimated using the big behavioral data are indeed able to accurately predict academic performance out of sample. However, models using only low-dimensional and arguably less privacy-invasive administrative data perform considerably better and, importantly, do not improve when we add the high-resolution, privacy-invasive behavioral data. We argue that combining big behavioral data with "ground truth" administrative registry data can ideally allow the identification of privacy-preserving task-specific features that can be employed instead of current indiscriminate troves of behavioral data, with better privacy and better prediction resulting.

Keywords: academic performance; big data; prediction; privacy.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Balanced accuracy of model out of sample on test data when using various feature sets. (A) Big data vs. administrative data, (B) task-related vs. general information, and (C) comparison of feature sets gathered over the lifespan of the student. Models are estimated using logistic regression with L2 regularization and using feature selection; see Materials and Methods for details. Each violin represents the distribution of weighted accuracy from 1,000 resamples. Inside the violins, the thick bar represents the bottom and top quartiles, and the thin lines represent the bottom and top deciles. The dashed, black line indicates the performance of a baseline random guessing model.
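
The metric in the caption can be illustrated with a minimal sketch. It is not the authors' code: the toy labels are invented, the function names `balanced_accuracy` and `bootstrap_scores` are hypothetical, and the assumption that the 1,000 resamples are bootstrap draws of the test predictions is ours, not something the caption states. Balanced accuracy is the mean of per-class recall, which is why a two-class random-guessing baseline sits at 0.5 regardless of class imbalance.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def bootstrap_scores(y_true, y_pred, n_resamples=1000, seed=0):
    """Balanced accuracy on bootstrap resamples (assumed resampling scheme)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(
            balanced_accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    return scores


# Toy, imbalanced labels (invented): 8 positives, 2 negatives.
y_true = [1] * 8 + [0] * 2
y_majority = [1] * 10  # a model that always predicts the majority class

plain_accuracy = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)
majority_balanced = balanced_accuracy(y_true, y_majority)
scores = bootstrap_scores(y_true, y_majority)
```

On these toy labels the always-majority model reaches a plain accuracy of 0.8 but a balanced accuracy of only 0.5, the same level as the dashed baseline in the figure. The list `scores` holds the kind of per-resample distribution that each violin summarizes.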
