Predicting students' performance in English and Mathematics using data mining techniques
- PMID: 35919875
- PMCID: PMC9334550
- DOI: 10.1007/s10639-022-11259-2
Abstract
This study attempts to predict secondary school students' performance in English and Mathematics using data mining (DM) techniques. It aims to provide insights into the predictors of students' performance in English and Mathematics, the characteristics of students with different levels of performance, the most effective DM technique for predicting students' performance, and the relationship between these two subjects. The study employed the archival data of students who were 16 years old in 2019 and sat for the Malaysian Certificate of Examination (MCE) in 2021. The learning of English and Mathematics is a concern in many countries. Three main factors, namely students' past academic performance, demographics, and psychological attributes, were scrutinized to identify their impact on the prediction. This study utilized the Orange software for the DM process. It employed Decision Tree (DT) rules to determine the characteristics of students with low, moderate, and high performance in English and Mathematics. The DT and Naïve Bayes (NB) techniques showed the best predictive performance for English and Mathematics, respectively. Such characteristics and predictions may cue appropriate interventions to improve students' performance in these subjects. This study revealed students' past academic performance as the most critical predictor, along with a few demographic and psychological attributes. By examining the top predictors derived using four different classifier types, this study found that students' past Mathematics performance predicts their MCE English performance and that students' past English performance predicts their MCE Mathematics performance. This finding shows that students' performance in the two subjects is interrelated.
Keywords: Data mining techniques; Educational data mining; English; Mathematics; Performance prediction; Secondary education.
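As a rough illustration of the kind of classifier named in the abstract, the following is a minimal sketch of a categorical Naïve Bayes model in plain Python. It is not the authors' Orange workflow: the two features (past English and past Mathematics grade bands), the eight training rows, and the function names `train_naive_bayes` and `predict` are all hypothetical and chosen only to show how past performance in both subjects could feed a three-level performance prediction.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing (alpha)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return {
        "class_counts": class_counts,
        "value_counts": value_counts,
        "feature_values": feature_values,
        "alpha": alpha,
        "n": len(labels),
    }


def predict(model, row):
    """Return the label with the highest log-posterior score for one row."""
    best_label, best_score = None, -math.inf
    alpha = model["alpha"]
    for label, count in model["class_counts"].items():
        score = math.log(count / model["n"])  # log prior
        for i, value in enumerate(row):
            seen = model["value_counts"][(i, label)][value]
            k = len(model["feature_values"][i])
            score += math.log((seen + alpha) / (count + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (past English band, past Mathematics band).
rows = [
    ("high", "high"), ("high", "moderate"), ("moderate", "high"),
    ("moderate", "moderate"), ("moderate", "low"), ("low", "moderate"),
    ("low", "low"), ("low", "low"),
]
# Hypothetical target: final performance band in one subject.
labels = ["high", "high", "high", "moderate", "moderate", "moderate", "low", "low"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("high", "high")))
print(predict(model, ("low", "low")))
```

The sketch omits everything a real study would need, such as a held-out test set, cross-validation, and a comparison against other classifiers; it only shows the scoring rule behind an NB prediction.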
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Conflict of interest statement
There is no potential conflict of interest in this study.