'Everything is data': towards one big data ecosystem using multiple sources of data on higher education in Indonesia

J Big Data. 2022;9(1):91. doi: 10.1186/s40537-022-00639-7. Epub 2022 Jul 14.

Ariana Yunita¹,², Harry B Santoso¹, Zainal A Hasibuan³

Affiliations

¹ Faculty of Computer Science, Universitas Indonesia, Depok, Jawa Barat 16424 Indonesia.
² Faculty of Science and Computer Science, Universitas Pertamina, South Jakarta, 12220 Indonesia.
³ Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang, Jawa Tengah 50131 Indonesia.

PMID: 35855913
PMCID: PMC9281197
DOI: 10.1186/s40537-022-00639-7

Ariana Yunita et al. J Big Data. 2022.


Abstract

Big data is increasingly promoted as a game changer for the future of science, as the volume of data has exploded in recent years. Big data is characterised, among other things, by data that come from multiple sources, arrive in multiple formats and conform to the five Vs (value, volume, velocity, variety and veracity). Big data also comprises structured, semi-structured and unstructured data. These characteristics form a "big data ecosystem" in which various active nodes are involved. Despite these complex characteristics, studies show that big data contains inherent structure that can be very useful in providing meaningful solutions to various problems. One such problem is anticipating the proper response to students' achievement. It is common practice for a lecturer to treat a class with a "one-size-fits-all" policy and strategy, even though students' degree of understanding may differ owing to several factors. Furthermore, action to rescue a struggling student's achievement often comes too late. This study attempted to gather all possible features involved from multiple data sources: national education databases, reports, webpages and so forth. The data sources comprise data on undergraduate students from 13 provinces in Indonesia, including students' academic histories, demographic profiles and socioeconomic backgrounds, as well as institutional information (i.e. level of accreditation, programmes of study, type of university and geographical location). The gathered data were then preprocessed using various techniques to handle missing values, data categorisation, data consistency and data quality assurance, producing a relatively clean and sound big dataset. Principal component analysis (PCA) was employed to reduce the dimensions of the dataset, and the K-Means method was then used to reveal clusters (inherent structure) that may occur in it. K-Means analysis suggested seven clusters: (1) very low-risk students, (2) low-risk students, (3) moderate-risk students, (4) fluctuating-risk students, (5) high-risk students, (6) very high-risk students and (7) fail students. The clusters reveal (1) a gap between public and private universities across the three regions in Indonesia, (2) a gap between STEM and non-STEM programmes of study, (3) a gap between rural and urban areas, (4) a gap in accreditation status, (5) a gap in the distribution of quality human resources, etc. In further study, we will use the characteristics of each cluster to predict students' achievement based on students' profiles and to provide solutions and intervention strategies that improve students' likelihood of success.

Keywords: Big data; Data collection; Data preprocessing; Higher education; Indonesia.
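
The abstract does not give implementation details, so the following is only a minimal sketch of the kind of pipeline it describes (handle missing values, reduce dimensions with PCA, cluster with K-Means, inspect inertia for an elbow plot). Everything here is an assumption made for illustration: the synthetic data, the mean imputation, the function names `impute_and_standardise`, `pca` and `kmeans`, and the choice of three clusters for the toy data (the study itself reports seven).

```python
import numpy as np


def impute_and_standardise(X):
    """Fill missing values (NaN) with column means, then z-score each column."""
    X = np.array(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    missing = np.isnan(X)
    X[missing] = np.take(col_means, np.where(missing)[1])
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - X.mean(axis=0)) / std


def pca(X, n_components):
    """Project centred data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    eigenvalues = s ** 2 / (len(X) - 1)  # the values a scree plot would show
    return Xc @ vt[:n_components].T, eigenvalues


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns labels, centroids and inertia."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old centroid if a cluster happens to be empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    inertia = float((dists.min(axis=1) ** 2).sum())
    return labels, centroids, inertia


# Hypothetical toy data: 300 "students" x 6 numeric features, three latent groups.
rng = np.random.default_rng(42)
groups = [rng.normal(loc=c, scale=0.5, size=(100, 6)) for c in (0.0, 3.0, 6.0)]
X_raw = np.vstack(groups)
X_raw[rng.random(X_raw.shape) < 0.05] = np.nan  # simulate missing values

X = impute_and_standardise(X_raw)
scores, eigenvalues = pca(X, n_components=2)

# Inertia per candidate k: the quantity an elbow plot is drawn from.
inertias = {k: kmeans(scores, k)[2] for k in range(1, 8)}
labels, centroids, _ = kmeans(scores, k=3)
```

In practice, the sorted `eigenvalues` would guide how many components to keep and the `inertias` dictionary would guide the choice of k; a library implementation such as scikit-learn's would normally replace the hand-written `kmeans` used here.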

Conflict of interest statement

Competing interests: The authors declare that they have no competing interests.

Figures

**Fig. 1**
An example of scree plot of PCA Eigenvalues

**Fig. 2**
General research framework (modified from [6])

**Fig. 3**
Infographic representing collection of data on higher education in Indonesia

**Fig. 6**
Two-dimensional (a) and three-dimensional (b) visualisations of the first two and three principal components

**Fig. 7**
Elbow plots used to analyse values of k

**Fig. 8**
Two-dimensional and three-dimensional visualisations with centroids of seven clusters (k-means)

**Fig. 10**
Bar plot analysis by cluster using (a) the seven highest-variance PCs and (b) the 15 highest-variance original features

**Fig. 11**
Labelling of clusters in dataset

References

1. Rydning DR-JG-J, others. The digitization of the world from edge to core. Fram. Int. Data Corp. 2018 [cited 2021 Dec 25]. p. 16. https://www.seagate.com/files/www-content/our-story/trends/files/idc-sea...
2. Wu C, Buyya R, Ramamohanarao K. Big data analytics = machine learning + cloud computing. In: Buyya R, Calheiros RN, Dastjerdi AV, editors. Big Data Princ Paradig. Morgan Kaufmann; 2016. pp. 1–13.
3. Raut RD, Mangla SK, Narwane VS, Dora M, Liu M. Big Data Analytics as a mediator in Lean, Agile, Resilient, and Green (LARG) practices effects on sustainable supply chains. Transp Res Part E Logist Transp Rev. 2021;145:102170. doi: 10.1016/j.tre.2020.102170.
4. Anshari M, Almunawar MN, Lim SA, Al-Mudimigh A. Customer relationship management and big data enabled: Personalization & customization of services. Appl Comput Informatics. 2019;15:94–101. doi: 10.1016/j.aci.2018.05.004.
5. Aloqool A, Alharafsheh M, Abdellatif H, Alghasawneh LAS, Al-Gasawneh JA. The mediating role of customer relationship management between e-supply chain management and competitive advantage. Int J Data Netw Sci. 2022;6:263–272. doi: 10.5267/J.IJDNS.2021.9.002.
