A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring
- PMID: 32503145
- PMCID: PMC7308861
- DOI: 10.3390/s20113166
Abstract
In recent years, the wide adoption of Internet of Things (IoT)-based technologies has increased the proliferation of monitoring systems, which has in turn exponentially increased the amount of heterogeneous data generated. Processing and analysing this massive amount of data is cumbersome, and practice is gradually moving from the classical 'batch' extract, transform, load (ETL) technique to real-time processing. For instance, in the environmental monitoring and management domain, time-series data and historical datasets are crucial for prediction models. However, the domain still relies on legacy systems, which complicates real-time analysis of the essential data and integration with big data platforms, and perpetuates the reliance on batch processing. Herein, as a solution, a distributed stream processing middleware framework for real-time analysis of heterogeneous environmental monitoring and management data is presented and tested on a cluster using open source technologies in a big data environment. The system ingests datasets from legacy systems and sensor data from heterogeneous automated weather systems, irrespective of data type, into Apache Kafka topics using Kafka Connect APIs for processing by the Kafka stream processing engine. The stream processing engine executes the predictive numerical models and algorithms, represented in event processing (EP) languages, for real-time analysis of the data streams. To prove the feasibility of the proposed framework, we implemented the system using a case study scenario of drought prediction and forecasting based on the Effective Drought Index (EDI) model. Firstly, we transform the predictive model into a form that can be executed by the streaming engine for real-time computing. Secondly, the model is applied to the ingested data streams and datasets to predict drought through persistent querying of the infinite streams to detect anomalies. Finally, the performance of the distributed stream processing middleware infrastructure is evaluated to determine the real-time effectiveness of the framework.
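As an illustration of what "transforming the predictive model into a form that can be executed by the streaming engine" might look like, the following is a minimal Python sketch of a sliding-window EDI computation over a stream of daily precipitation totals. It is not the authors' implementation. It assumes the commonly cited formulation of effective precipitation (the sum, over n = 1..i, of the mean of the n most recent daily totals), and it simplifies the index by standardising against a single baseline mean and standard deviation rather than a per-calendar-day climatology. The function names, the `window` size, the baseline values and the `threshold` of -1.0 are all hypothetical; in the described system the records would arrive from a Kafka topic rather than a Python list.

```python
from collections import deque


def effective_precipitation(recent_first):
    """Sum over n of the mean of the n most recent daily totals (mm).

    `recent_first` yields daily precipitation with the most recent day first.
    Assumed formulation; the abstract itself does not state the formula.
    """
    ep = 0.0
    running_total = 0.0
    for n, precip_mm in enumerate(recent_first, start=1):
        running_total += precip_mm
        ep += running_total / n
    return ep


def edi_stream(daily_precip, window, mean_ep, sd_ep, threshold=-1.0):
    """Yield (day_index, edi, is_drought) once the window has filled.

    `mean_ep` and `sd_ep` are a hypothetical baseline derived from
    historical data; `threshold` is an assumed drought cut-off.
    """
    recent = deque(maxlen=window)  # oldest totals are evicted automatically
    for day, precip_mm in enumerate(daily_precip):
        recent.append(precip_mm)
        if len(recent) < window:
            continue  # not enough history yet
        ep = effective_precipitation(reversed(recent))
        edi = (ep - mean_ep) / sd_ep
        yield day, edi, edi <= threshold


if __name__ == "__main__":
    # Stand-in for an unbounded stream of daily rainfall readings (mm).
    readings = [3.0, 0.0, 6.0, 0.0, 0.0]
    for day, edi, is_drought in edi_stream(readings, window=3,
                                           mean_ep=12.0, sd_ep=4.0):
        print(day, edi, is_drought)
```

Because the generator keeps only the last `window` readings, it can run indefinitely over an unbounded stream, which is the property the persistent-query approach in the abstract relies on.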
Keywords: Apache Kafka; Internet of Things; big data; drought; middleware; stream processing.
Conflict of interest statement
The authors declare no conflict of interest.