Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2020 Jun 3;20(11):3166.
doi: 10.3390/s20113166.

A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring

Affiliations

A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring

Adeyinka Akanbi et al. Sensors (Basel). .

Abstract

In recent years, the application and wide adoption of Internet of Things (IoT)-based technologies have increased the proliferation of monitoring systems, which has consequently exponentially increased the amounts of heterogeneous data generated. Processing and analysing the massive amount of data produced is cumbersome and gradually moving from classical 'batch' processing-extract, transform, load (ETL) technique to real-time processing. For instance, in environmental monitoring and management domain, time-series data and historical dataset are crucial for prediction models. However, the environmental monitoring domain still utilises legacy systems, which complicates the real-time analysis of the essential data, integration with big data platforms and reliance on batch processing. Herein, as a solution, a distributed stream processing middleware framework for real-time analysis of heterogeneous environmental monitoring and management data is presented and tested on a cluster using open source technologies in a big data environment. The system ingests datasets from legacy systems and sensor data from heterogeneous automated weather systems irrespective of the data types to Apache Kafka topics using Kafka Connect APIs for processing by the Kafka streaming processing engine. The stream processing engine executes the predictive numerical models and algorithms represented in event processing (EP) languages for real-time analysis of the data streams. To prove the feasibility of the proposed framework, we implemented the system using a case study scenario of drought prediction and forecasting based on the Effective Drought Index (EDI) model. Firstly, we transform the predictive model into a form that could be executed by the streaming engine for real-time computing. Secondly, the model is applied to the ingested data streams and datasets to predict drought through persistent querying of the infinite streams to detect anomalies. As a conclusion of this study, a performance evaluation of the distributed stream processing middleware infrastructure is calculated to determine the real-time effectiveness of the framework.

Keywords: Apache Kafka; Internet of Things; big data; drought; middleware; stream processing.

PubMed Disclaimer

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Figure 1
Overview of the distributed system.
Figure 2
Figure 2
A simple Apache Kafka ecosystem [16].
Figure 3
Figure 3
Infinite flow of streams [16].
Figure 4
Figure 4
The high-level view of ESTemd distributed middleware framework for real-time analysis of environmental management and monitoring data.
Figure 5
Figure 5
ESTemd Unified Middleware Framework Stack.
Figure 6
Figure 6
IoT devices and automated weather station.
Figure 7
Figure 7
Overview of the Apache Kafka streaming engine [16].
Figure 8
Figure 8
Node-Kafka-broker data pipeline programming flow [60].
Figure 9
Figure 9
Starting Confluent Platform in the Terminal.
Figure 10
Figure 10
The Confluent Enterprise Streaming Framework [16].
Figure 11
Figure 11
The Confluent platform interface.
Figure 12
Figure 12
Configuration of Kafka Connect APIs in Confluent platform.
Figure 13
Figure 13
Topics created in the Kafka broker through the Confluent platform.
Figure 14
Figure 14
Creating topic in the Kafka broker through the terminal.
Figure 15
Figure 15
KSQL cluster interfacing with the Kafka broker [16].
Figure 16
Figure 16
KSQL editor in the Confluent Platform.
Figure 17
Figure 17
The output stream from the EDI topic in the KSQL editor.
Figure 18
Figure 18
Performance evaluation simulation results of brokers in the cluster.

Similar articles

Cited by

References

    1. Hsu C.L., Lin J.C.C. An empirical examination of consumer adoption of Internet of Things services: Network externalities and concern for information privacy perspectives. Comput. Hum. Behav. 2016;62:516–527. doi: 10.1016/j.chb.2016.04.023. - DOI
    1. Kitchin R. The real-time city? Big data and smart urbanism. GeoJournal. 2014;79:1–14. doi: 10.1007/s10708-013-9516-8. - DOI
    1. Ed-daoudy A., Maalmi K. A new Internet of Things architecture for real-time prediction of various diseases using machine learning on big data environment. J. Big Data. 2019;6:104. doi: 10.1186/s40537-019-0271-7. - DOI
    1. Marcu O.C., Costan A., Antoniu G., Pérez-Hernández M., Tudoran R., Bortoli S., Nicolae B. Storage and Ingestion Systems in Support of Stream Processing: A Survey. HAL; Bengaluru, India: 2018.
    1. Carbone P., Katsifodimos A., Ewen S., Markl V., Haridi S., Tzoumas K. Apache flink: Stream and batch processing in a single engine. Bull. IEEE Comput. Soc. Tech. Comm. Data Eng. 2015;36:4.