Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines

Kassiano J Matteussi et al. Sensors (Basel). 2022 Jun 23;22(13):4756. doi: 10.3390/s22134756.

Abstract

A significant rise in the adoption of streaming applications has changed decision-making processes over the last decade. This movement has led to the emergence of several Big Data technologies for in-memory processing, such as Apache Storm, Spark, Heron, Samza, and Flink. Spark Streaming, a widely used open-source implementation, processes data-intensive applications that often require large amounts of memory. However, the Spark Unified Memory Manager cannot properly handle sudden or intensive data surges and their associated in-memory caching needs, resulting in performance and throughput degradation, high latency, frequent garbage collection, out-of-memory errors, and data loss. This work presents a comprehensive performance evaluation of Spark Streaming backpressure to investigate the hypothesis that it could support data-intensive pipelines under specific pressure requirements. The results reveal that backpressure is suitable only for small and medium pipelines, for both stateless and stateful applications. Furthermore, the evaluation points out Spark Streaming limitations that lead to in-memory issues for data-intensive pipelines and stateful applications, and it indicates potential solutions.

Keywords: backpressure; big data; spark streaming; stream processing.
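
For context on how the evaluated mechanism is switched on in practice, the sketch below (Scala, DStream API) sets the standard backpressure and unified-memory configuration keys. This is a minimal illustration, not the paper's benchmark code; the batch interval, rate limits, and memory fractions are illustrative assumptions rather than values taken from the study.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Minimal sketch: enabling Spark Streaming backpressure through standard
    // configuration keys. Rate values and memory fractions are illustrative only.
    val conf = new SparkConf()
      .setAppName("SumServer") // name borrowed from the paper's test application
      // Let the PID-based rate estimator adapt ingestion to the processing rate
      // and scheduling delay observed for previous micro-batches.
      .set("spark.streaming.backpressure.enabled", "true")
      // Cap the very first batch, before any processing statistics exist.
      .set("spark.streaming.backpressure.initialRate", "10000")
      // Hard upper bound on records per second per receiver.
      .set("spark.streaming.receiver.maxRate", "100000")
      // Unified Memory Manager split between execution and storage (defaults shown).
      .set("spark.memory.fraction", "0.6")
      .set("spark.memory.storageFraction", "0.5")

    val ssc = new StreamingContext(conf, Seconds(1))
    // ... define the DStream pipeline, then ssc.start() and ssc.awaitTermination()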


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. PID Controller Model Implementation.
Figure 2. Spark Backpressure PID Architecture.
Figure 3. Unified Memory Manager.
Figure 4. Memory Management Behaviour.
Figure 5. Stateless SumServer Application-Pipeline 2-Parasilo Cluster.
Figure 6. Stateful SumServer Application Without Backpressure—Pipeline 1.
Figure 7. Stateful SumServer Application Without Backpressure—Pipeline 2.
Figure 8. Backpressure Initial Rate Feature Comparison for Stateful SumServer Application—Pipeline 2.
Figure 9. Stateful SumServer Application With Backpressure—Pipeline 2.
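
The controller referenced in Figures 1 and 2 follows the standard discrete PID control law; a minimal sketch is given below. The mapping of the error term to Spark's ingestion rate is a simplified assumption based on the publicly available PID rate estimator, not an exact reproduction of the paper's implementation.

    u(t) = K_p \, e(t) + K_i \sum_{k=0}^{t} e(k)\,\Delta t + K_d \, \frac{e(t) - e(t-1)}{\Delta t}

Here e(t) is the error signal (roughly, the gap between the current ingestion rate and the rate at which the last micro-batch was actually processed), and the gains K_p, K_i, and K_d are tunable in Spark via the spark.streaming.backpressure.pid.* properties (proportional, integral, derived). The controller output u(t) adjusts the maximum number of records ingested per second for the next batch.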
