J Big Data. 2023;10(1):83.
doi: 10.1186/s40537-023-00760-1. Epub 2023 May 28.

Time series big data: a survey on data stream frameworks, analysis and algorithms

Ana Almeida et al. J Big Data. 2023.

Abstract

Big data plays a substantial role today, and its importance has increased significantly over the last decade. Its main advantages are providing knowledge, supporting decision-making, and improving the use of resources, services, and infrastructures. The potential of big data grows when it is applied in real time, enabling real-time analysis, predictions, and forecasts, among many other applications. Our goal with this article is to provide a viewpoint on how to build a system capable of processing big data in real time, performing analysis, and applying algorithms. Such a system should be designed to handle vast amounts of data and provide valuable knowledge through analysis and algorithms. This article explores the current approaches and how they can be used for real-time operations and predictions.

Keywords: Anomaly detection; Big data; Forecasting; Machine learning; Stream processing engines; Time series.

Conflict of interest statement

Competing interests: The authors declare that they have no competing interests.

Figures

Fig. 1. Big data taxonomy. Information collected from [2, 5, 15, 17, 19]
Fig. 2. Google research trends over time. Data collected from [81]
Fig. 3. Big data applications components
Fig. 4. Processing window mechanisms
Fig. 5. Data processing frameworks: popularity over the years (first query)
Fig. 6. Data processing frameworks: popularity over the years (second query)
Fig. 7. Data processing frameworks: popularity over the years (third query)
Fig. 8. Forecast versus Stream
Fig. 9. Anomaly detection versus Stream
Fig. 10. Forecasting methods
Fig. 11. Evolution of the popularity of types of forecasting methods over the years. ML stands for Machine Learning, DL for Deep Learning, SL for Statistical Learning, and RL for Reinforcement Learning
Fig. 12. Evolution of the popularity of forecasting methods over the years. ANN stands for Artificial Neural Network, SVM for Support Vector Machine, LSTM for Long Short-Term Memory, A&S for ARIMA and SARIMA, RNN for Recurrent Neural Network, CNN for Convolutional Neural Network, FNN for Feedforward Neural Network, AE for Autoencoder, GNN for Graph Neural Network, DBN for Deep Belief Network, LGBM for LightGBM, HA for Historical Average, and RBM for Restricted Boltzmann Machines
Fig. 13. Anomaly detection methods
Fig. 14. Evolution of the popularity of types of anomaly detection methods over the years. ML stands for Machine Learning, DL for Deep Learning, SL for Statistical Learning, and RL for Reinforcement Learning
Fig. 15. Evolution of the popularity of anomaly detection methods over the years. ESD stands for Extreme Studentized Deviate, PCA for Principal Component Analysis, rPCA for Robust Principal Component Analysis, MCD for Minimum Covariance Determinant, KNN for k-Nearest Neighbors, NB for Naive Bayes, SVM for Support Vector Machine, DT for Decision Trees (including random forest), ANN for Artificial Neural Network, FNN for Feedforward Neural Network, LSTM for Long Short-Term Memory, RNN for Recurrent Neural Network, CNN for Convolutional Neural Network, SOM for Self-Organizing Maps, RBM for Restricted Boltzmann Machines, AE for Autoencoder, and DBSCAN for Density-Based Spatial Clustering of Applications with Noise

References

    1. Cox M, Ellsworth D. Application-controlled demand paging for out-of-core visualization. In: Proceedings of the 8th Conference on Visualization '97. VIS '97, pp. 235–244. IEEE Computer Society Press, Washington, DC, USA, 1997. doi: 10.1109/VISUAL.1997.663888.
    2. Fan J, Han F, Liu H. Challenges of Big Data analysis. Natl Sci Rev. 2014;1(2):293–314. doi: 10.1093/nsr/nwt032.
    3. Gomes EHA, Plentz PDM, Rolt CRD, Dantas MAR. A survey on data stream, big data and real-time. Int J Netw Virtual Organ. 2019;20(2):143–167. doi: 10.1504/IJNVO.2019.097631.
    4. Zhou B, Li J, Wang X, Gu Y, Xu L, Hu Y, Zhu L. Online internet traffic monitoring system using spark streaming. Big Data Mining Anal. 2018;1(1):47–56. doi: 10.26599/BDMA.2018.9020005.
    5. Thudumu S, Branch P, Jin J, Singh J. A comprehensive survey of anomaly detection techniques for high dimensional big data. J Big Data. 2020. doi: 10.1186/s40537-020-00320-x.