Sensors (Basel). 2021 Nov 16;21(22):7595.
doi: 10.3390/s21227595.

An Ensemble Method for Missing Data of Environmental Sensor Considering Univariate and Multivariate Characteristics

Chanyoung Choi et al. Sensors (Basel).

Abstract

With rapid urbanization, awareness of environmental pollution is growing, and interest in environmental sensors that measure atmospheric and indoor air quality is increasing accordingly. Because these IoT-based environmental sensors are sensitive and their reliability is important, it is essential to handle missing values, which are one cause of reliability problems. Two characteristics can be used to impute missing values in environmental sensor data: the time dependency of a single variable and the correlation between multiple variables. Existing imputation methods, however, use only one of these characteristics, and none has used both. In this work, we introduce a new ensemble imputation method that uses both. First, the situations in which missing values occur frequently were divided into four cases, and experimental data were generated for each: communication error (aperiodic and periodic) and sensor error (rapid change and measurement range). To compare the existing methods with the proposed method, five widely used univariate imputation methods and five widely used multivariate imputation methods were each applied as single models to predict the missing values in the four cases. The values predicted by the single models were then passed to the ensemble method. Two ensemble methods, weighted averaging and stacking, were used to derive the final predicted values that replace the missing values. Finally, the predicted values were compared with the original data and evaluated using the mean absolute error (MAE) and the root mean square error (RMSE). The proposed ensemble method generally performed better than the single methods. In addition, it simultaneously considers the correlation between variables and the time dependence, both of which must be considered for environmental sensors. As a result, the proposed ensemble technique can help replace the missing values generated by environmental sensors and thereby increase the reliability of environmental sensor data.
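
The pipeline described in the abstract (hide values, impute with a univariate and a multivariate base model, combine the predictions, score with MAE and RMSE) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the ten base models or say how the ensemble weights are chosen, so linear interpolation, a one-variable least-squares regression, the inverse-error weighting rule, the synthetic data, and all function names here are assumptions. Stacking is not shown.

```python
import math
import random


def mask_aperiodic(n, rate, seed=0):
    """Pick random interior indices to hide (stand-in for aperiodic communication errors)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n - 1), int(n * rate)))


def impute_univariate(series, missing):
    """Time dependency only: linear interpolation between the nearest observed neighbours.
    Assumes the first and last points are observed."""
    miss, out = set(missing), list(series)
    for i in missing:
        lo, hi = i - 1, i + 1
        while lo in miss:
            lo -= 1
        while hi in miss:
            hi += 1
        out[i] = series[lo] + (series[hi] - series[lo]) * (i - lo) / (hi - lo)
    return out


def impute_multivariate(target, other, missing):
    """Cross-variable correlation only: least-squares line of target on one other variable,
    fitted on the observed rows."""
    miss = set(missing)
    obs = [i for i in range(len(target)) if i not in miss]
    mx = sum(other[i] for i in obs) / len(obs)
    my = sum(target[i] for i in obs) / len(obs)
    sxx = sum((other[i] - mx) ** 2 for i in obs)
    sxy = sum((other[i] - mx) * (target[i] - my) for i in obs)
    out = list(target)
    for i in missing:
        out[i] = my + (sxy / sxx) * (other[i] - mx)
    return out


def mae(truth, pred):
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def rmse(truth, pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def inverse_error_weights(errors, eps=1e-9):
    """Assumed weighting rule: weight each base model by 1 / validation error, normalised."""
    inv = [1.0 / (e + eps) for e in errors]
    return [w / sum(inv) for w in inv]


def weighted_average(predictions, weights):
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, predictions)) / total
            for i in range(len(predictions[0]))]


def demo():
    rng = random.Random(0)
    n = 200
    # Synthetic CO2-like signal and a hypothetical correlated second variable.
    co2 = [600 + 100 * math.sin(2 * math.pi * t / 48) + rng.gauss(0, 5) for t in range(n)]
    other = [0.05 * c + rng.gauss(0, 1) for c in co2]
    missing = mask_aperiodic(n, 0.10, seed=1)
    # Hide extra observed points as a validation set for estimating the weights.
    candidates = [i for i in range(1, n - 1) if i not in set(missing)]
    val = sorted(random.Random(2).sample(candidates, 20))
    hidden = sorted(set(missing) | set(val))
    corrupted = [None if i in set(hidden) else v for i, v in enumerate(co2)]
    models = {"univariate": impute_univariate(corrupted, hidden),
              "multivariate": impute_multivariate(corrupted, other, hidden)}
    errors = [mae([co2[i] for i in val], [m[i] for i in val]) for m in models.values()]
    weights = inverse_error_weights(errors)
    truth = [co2[i] for i in missing]
    preds = {k: [m[i] for i in missing] for k, m in models.items()}
    preds["ensemble"] = weighted_average(list(preds.values()), weights)
    results = {k: (mae(truth, p), rmse(truth, p)) for k, p in preds.items()}
    for name, (m, r) in results.items():
        print(f"{name:12s} MAE={m:.2f} RMSE={r:.2f}")
    return results


if __name__ == "__main__":
    demo()
```

Whether the ensemble beats both base models in this toy run depends on the synthetic data and the weighting rule; the sketch is meant only to show how the two kinds of information enter one prediction.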

Keywords: ensemble method; environmental sensor; machine learning; missing data; univariate and multivariate imputation.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Location of the experiment sites in Soongsil University, divided into two groups: Group 1 and Group 2.
Figure 2. Experiment environment: (a) elevation view and (b) aerial view.
Figure 3. IoT environmental sensor device: (a) sensor device structure, (b) sensor device configuration, and (c) actual circuit diagram.
Figure 4. Flow chart of the process of imputing the missing value.
Figure 5. Missing values occurring in a real device that uses LoRa communication.
Figure 6. Missing values of CO2 with missing rate 10%: (a) communication error (aperiodic) and (b) communication error (periodic).
Figure 7. SVM 30 CO2 sensor data.
Figure 8. Missing values of CO2 with missing rate 10%: (a) sensor error (rapid change) and (b) sensor error (measurement range).
Figure 9. Autocorrelation coefficient for CO2.
Figure 10. Pearson correlation between environmental substances.
Figure 11. Diagram of stacking method using univariate and multivariate imputations for base learner.
Figure 12. Comparison of MAE and RMSE values by imputation method for CO2 with missing rate 15%: (a) communication error (aperiodic), (b) communication error (periodic), (c) sensor error (rapid change), and (d) sensor error (measurement range).
Figure 13. Comparison of error distribution values by imputation method for CO2 with missing rate 15%: (a) communication error (aperiodic), (b) communication error (periodic), (c) sensor error (rapid change), and (d) sensor error (measurement range).
Figure 14. Comparison of RMSE by imputation method for CO2 with missing rates 5, 10, 15, 20, 25 and 30%: (a) communication error (aperiodic), (b) communication error (periodic), (c) sensor error (rapid change), and (d) sensor error (measurement range).
Figure 15. Imputation of missing values in the existing graph, according to the imputation method for CO2 with missing rate 10%: (a) communication error (aperiodic), (b) communication error (periodic), (c) sensor error (rapid change), and (d) sensor error (measurement range).

