Seasonal prediction of daily PM2.5 concentrations with interpretable machine learning: a case study of Beijing, China
- PMID: 35150424
- DOI: 10.1007/s11356-022-18913-9
Abstract
Machine learning (ML) has shown high predictive ability in environmental research. Accurate estimation of daily PM2.5 concentrations is a prerequisite for addressing environmental public health issues. However, studies on the interpretability of ML algorithms remain limited. In this study, we aimed to estimate daily PM2.5 concentrations at a seasonal level, and to understand the potential mechanisms behind ML algorithms' decisions with SHapley Additive exPlanations (SHAP). Daily ground PM2.5 concentrations and meteorological data were obtained from the Beijing Municipal Ecological and Environmental Monitoring Center and the China Meteorological Data Service Centre for the period between December 2013 and November 2019. We calculated correlation coefficients and the variance inflation factor (VIF) to eliminate collinear variables, and recursive feature elimination (RFE) was then used to select the more important predictors. A series of ML algorithms, including linear regression, the variants of linear regression (Ridge, Lasso, Elasticnet), decision tree (DT), k-nearest neighbor (KNN), support vector regression (SVR), ensemble methods (random forest: RF, eXtreme Gradient Boosting: XGBoost), and deep learning (long short-term memory network: LSTM), were developed to estimate seasonal-level daily PM2.5 concentrations. A 10-fold cross validation was used to tune hyperparameters, and root mean square error (RMSE), mean absolute error (MAE), ratio of performance to deviation (RPD), and Lin's concordance correlation coefficient (LCCC) were used to evaluate the models' performance. SHAP was performed for local and global interpretability analysis. The results showed that PM2.5 concentrations in Beijing followed obvious seasonal patterns. A total of five variables (Precipitation, Mean wind speed, Sunshine duration, Mean surface temperature, Mean relative humidity) were selected for final prediction. LSTM showed much higher accuracy than the traditional ML models, achieving the smallest RMSE of 19.58 µg/m3 and MAE of 15.11 µg/m3. On the selected data set, LSTM showed acceptable agreement (LCCC = 0.41–0.52) and accuracy (RPD = 0.97–1.92). The SHAP analyses revealed that the meteorological factors had different influences in specific predictions, and the complex interactions were also illustrated. These results enhance our understanding of the relationships between meteorological factors and PM2.5 and explain the mechanisms behind ML algorithms' decisions.
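The four evaluation metrics named in the abstract can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not give the formulas, so RPD is taken here as the sample standard deviation of the observations divided by RMSE, and LCCC follows Lin's standard definition with population (1/n) moments. The function names and the sample values are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rpd(y_true, y_pred):
    """Ratio of performance to deviation.

    Assumption: sample standard deviation (ddof=1) of the observations
    divided by RMSE; the abstract does not state which SD was used.
    """
    y_true = np.asarray(y_true, float)
    return float(np.std(y_true, ddof=1) / rmse(y_true, y_pred))


def lccc(y_true, y_pred):
    """Lin's concordance correlation coefficient.

    Uses population (1/n) variances and covariance:
    2 * cov / (var_true + var_pred + (mean_true - mean_pred)^2).
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return float(2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2))


if __name__ == "__main__":
    # Hypothetical observed and predicted daily PM2.5 values (µg/m3).
    observed = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 39.0]
    for name, fn in [("RMSE", rmse), ("MAE", mae), ("RPD", rpd), ("LCCC", lccc)]:
        print(name, round(fn(observed, predicted), 4))
```

RMSE and MAE are in the units of the data, whereas RPD and LCCC are unitless; LCCC equals 1 only when predictions match observations exactly, so it penalizes systematic bias that a plain correlation coefficient would miss.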
Keywords: China; Interpretability; Machine learning; Meteorological factors; PM2.5; Seasonal prediction.
© 2022. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.