Environ Sci Technol. 2025 Oct 7;59(39):21237-21247.
doi: 10.1021/acs.est.5c09687. Epub 2025 Sep 27.

Advancing Air Pollution Exposure Models with Open-Vocabulary Object Detection and Semantic Segmentation of Street-View Images

Zhendong Yuan et al. Environ Sci Technol.

Abstract

Mobile monitoring campaigns combined with land use regression (LUR) models effectively capture fine-scale spatial variations in urban air pollution. However, traditional predictor variables often fail to capture the nuances of the built environment and undocumented emission sources. To address this, we developed a framework integrating customizable object-level and segmentation-level visual features from street-view images into stepwise regression and random-forest-based LUR models. Using 5.7 million mobile air pollution measurements (2019-2020) and 0.37 million street-view images (2008-2024), we mapped nitrogen dioxide (NO₂), black carbon (BC), and ultrafine particles (UFP) across 46,664 road segments in Amsterdam, The Netherlands. Incorporating street-view images improved model performance, increasing R² by 0.01-0.05 and reducing mean absolute errors by 0.7-10.3%. Sensitivity analyses indicated that key street-view-derived visual features remained stable across years and seasons. Using images from nearby years expanded training instances, thereby enhancing alignment with mobile measurements at fine granularity. Our open-vocabulary object detection module identified influential but previously unrecognized object predictors, such as chimneys, traffic lights, and shops. Combined with segmentation-derived features (e.g., walls, roads, grass), street-view images contributed 8-18% feature importance to model predictions. These findings highlight the potential of visual data in enhancing hyperlocal air pollution mapping and exposure assessment.
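
As a rough illustration of how gains of the kind reported above could be quantified, the sketch below computes the change in R² and the percentage reduction in mean absolute error between a baseline model and a street-view-augmented model, plus the share of feature importance attributable to visual features. All function names, feature names, and numbers are hypothetical toy values, not data or code from the study.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observations and predictions."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def compare_models(observed, baseline_pred, visual_pred):
    """Return (R² gain, MAE reduction in percent) of the visual model."""
    r2_gain = r_squared(observed, visual_pred) - r_squared(observed, baseline_pred)
    mae_base = mean_absolute_error(observed, baseline_pred)
    mae_visual = mean_absolute_error(observed, visual_pred)
    return r2_gain, 100.0 * (mae_base - mae_visual) / mae_base


def visual_importance_share(importances, visual_features):
    """Percentage of total feature importance carried by visual features."""
    total = sum(importances.values())
    visual = sum(v for name, v in importances.items() if name in visual_features)
    return 100.0 * visual / total


# Toy concentrations per road segment (illustrative only).
observed = [20.0, 30.0, 40.0, 50.0]
baseline_pred = [24.0, 26.0, 44.0, 46.0]
visual_pred = [22.0, 28.0, 42.0, 48.0]
r2_gain, mae_reduction_pct = compare_models(observed, baseline_pred, visual_pred)

# Toy importances; the feature names are assumptions for illustration.
importances = {
    "traffic_intensity": 0.50,
    "population_density": 0.35,
    "chimney_count": 0.05,
    "wall_fraction": 0.10,
}
share_pct = visual_importance_share(importances, {"chimney_count", "wall_fraction"})
```

The toy differences are far larger than the reported ones; the point is only the arithmetic behind an R² increase, a relative MAE reduction, and a grouped importance share.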

Keywords: air pollution; deep learning; exposure assessment; land use regression (LUR); mobile sensing; street-view image; vision-language model (VLM); vision-transformer models (ViT).

Figures

Figure 1. Examples of street-view images in different years and seasons. TikTok AI was used to convert street-view images into an animation style for demonstration purposes.
Figure 2. Architecture of the visual land use regression model (VLUR). Object and segmentation-level information is extracted from 0.37 million street-view images. These visual features are integrated with classic land use, traffic, and population data to train land use regression models, supervised by 50 m road-segment aggregated mobile air pollution measurements. TikTok AI was used to convert street-view images into an animation style for demonstration purposes.
Figure 3. Examples of open-vocabulary object and semantic segmentation results. Objects were detected by OWL-ViT (Vision Transformer for Open-World Localization). Semantic segmentation was performed using Mask2Former. TikTok AI was used to convert street-view images into an animation style for demonstration purposes.
Figure 4. Density plot of mobile training data, model predictions, and long-term validation data for NO₂ at 33 Palmes locations. RF: random-forest-based LUR model. SLR: stepwise linear regression-based LUR model. SpecificY, MostnearY, and Season-weighted represent temporal strategies based on different sets of street-view images. Palmes 33 is a subset of long-term fixed-site measurements of NO₂ in Amsterdam. The LUR models were trained by using mobile NO₂ measurements and aimed to estimate long-term NO₂ distributions represented by Palmes data.
