Res Rep Health Eff Inst. 2025 Aug;2025(228):1-117.

Optimizing Air Pollution Exposure Assessment with Application to Cognitive Function


L Sheppard et al. Res Rep Health Eff Inst. 2025 Aug.

Abstract

Introduction: Epidemiological studies often make use of exposure data that are collected in opportunistic and logistically convenient ways. While exposure assessment is fundamental to environmental epidemiology, little is known about which exposure assessment study designs are optimal for health inference. The objective of this project was to advance our understanding of the design of exposure assessment measurement campaigns and to evaluate their impact on estimating associations between long-term average air pollution exposure and cognitive function. This feeds into the broader goal of improving air pollution exposure assessment design for application to epidemiological inference.

Methods: We leveraged data from the Adult Changes in Thought (ACT) Air Pollution study (ACT-AP) to characterize exposures for over 5,000 participants from the ongoing ACT cohort. This is a population-based cohort of urban and suburban elderly individuals in the greater Puget Sound region, drawn from Group Health Cooperative (now Kaiser Permanente) starting in 1994. Participants were followed with routine biennial visits until dementia incidence, drop-out, or death. Extensive health, lifestyle, biological, and demographic data were also collected. The outcome measure used in this report is cognitive function at baseline based on the Cognitive Abilities Screening Instrument derived using Item Response Theory (CASI-IRT). The IRT transformation of the CASI score improves score accuracy, measures cognitive change with less bias, and accounts for missing test items. Health association analyses were based on 5,409 participants who had a valid CASI score and had lived in the mobile monitoring region during at least 95% of the 5 years prior to baseline. We used 5-year average exposures that accounted for residential history.
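A 5-year average exposure that accounts for residential history amounts to a time-weighted average over each participant's address intervals, with each interval clipped to the 5-year window before baseline. A minimal sketch of that idea (the data structure and function name are illustrative, not from the study code):

```python
from datetime import date

def five_year_avg_exposure(residences, baseline):
    """Time-weighted average exposure over the 5 years before `baseline`.

    `residences` is a list of (start, end, mean_exposure) tuples covering
    the participant's address history; field layout is hypothetical.
    (Ignores the Feb 29 edge case for simplicity.)
    """
    window_start = date(baseline.year - 5, baseline.month, baseline.day)
    total_days, weighted = 0, 0.0
    for start, end, exposure in residences:
        # Clip each residence interval to the 5-year window before baseline.
        lo, hi = max(start, window_start), min(end, baseline)
        days = (hi - lo).days
        if days > 0:
            total_days += days
            weighted += days * exposure
    return weighted / total_days if total_days else None
```

Participants who spent part of the window outside the monitoring region would contribute fewer weighted days, which motivates the study's 95%-residency inclusion criterion.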

Exposure data came from two distinct exposure assessment campaigns carried out by the ACT-AP study: a campaign using low-cost sensors (2017+) that supplemented existing regulatory monitoring data for fine particles (PM2.5, 1978+) and nitrogen dioxide (NO2, 1996+), and a year-long multipollutant mobile monitoring campaign (2019-2020). The evaluation of the added value of low-cost sensor data relied on a combination of regulatory monitoring data and other high-quality data from research studies, calibrated 2-week low-cost sensor measurements from over 100 locations, which were mostly ACT cohort residences, and a snapshot campaign that measured NO2 using Ogawa samplers. Predictions were at a 2-week average time scale, used a suite of ~200 geographic covariates, and were obtained from a spatiotemporal model developed at the University of Washington. The Seattle mobile monitoring campaign collected a combination of stationary roadside and on-road measurements of ultrafine particles (UFPs, four instruments), black carbon (BC), NO2, carbon dioxide (CO2), and PM2.5. Visits were temporally balanced over 288 drive days such that all sites were visited during all seasons, days of the week, and most hours of the day (5 a.m. to 11 p.m.) approximately 29 times each. For the on-road measurements, we divided the driving route into 100-meter segments and assigned all measurements to the segment midpoint. Predictions used the same suite of geographic covariates in a spatial model fit using partial least squares (PLS) dimension reduction with universal kriging (UK-PLS) to capture the remaining spatial structure. We reported model performance metrics for both the spatial and spatiotemporal models as root mean squared error (RMSE) and mean squared error (MSE)-based R2. 
The reference observations for the spatiotemporal model were low-cost sensor measurements at home locations (with performance metrics averaged over their entire measurement period to approximate spatial contrasts), and for the spatial model, the reference observations were the all data long-term averages at stationary roadside locations.

Using various approaches to sample data from these two exposure monitoring campaigns, we determined the impact on exposure prediction and on estimates of health associations, using two confounder models and 5-year average baseline exposure predictions for cohort members developed from the alternative campaigns. For the low-cost sensor data, we evaluated temporally or spatially reduced subsets of low-cost sensors, as well as a comparison of the low-cost sensor versus snapshot campaigns for NO2. For the mobile monitoring data, we considered designs focused on the stationary roadside and on-road data separately. We reduced the stationary roadside data temporally by restricting seasons, times of day, or days of week for the campaign, while also considering a reduced number of visits using balanced sampling, as well as a set of unbalanced visit designs. We also reduced the on-road data spatially and temporally to assess the importance of spatially or temporally balanced data collection. In addition, we considered the impact of incorporating temporal adjustment to account for temporally unbalanced sampling, as well as plume adjustment to account for on-road sources. For each design, we evaluated prediction model performance using the all data stationary roadside observations (mobile campaign) or the measurements at homes (low-cost sensor campaign) as reference observations to ensure consistency in reported performance metrics. We also used long-term average exposures estimated from these alternative campaigns in health association analyses under two confounder models adjusted for potential confounders: Model 1 adjusted for age, calendar year, sex, and educational attainment; Model 2 included all Model 1 variables with the addition of race and socioeconomic status.
Furthermore, using the stationary roadside data, we applied parametric and nonparametric bootstrap methods to account for Berkson-like and classical-like exposure measurement error for the UFP exposure in confounder model 1.
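The nonparametric bootstrap component of such an analysis can be illustrated by resampling participants with replacement and refitting a linear health model each time. This is a simplified sketch only; the report's parametric bootstrap and the decomposition into Berkson-like and classical-like error involve additional steps, and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(exposure, outcome, covars, n_boot=1000):
    """Nonparametric bootstrap 95% CI for the exposure coefficient in a
    linear health model; resamples participants with replacement."""
    n = len(outcome)
    # Design matrix: intercept, exposure of interest, then confounders.
    X = np.column_stack([np.ones(n), exposure, covars])
    coefs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample participant indices
        beta, *_ = np.linalg.lstsq(X[idx], outcome[idx], rcond=None)
        coefs.append(beta[1])        # coefficient on the exposure column
    return np.percentile(coefs, [2.5, 97.5])
```

The spread of the bootstrap coefficients reflects the variability contribution of exposure measurement error that a naive model-based standard error would understate.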

In a separate methods-focused aim, we developed and applied advanced statistical methods using the stationary roadside mobile monitoring data. To evaluate possible improvements in exposure model performance, we applied tree-based machine learning algorithms that also account for residual spatial structure, and compared these to UK-PLS. This led to the development of a variable importance metric that uses a leave-one-out approach to evaluate the change in predictions across various user-specified quantiles. The variable importance metric produces covariate-specific averages that reflect how the predictions, on average, vary across different quantiles of each covariate. This serves as an intuitive measure of the contribution of this covariate to the predicted outcome. A key idea in this variable importance approach is to reuse the trained mean model across all locations and to refit the covariance model in a leave-one-out manner. In separate work to address dimension reduction for multipollutant prediction, we extended classical principal component analysis (PCA) and a recently developed predictive PCA approach to optimize performance by balancing the representativeness in classical PCA with the predictive ability of predictive PCA. We called the new method representative and predictive PCA, or RapPCA.
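The quantile-contrast idea behind this variable importance metric can be illustrated for the mean model alone: fix a covariate at a low and at a high quantile for all locations, re-predict, and average the difference. The report's full metric additionally refits the kriging covariance in a leave-one-out manner, which this sketch omits; function and argument names are illustrative:

```python
import numpy as np

def quantile_importance(model, X, names, quantiles=(0.1, 0.9)):
    """For each covariate, set every row to a low or high quantile of that
    covariate, re-predict with the trained mean model, and report the mean
    prediction difference between the two quantiles (sketch only)."""
    lo_q, hi_q = quantiles
    importance = {}
    for j, name in enumerate(names):
        lo, hi = np.quantile(X[:, j], [lo_q, hi_q])
        X_lo, X_hi = X.copy(), X.copy()
        X_lo[:, j], X_hi[:, j] = lo, hi
        # Average change in predictions attributable to covariate j.
        importance[name] = float(np.mean(model(X_hi) - model(X_lo)))
    return importance
```

A covariate the model ignores scores exactly zero, while a covariate with a strong (possibly nonlinear) effect scores the average prediction shift across its chosen quantile range, which is what makes the metric intuitive to compare across covariates.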

Finally, we characterized the various exposure assessment campaigns in terms of the value of their information as quantified by cost. We calculated costs, focused predominantly on staff days of effort, for various exposure assessment designs and compared these to exposure model performance statistics.

Results: We found that air pollution exposure assessment design is critical for exposure prediction, and also impacts health inference. We showed that a mobile monitoring study with stationary roadside sampling that has at least 12 visits per location in a balanced and temporally unrestricted design optimizes exposure model performance while also limiting costs. Relative to weaker alternatives, a balanced and temporally unrestricted design has improved accuracy and reduced variability of health inferences, particularly for confounder model 1. To address temporal balance, it is important that the exposure sampling in mobile monitoring campaigns cover all days of the week, most hours of the day, and at least two seasons. The popular temporally restricted business-hours sampling design had the poorest performance, which was not improved by adjusting for the temporally unbalanced sampling approach. We found similar patterns using on-road data, though the findings were weaker overall.
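The recommended design, roughly 12 visits per site spread across seasons, days of week, and most hours of the day, can be emulated when subsampling an existing campaign by cycling through temporal strata. An illustrative sketch (the record layout and stratification by season alone are assumptions; a real design would also balance day of week and hour):

```python
import random
from collections import defaultdict

def balanced_sample(visits, n_visits=12, seed=0):
    """Pick `n_visits` per site, cycling round-robin through season strata
    so the subsample stays temporally balanced. `visits` is a list of
    dicts with (at least) 'site' and 'season' keys; layout is hypothetical."""
    rng = random.Random(seed)
    by_site = defaultdict(lambda: defaultdict(list))
    for v in visits:
        by_site[v["site"]][v["season"]].append(v)
    sampled = {}
    for site, strata in by_site.items():
        # Shuffle within each stratum, then draw one stratum at a time.
        pools = [rng.sample(vs, len(vs)) for vs in strata.values()]
        picks, i = [], 0
        while len(picks) < n_visits and any(pools):
            if pools[i % len(pools)]:
                picks.append(pools[i % len(pools)].pop())
            i += 1
        sampled[site] = picks
    return sampled
```

By contrast, a business-hours design corresponds to filtering `visits` to one temporal stratum before sampling, which is the restriction that performed worst in these analyses.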

For the alternative exposure campaign that supplemented regulatory monitoring data with low-cost sensor data, while the exposure prediction model performances improved with the inclusion of the low-cost sensors, there was little notable impact on the health inferences, and the costs were steep. Given that the supplementary exposure assessment data were sparse relative to the existing regulatory monitoring data, and that the low-cost sensor data collection used a rotating approach due to the limited number of sensors (i.e., low-cost sensor measurements were not collected using a balanced design), it was much more challenging to develop deep insights from this exposure assessment approach.

Finally, we found that leveraging spatial ensemble-learning methods for prediction did not improve exposure prediction model performance or alter health inferences. The new multipollutant dimension-reduction method we developed, RapPCA, had the best predictive performance and also minimized the prediction error in comparison with both classical and predictive PCA.

Conclusions: This project has shown that there should be greater attention to the design of the exposure data collection campaigns used in epidemiological inference. Based on the multiple investigations conducted, many of which focused on UFPs, we found that exposure predictions with better performance statistics resulted in health association estimates that were generally more consistent with those obtained using the "best" exposure model predictions (the model with all data included), although the pattern of health estimates was often less conclusive than the pattern of prediction model performances. Furthermore, we found that it is possible to design air pollution exposure assessment studies that achieve good exposure prediction model performance while controlling their relative cost.

We developed strong recommendations for mobile monitoring campaign design, thanks to the well-designed and comprehensive Seattle mobile monitoring campaign. Insights from supplementing regulatory monitoring data with low-cost sensor data were less compelling, driven predominantly by a data structure with sparse and temporally unbalanced supplementary data that may not have been sufficiently comprehensive to demonstrate the impacts of alternative designs. Broadly speaking, better exposure assessment design leads to better exposure prediction model performance, which in turn can benefit estimates of health associations.

We did not find that leveraging advanced statistical methods (specifically, spatial ensemble-learning methods for prediction) improved exposure prediction model performance. This finding is not consistent with conclusions reached by other investigators. It may reflect the already sophisticated UK-PLS approach we used by default, and in particular its application with the large number of covariates we considered in the PLS model: in the presence of such a large covariate set, each covariate can contribute an approximately linear association with the pollutant being modeled, so the potential added value of a spatial Random Forest approach is not observed in the model fit. Settings with fewer available covariates may lead to different conclusions and show greater added value from a spatial Random Forest approach.

We based our approach on leveraging the extensive air pollution exposure assessment and outcome data available from the ACT-AP study. Thus, we sampled from the existing air pollution data to evaluate exposure assessment designs that were subsets of those data. Then, conditional on each of these designs, we evaluated subsequent health inferences, which focused on cognitive function at baseline using the CASI-IRT outcome. The magnitude and uncertainty of these health association estimates were dependent upon the associations evident in the ACT cohort, and the insights we were able to develop are conditional on the strengths and weaknesses of these data. Specifically, while we observed some larger impacts on health association estimates of more poorly performing exposure models relative to the complete all data exposure model, such as the business-hours design from a mobile monitoring campaign, many of the differences were small and did not deviate meaningfully from the health association estimate obtained from the "best" exposure model. The degree of impact on the epidemiological inference depended on the magnitude of the health association estimate from the "best" exposure model and the width of its confidence interval. Future investigations should replicate and expand upon these findings in other settings, including application to new cohorts and exposure assessment data, as well as in simulation studies, which provide an alternative approach to using real-world data to evaluate a constellation of exposure models. However, while knowledge of the assumed underlying truth is an important strength of simulation studies, it is challenging to capture real-world complexity meaningfully in simulation studies.

Our foray into applying advanced machine-learning methods to improve exposure predictions produced the surprising result that our default UK-PLS approach for spatial prediction produced similar performance metrics to spatial ensemble-learning methods. Future evaluations that assess smaller subsets of exposure covariates will allow determination of the relative exposure model performance benefits of UK-PLS versus spatial ensemble-learning methods, and provide insights into the possible reason that our conclusions differ from others in the literature.


Figures

Statement Figure.
Schematic overview of the study design.
Figure 3.1.
Mobile monitoring routes from the Seattle mobile monitoring campaign. There were 309 stationary roadside sites along 9 routes and 10,330 unique jittered ACT cohort locations. The inset map shows the monitoring area within Washington (WA) state. Reprinted with permission from Blanco and colleagues (2022), Copyright 2022 ACS.
Figure 3.2.
Low-cost sensor developed by the Seto lab and used in the low-cost sensor campaign.
Figure 3.3.
The PM2.5 modeling study region with locations of PM2.5 sensors. The dashed box shows the prediction domain for model assessment in the PM2.5 analyses. This is the Puget Sound region of Washington state. The right inset map shows the greater Seattle area prediction domain. Reprinted with permission from Bi et al. LCS = low-cost sensor.
Figure 4.1.
Median normalized MSE-based R2 for fewer total stops designs using the stationary roadside data from the Seattle mobile monitoring campaign. MSE R2 is calculated by comparing cross-validated predictions to annual average estimate references from either the all data campaign (top) or the reduced sampling campaigns (bottom), normalized by (divided by) the R2 from the all data campaign. Normalized R2 values below one indicate worse performance than the all data campaign. Median performance parameters are each based on 30 campaigns. The dashed line is at 0.85. Reprinted with permission from Blanco et al. Copyright 2023 American Chemical Society (ACS).
Figure 4.2.
Comparison of the exposure surface from the all data campaign with the median prediction difference from some example reduced sampling campaigns. The all data campaign had over 7,000 stops (top), while the reduced sampling campaigns had ~3,000 stops using stationary roadside data from the Seattle mobile monitoring campaign. The ~3,000 temporally balanced stops are from the fewer total stops design that randomly selects from the all data campaign with no time restrictions and can serve as a reference for the other reduced sampling campaigns. Reprinted with permission from Blanco et al. Copyright 2023 American Chemical Society (ACS).
Figure 4.3.
Summary of analytic approach for alternative exposure assessment designs derived from the Seattle mobile monitoring campaign stationary roadside data. See Table 4.1 for a summary of the specific sampling designs.
Figure 4.4.
Cross-validated UFP model performances (N = 30 campaigns per design) using stationary roadside data from the Seattle mobile monitoring campaign. The dashed lines indicate the all data campaign performance. Red design reference box plots indicate the least restrictive or most balanced campaigns; any of these can serve as a reference for the business and rush hours designs. Business and rush hour designs produce annual average site estimates from unadjusted and temporally-adjusted visits. UFP models are for a total of 10–420 nm particles (pt/cm3) from the NanoScan instrument. The dashed line indicates the MSE R2 from the reference all data model, which is 0.65.
Figure 4.5.
Estimated association between UFPs (per 1,900 pt/cm3) and cognitive function adjusted for age, calendar year, sex, and education (confounder model 1). Confounder model 2 was further adjusted for race and SES using stationary roadside data from the Seattle mobile monitoring campaign. The dashed green lines and shaded areas indicate the estimated point and 95% CIs from the all data exposure model, which are –0.020 (95% CI: –0.036, –0.004) in confounder model 1 and 0.002 (–0.016, 0.020) in confounder model 2. The dashed red line indicates no association. Box plots show the results when using exposure estimates from reduced mobile monitoring sampling campaigns (N = 30 estimates per box plot). Boxes show the median and IQR, whiskers show the 10th and 90th percentiles. Percentages on the y-axis show the estimated association relative to using the all data exposure model.
Figure 5.1.
Sources and types of measurement error in air pollution epidemiology and their impacts on health inference bias and variability. ME = measurement error.
Figure 5.2.
Overview of the estimation of health inference bias from classical-like measurement error and health inference variability from classical-like and Berkson-like measurement error. ME = measurement error; AP = air pollution.
Figure 5.3.
Overview of the estimation of health inference bias from Berkson-like measurement error. ME = measurement error; AP = air pollution.
Figure 6.1.
On-road mobile monitoring sampling designs. See Methods — Mobile Monitoring Sampling Designs for details. There are 30 campaigns for each sampling approach (N = 24 design paths × 30 campaigns = 720 total campaigns) in the main analyses.
Figure 6.2.
Out-of-sample UFP (pt/cm3) exposure model performances for on-road campaigns (N = 30 campaigns per combination, i.e., per box plot). RMSE is based on a comparison of the predicted PNC at 309 stationary locations and the annual average site estimate from stationary roadside measures. UFP models are for total particles (20–1,000 nm) from the unscreened P-Trak instrument and the Seattle mobile monitoring campaign. Boxes show the median and IQR; whiskers show the 10th and 90th percentiles. The dashed line indicates the R2 from the reference all data stationary model, which is 0.77.
Figure 6.3.
Estimated association between UFPs (1,900 pt/cm3) and cognitive function at baseline using confounder models 1 and 2. UFP exposures are predicted from on-road monitoring campaigns using the 12-visit campaigns. The dashed green lines and shaded areas indicate the estimated point and 95% CIs from the all data roadside exposure model, which are –0.021 (95% CI: –0.039 to –0.003) in confounder model 1 and 0.007 (95% CI: –0.013 to 0.027) in confounder model 2. The dashed red line indicates no association. Boxes show the median and IQR, whiskers show the 10th and 90th percentiles.
Figure 7.1.
Estimated association (95% CI) between PM2.5 (1 μg/m3) and cognitive function at baseline for different exposure models. The associations are adjusted for age, calendar year, sex, and education (confounder model 1). See Chapter 3 for the analysis approach and Table S7.4 for model descriptions. LCS = low-cost sensor; SP = spatial model; ST = spatiotemporal model.
Figure 7.2.
Mean PM2.5 predictions for the all data model and the prediction differences for other monitoring designs. The data are from June 2017 to May 2019 (the period of the short-term low-cost sensor campaign). All data is the subtrahend for the other designs. Reprinted with permission from Bi et al.
Figure 7.3.
Estimated association (95% CI) between NO2 (3 ppb) and baseline cognitive function for exposure models with and without low-cost sensor data. The associations are adjusted for age, calendar year, sex, and education (confounder model 1). LCS = low-cost sensor; SP = spatial model; ST = spatiotemporal model.
Figure 8.1.
Cross-validated prediction errors of UFPs for UK-PLS and SpatRF-PL at each monitoring location for the Seattle stationary roadside mobile monitoring data. The shade, color, and size of the dots reflect the magnitude of the errors.
Figure 8.2.
Predicted UFP concentration surfaces based on predictions at gridded locations derived from the Seattle mobile monitoring stationary roadside data. Predicted surfaces use SpatRF-PL (map 1), UK-PLS (map 2), and their difference (map 3; UK-PLS is the subtrahend).
Figure 8.3.
Comparison of 5-year average UFP exposures for 5,409 participants in the health analyses predicted from the primary UK-PLS model and alternative machine learning models. Exposure models were developed from unscreened P-Trak instrument readings (20–1,000 nm particles) from the Seattle mobile monitoring stationary roadside data. All alternative machine learning model predictions were highly correlated with the main reference model (UK-PLS), with Pearson correlations between 0.97–0.99.
Figure 8.4.
Estimated association (95% CI) between UFPs (1,900 pt/cm3) and cognitive function at baseline using various machine learning exposure assessment models. The associations are adjusted for age, calendar year, sex, and education (confounder model 1).
Figure 8.5.
Variable importance plot for UFP concentration predictions, showing the predictors among the top five contributors for either method for at least one contrast. All buffer sizes were included if one of them was within the top five important predictors. Analyses based on the Seattle mobile monitoring stationary roadside data.
Figure 8.6.
Smoothed principal component scores for the first three principal components. Components were developed on the Seattle mobile monitoring roadside data, and scores were determined from classical PCA (PCA), predictive PCA (PredPCA), and representative and predictive PCA (RapPCA), respectively.
Figure 8.7.
The first three PC loadings for each pollutant. These were developed on the Seattle mobile monitoring roadside data and obtained from classical PCA (PCA), predictive PCA (PredPCA), and representative and predictive PCA (RapPCA), respectively. There are three types of pollutants, where the suffix, if applicable, represents the properties of, or the instruments used to measure, each pollutant. In particular, the numeric suffix after ufp_ corresponds to the lowest value of the size range. Specifically, ufp_ptrak_36 represents P-Trak measurements with the diffusion screen (36–1000 nm) while ufp_ptrak_20 represents the difference between P-Trak measurements with and without the diffusion screen (20–36 nm). BC measurements at different wavelengths are labeled as: blue (bc_blue), green (bc_green), infrared (bc_ir), red (bc_red), and ultraviolet (bc_uv). The ultraviolet measurements were transformed to represent the difference (bc_uv_diff) between the ultraviolet and infrared ranges.
Figure 9.1.
Relationships between the number of workdays and median CV MSE R2s across complete and alternative mobile monitoring designs according to the number of sites, visits, seasons, days, and hours. “All” and “All*” indicate all sites, seasons, days, or hours from the complete all data and reduced reference designs, respectively. Open circles and diamonds represent workdays and CV MSE R2s, respectively, from the complete reference design that serves as a reference for spatially reduced alternative designs with fewer sites or temporally reduced alternatives with fewer visits. Additional symbols (shown in the plot legend) display workdays and CV MSE R2s, respectively, from the reduced reference design that is equivalent to the complete all data design, except reduced to 12 visits instead of ~29, and is used as the “complete” design for temporally restricted alternative designs with fewer seasons, days, or hours. The black dotted horizontal and vertical lines in the plot for the number of repeat visits per site highlight the CV MSE R2 for the reduced reference design. The designs in the left column are spatially and temporally reduced alternatives for sites and visits, respectively, whereas the designs in the right column show temporally restricted alternatives based on the number of seasons, days, or hours.
Commentary Figure 1.
Schematic overview of the study design.
