Environ Sci Technol. 2023 Nov 21;57(46):17959-17970.
doi: 10.1021/acs.est.2c07477. Epub 2023 Mar 18.

Improved Decision Making for Water Lead Testing in U.S. Child Care Facilities Using Machine-Learned Bayesian Networks

Riley E Mulhern et al. Environ Sci Technol.

Abstract

Tap water lead testing programs in the U.S. need improved methods for identifying high-risk facilities to optimize limited resources. In this study, machine-learned Bayesian network (BN) models were used to predict building-wide water lead risk in over 4,000 child care facilities in North Carolina, defined by the maximum and 90th percentile of water lead concentrations measured at 22,943 taps. The performance of the BN models was compared to that of common alternative risk factors, or heuristics, used to inform water lead testing programs among child care facilities, including building age, water source, and Head Start program status. The BN models identified a range of variables associated with building-wide water lead; facilities that serve low-income families, rely on groundwater, and have more taps exhibited greater risk. Models predicting the probability of a single tap exceeding each target concentration performed better than models predicting facilities with clustered high-risk taps. The BN models' Fβ-scores exceeded those of each alternative heuristic by 118–213%. This represents up to a 60% increase in the number of high-risk facilities that could be identified and up to a 49% decrease in the number of samples that would need to be collected when using BN model-informed sampling instead of simple heuristics. Overall, this study demonstrates the value of machine-learning approaches for identifying facilities with high water lead risk, which could improve lead testing programs nationwide.

Keywords: children’s health; drinking water; lead; machine learning; risk assessment.
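The two kinds of facility-level target can be illustrated with a short Python sketch. A facility exceeds a maximum-based target when any single tap does, whereas it exceeds a 90th-percentile target only when high results are spread across its taps; this is the distinction between single high-risk taps and clustered high-risk taps drawn in the abstract. The sketch assumes concentrations in ppb (the unit used in the Figure 5 caption), uses the Max ≥ 1 and P90 ≥ 15 targets named in the Figure 3 caption as defaults, and applies a linear-interpolation percentile. The study's exact percentile rule is not stated here, and the facility IDs and values are hypothetical.

```python
from collections import defaultdict


def percentile(values, q):
    """Linear-interpolation percentile (the same rule as NumPy's default method)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)  # position >= 0, so int() floors it
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def facility_targets(tap_samples, max_target=1.0, p90_target=15.0):
    """Label each facility from (facility_id, lead_ppb) tap results.

    Returns {facility_id: (max_exceeds, p90_exceeds)}. The default targets
    and the percentile rule are illustrative assumptions.
    """
    by_facility = defaultdict(list)
    for facility_id, lead_ppb in tap_samples:
        by_facility[facility_id].append(lead_ppb)

    labels = {}
    for facility_id, levels in by_facility.items():
        # Any single tap at or above the target is enough for the maximum-based label.
        max_exceeds = max(levels) >= max_target
        # The 90th-percentile label requires high results across several taps.
        p90_exceeds = percentile(levels, 90) >= p90_target
        labels[facility_id] = (max_exceeds, p90_exceeds)
    return labels


# Hypothetical tap results for two facilities.
taps = [("center_a", 0.5), ("center_a", 2.0), ("center_b", 0.2), ("center_b", 0.3)]
print(facility_targets(taps))  # {'center_a': (True, False), 'center_b': (False, False)}
```

In this toy case, center_a has one tap above 1 ppb, so it meets the maximum-based target but not the 90th-percentile target.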

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Spatial distribution of child care centers sampled across NC through the Clean Water for Carolina Kids program and each facility’s maximum drinking water lead concentration.
Figure 2
AU-ROC and AU-PR scores for all eight models after 10-fold cross-validation and prediction of the test set. The box-and-whisker plots for the cross-validation performance measures represent the distribution of average scores from all 10 folds for each model; the gray shaded region shows the minimum-to-maximum range of performance across all models and all folds. The models predicting the maximum lead concentration in each facility performed slightly better than the models predicting the 90th percentile concentration.
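As a rough illustration of the two scores in Figure 2, the sketch below computes AU-ROC as a rank (Mann–Whitney) statistic and approximates AU-PR with average precision. Average precision is an assumed estimator; the caption does not say how the area under the precision-recall curve was computed. In a 10-fold cross-validation, functions like these would be applied to each held-out fold and the fold scores averaged. The labels and scores are hypothetical.

```python
def au_roc(y_true, scores):
    """AU-ROC as the probability that a randomly chosen high-risk facility is
    scored above a randomly chosen low-risk one (ties count as half).
    O(n_pos * n_neg), which is acceptable for a sketch."""
    positives = [s for s, y in zip(scores, y_true) if y]
    negatives = [s for s, y in zip(scores, y_true) if not y]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(y_true, scores):
    """Average precision, one common estimator of the area under the
    precision-recall curve; score ties keep their input order."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_positive = sum(1 for y in y_true if y)
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            precision_sum += hits / rank  # precision at each new true positive
    return precision_sum / n_positive


# Hypothetical labels (1 = high-risk facility) and model probabilities for one fold.
labels = [1, 0, 1, 0]
predicted = [0.9, 0.8, 0.7, 0.1]
print(au_roc(labels, predicted), average_precision(labels, predicted))
```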
Figure 3
Number and frequency of predictor variables included in the eight models tested. The Max ≥ 1 and P90 ≥ 15 model structures are shown as examples. Head Start status, the number of samples collected (a possible proxy for size of the center), the proportion of children with free/reduced lunch, and source water type were significant at every target concentration modeled. The structures for all eight models can be seen in Figure S7 in the Supporting Information, and interactive versions are available at www.cleanwaterforcarolinakids.org/publications/bn_models.
Figure 4
Distribution of average Fβ-scores from 10-fold cross-validation of each alternative heuristic for predicting each target, compared with the BN model scores. Panel A shows scores with precision and sensitivity weighted equally (β = 1). Panel B shows scores when sensitivity is considered twice as important as precision (β = 2). F-scores for the BN models show the maximum score achievable across classification thresholds.
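The Fβ-score combines precision and sensitivity as Fβ = (1 + β²)·precision·sensitivity / (β²·precision + sensitivity), so β = 2 weights sensitivity more heavily, as in Panel B. The following sketch assumes that a facility is flagged when its predicted probability is at or above a threshold; it sweeps every observed score as a threshold and keeps the best Fβ, mirroring the maximum achievable score reported for the BN models. The percent_improvement helper is a hypothetical reading of how relative gains such as the 118–213% in the abstract could be expressed, and the data are hypothetical.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta from confusion counts; beta > 1 weights sensitivity over precision."""
    b2 = beta ** 2
    denominator = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denominator if denominator else 0.0


def max_f_beta(y_true, scores, beta=1.0):
    """Best F-beta over all thresholds, flagging facilities with score >= threshold."""
    best = 0.0
    for threshold in sorted(set(scores)):
        tp = fp = fn = 0
        for y, s in zip(y_true, scores):
            flagged = s >= threshold
            if flagged and y:
                tp += 1
            elif flagged:
                fp += 1
            elif y:
                fn += 1
        best = max(best, f_beta(tp, fp, fn, beta))
    return best


def percent_improvement(model_score, heuristic_score):
    """Relative gain of one score over another, in percent (assumed definition)."""
    return 100 * (model_score - heuristic_score) / heuristic_score


# Hypothetical labels (1 = high-risk facility) and model probabilities.
labels = [1, 0, 1, 0]
predicted = [0.9, 0.8, 0.7, 0.1]
print(max_f_beta(labels, predicted, beta=1), max_f_beta(labels, predicted, beta=2))
```

A heuristic, by contrast, yields a single yes/no prediction per facility, so it has one Fβ value rather than a threshold sweep.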
Figure 5
Graphical representation of the sensitivity improvement and sampling reduction achievable by the Max ≥ 1 ppb BN model compared to each of the heuristics. The plot can be read as the proportion of all high-risk facilities in an area (sensitivity, x-axis) that could be identified with a given number of facilities sampled (predicted positives, y-axis). As the desired sensitivity increases, the number of facilities that would have to be sampled also increases. Sensitivity improvement is represented by the horizontal distance by which each heuristic point lies to the left of the BN model performance curve; sampling reduction is represented by the vertical distance by which each point lies above the curve. The BN curve outperforms every heuristic point. Similar plots for the other models can be seen in Figure S17.
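The two offsets read from Figure 5 can be computed from a model's operating curve. In this sketch, each threshold yields a point (sensitivity, number of facilities sampled). The sensitivity improvement is the best BN sensitivity achievable without sampling more facilities than a heuristic, and the sampling reduction is the fewest facilities the BN must sample to match the heuristic's sensitivity. Expressing both as relative changes is an assumption about how percentages such as the "up to a 60% increase" and "up to a 49% decrease" in the abstract were derived; the heuristic point and data are hypothetical.

```python
def operating_curve(y_true, scores):
    """(sensitivity, facilities_sampled) for each threshold, strictest first."""
    n_positive = sum(1 for y in y_true if y)
    curve = []
    for threshold in sorted(set(scores), reverse=True):
        flagged = [y for y, s in zip(y_true, scores) if s >= threshold]
        curve.append((sum(flagged) / n_positive, len(flagged)))
    return curve


def compare_to_heuristic(curve, heuristic_sensitivity, heuristic_sampled):
    """Return (relative sensitivity improvement, relative sampling reduction).

    Assumes the heuristic identifies at least one high-risk facility,
    so heuristic_sensitivity > 0.
    """
    # Horizontal offset: best BN sensitivity without sampling more facilities.
    best_sensitivity = max(
        (sens for sens, n in curve if n <= heuristic_sampled), default=0.0
    )
    # Vertical offset: fewest facilities the BN must sample to match the heuristic.
    fewest_sampled = min(
        (n for sens, n in curve if sens >= heuristic_sensitivity),
        default=heuristic_sampled,
    )
    sensitivity_gain = (best_sensitivity - heuristic_sensitivity) / heuristic_sensitivity
    sampling_cut = (heuristic_sampled - fewest_sampled) / heuristic_sampled
    return sensitivity_gain, sampling_cut


# Hypothetical labels (1 = high-risk facility) and BN probabilities.
labels = [1, 0, 1, 0, 1, 0]
predicted = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
curve = operating_curve(labels, predicted)
# Hypothetical heuristic that samples 3 facilities and finds 1 of the 3 high-risk ones.
print(compare_to_heuristic(curve, heuristic_sensitivity=1 / 3, heuristic_sampled=3))
```

Here the BN curve doubles the sensitivity at the same sampling effort and matches the heuristic's sensitivity with one facility instead of three.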
Figure 6
Summary of the range of sensitivity improvement (Panel A) and sampling reduction (Panel B) that could be achieved by the BN models compared to each of the heuristics.
