Environ Sci Technol. 2023 Nov 21;57(46):17959-17970.

doi: 10.1021/acs.est.2c07477. Epub 2023 Mar 18.

Improved Decision Making for Water Lead Testing in U.S. Child Care Facilities Using Machine-Learned Bayesian Networks

Affiliations

¹ RTI International, Research Triangle Park, North Carolina 27709, United States.
² Environmental Health Section, Division of Public Health, North Carolina Department of Health and Human Services, Raleigh, North Carolina 27609, United States.
³ Department of Civil, Construction, and Environmental Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States.

PMID: 36932953
PMCID: PMC10666530
DOI: 10.1021/acs.est.2c07477

Riley E Mulhern et al. Environ Sci Technol. 2023.

Abstract

Tap water lead testing programs in the U.S. need improved methods for identifying high-risk facilities so that limited resources can be used efficiently. In this study, machine-learned Bayesian network (BN) models were used to predict building-wide water lead risk in over 4,000 child care facilities in North Carolina, based on the maximum and 90th percentile lead concentrations measured at 22,943 taps. The performance of the BN models was compared with that of common alternative risk factors, or heuristics, used to inform water lead testing programs in child care facilities, including building age, water source, and Head Start program status. The BN models identified a range of variables associated with building-wide water lead; facilities that serve low-income families, rely on groundwater, and have more taps exhibited greater risk. Models predicting the probability of a single tap exceeding each target concentration performed better than models predicting facilities with clustered high-risk taps. The BN models' F_β-scores exceeded those of each alternative heuristic by 118-213%. Compared with simple heuristics, BN model-informed sampling could identify up to 60% more high-risk facilities while requiring up to 49% fewer samples. Overall, this study demonstrates the value of machine-learning approaches for identifying high water lead risk, which could improve lead testing programs nationwide.

Keywords: children’s health; drinking water; lead; machine learning; risk assessment.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

**Figure 1**
Spatial distribution of child care centers sampled across NC through the Clean Water for Carolina Kids program and each facility’s maximum drinking water lead concentration.

**Figure 2**
AU-ROC and AU-PR scores for all eight models after 10-fold cross-validation and prediction of the test set. The box-and-whisker plots for the cross-validation performance measures represent the distribution of average scores from all 10 folds for each model; the gray shaded region shows the minimum to maximum range of performance across all models and all folds. The models predicting the maximum lead concentration in each facility performed slightly better than the models predicting the 90th percentile concentration.
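
AU-ROC and AU-PR both summarize how well a model's predicted probabilities rank high-risk facilities above low-risk ones. As an illustrative sketch only (not the authors' code), the Python below computes AU-ROC as the pairwise ranking probability (the Mann-Whitney formulation) and approximates AU-PR with step-wise average precision. The function names and the toy probabilities and labels are hypothetical.

```python
from itertools import product
from typing import Sequence


def au_roc(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random high-risk facility is scored above a random
    low-risk one; tied scores count as half a win."""
    pos = [p for p, y in zip(probs, labels) if y]
    neg = [p for p, y in zip(probs, labels) if not y]
    if not pos or not neg:
        raise ValueError("AU-ROC needs at least one positive and one negative")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in product(pos, neg))
    return wins / (len(pos) * len(neg))


def average_precision(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise area under the precision-recall curve: the mean of the
    precision values observed at each true positive in the ranked list.
    Simplifying assumption: tied scores are not pooled."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision needs at least one positive")
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            ap += tp / rank  # precision at this cutoff
    return ap / total_pos


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.3, 0.1]  # hypothetical predicted exceedance probabilities
    labels = [1, 0, 1, 0]         # hypothetical observed high-risk flags
    print(au_roc(probs, labels), average_precision(probs, labels))
```

With these toy inputs, AU-ROC is 0.75 and average precision is about 0.83; the pairwise form of AU-ROC is quadratic in the number of facilities, which is fine for illustration but not for large datasets.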

**Figure 3**
Number and frequency of predictor variables included in the eight models tested. The Max ≥ 1 and P90 ≥ 15 model structures are shown as examples. Head Start status, the number of samples collected (a possible proxy for the size of the center), the proportion of children with free/reduced lunch, and source water type were significant at every target concentration modeled. The structures for all eight models can be seen in Figure S7 in the Supporting Information, and interactive versions are available at www.cleanwaterforcarolinakids.org/publications/bn_models.

**Figure 4**
Distribution of average F_β-scores from 10-fold cross-validation for each alternative heuristic predicting each target, compared with BN model scores. Panel A shows scores with precision and sensitivity weighted equally (β = 1). Panel B shows scores with sensitivity weighted twice as heavily as precision (β = 2). F-scores for BN models show the maximum score achievable by varying the classification threshold.
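
The F_β-score combines precision P and sensitivity (recall) R as F_β = (1 + β²)PR / (β²P + R), so β = 2 weights sensitivity more heavily than precision. The sketch below assumes binary high-risk labels and hypothetical toy scores; it computes F_β and the maximum F_β over candidate classification thresholds, mirroring the caption's note that BN F-scores are the maximum achievable across thresholds. It is an illustration, not the authors' implementation, and the function names are hypothetical.

```python
from typing import Sequence


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta > 1 weights recall (sensitivity) above precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def max_f_beta(probs: Sequence[float], labels: Sequence[int], beta: float = 1.0) -> float:
    """Best F-beta over thresholds, treating each distinct score as a cutoff
    (a facility is flagged high-risk when its score >= the threshold)."""
    best = 0.0
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and not y)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        best = max(best, f_beta(precision, recall, beta))
    return best


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.3, 0.1]  # hypothetical model scores
    labels = [1, 0, 1, 0]         # hypothetical high-risk flags
    print(max_f_beta(probs, labels, beta=1.0), max_f_beta(probs, labels, beta=2.0))
```

On the toy data the best F_1 is 0.8 and the best F_2 is about 0.91; the higher β rewards the threshold that catches every high-risk facility even at the cost of a false positive.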

**Figure 5**
Graphical representation of the sensitivity improvement and sampling reduction achievable by the Max ≥ 1 ppb BN model compared with each of the heuristics. The plot can be read as the proportion of all high-risk facilities in an area (sensitivity, x-axis) that could be identified with a given number of facilities sampled (predicted positives, y-axis). As the desired sensitivity increases, the number of facilities that would have to be sampled also increases. Sensitivity improvement is represented by the distance that each point is offset to the left of the BN model performance curve. Sampling reduction is shown by the distance that each point is offset above the curve. The BN curve outperforms every heuristic point. Similar plots for the remaining models can be seen in Figure S17.
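
Figure 5 treats the BN model as a ranking: sampling the k facilities with the highest predicted risk yields some sensitivity, while each heuristic is a single (sensitivity, facilities sampled) point. The sketch below assumes that sensitivity improvement is the relative gain in sensitivity at the heuristic's sample count and that sampling reduction is the relative drop in facilities needed to match the heuristic's sensitivity. These definitions, the function names, and the toy data are illustrative assumptions, not the paper's exact procedure.

```python
from typing import Sequence


def sensitivity_curve(probs: Sequence[float], labels: Sequence[int]) -> list[float]:
    """Sensitivity after sampling the top-k facilities by predicted risk, k = 0..n."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one high-risk facility")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    found = 0
    curve = [0.0]  # sampling nothing finds nothing
    for i in order:
        found += labels[i]
        curve.append(found / total_pos)
    return curve


def compare_to_heuristic(curve: list[float], heuristic_k: int,
                         heuristic_sens: float) -> tuple[float, float]:
    """Assumed metrics: (relative sensitivity gain at the same sample count,
    relative sampling reduction at the same sensitivity)."""
    if not 1 <= heuristic_k < len(curve) or heuristic_sens <= 0:
        raise ValueError("heuristic point out of range")
    # Horizontal offset: BN sensitivity with the heuristic's number of samples.
    sens_gain = (curve[heuristic_k] - heuristic_sens) / heuristic_sens
    # Vertical offset: fewest BN-ranked samples reaching the heuristic's sensitivity.
    k_needed = next(k for k, s in enumerate(curve) if s >= heuristic_sens)
    reduction = (heuristic_k - k_needed) / heuristic_k
    return sens_gain, reduction


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]  # hypothetical BN probabilities
    labels = [1, 1, 0, 1, 0, 0]             # hypothetical high-risk flags
    curve = sensitivity_curve(probs, labels)
    # Hypothetical heuristic: flags 4 facilities and finds 2 of the 3 high-risk ones.
    print(compare_to_heuristic(curve, heuristic_k=4, heuristic_sens=2 / 3))
```

In this toy case the ranked list finds all three high-risk facilities with four samples (a 50% sensitivity gain) and matches the heuristic's sensitivity with two samples instead of four (a 50% sampling reduction).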

**Figure 6**
Summary of the range of sensitivity improvement (Panel A) and sampling reduction (Panel B) that could be achieved by the BN models compared to each of the heuristics.
