Front Microbiol. 2020 Sep 16;11:571009.
doi: 10.3389/fmicb.2020.571009. eCollection 2020.

Monitoring of Nitrification in Chloraminated Drinking Water Distribution Systems With Microbiome Bioindicators Using Supervised Machine Learning

Vicente Gomez-Alvarez et al. Front Microbiol. 2020.

Abstract

Many drinking water utilities in the United States that use chloramine as a disinfectant in their drinking water distribution systems (DWDS) have experienced nitrification episodes, which detrimentally impact water quality. Identification of potential predictors of nitrification in DWDS may be used to optimize current nitrification monitoring plans and ultimately help to safeguard drinking water and public health. In this study, we explored the water microbiome from a chloraminated DWDS simulator operated through successive operational schemes of stable and nitrification events and utilized the 16S rRNA gene dataset to generate high-resolution taxonomic profiles for bioindicator discovery. Analysis of the microbiome revealed both enrichment and depletion of various bacterial populations associated with nitrification. A supervised machine learning approach (naïve Bayes classifier) trained with bioindicator profiles (membership and structure) was used to classify water samples. Performance of each model was examined using the area under the curve (AUC) from the receiver-operating characteristic (ROC) and precision-recall (PR) curves. The ROC- and PR-AUC gradually increased to 0.778 and 0.775 when genus-level membership (i.e., presence and absence) was used in the model and increased significantly using the structure (i.e., distribution) dataset (AUCs = 1.000, p < 0.01). Community structure significantly improved the predictive ability of the model beyond that of membership only, regardless of the type of data (sequence- or taxonomy-based model) we used to represent the microbiome. In comparison, an ATP-based model (bulk biomass) generated lower AUCs of 0.477 and 0.553 (ROC and PR, respectively), which is equivalent to random classification. A combination of eight bioindicators was able to correctly classify 85% of instances (nitrification or stable events) with an AUC of 0.825 (sensitivity: 0.729, specificity: 0.894) on a full-scale DWDS test set. An abiotic-based model using total chlorine/NH2Cl and NH3 generated AUCs of 0.740 and 0.861 (ROC and PR, respectively), corresponding to a sensitivity of 0.250 and a specificity of 0.957. The AUCs increased to > 0.946 with the addition of NO2⁻ concentration, which is indicative of nitrification in the DWDS. This research provides evidence of the feasibility of using bioindicators to predict operational failures in the system (e.g., nitrification).
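The difference between membership and structure inputs can be illustrated with a minimal sketch. It is not the authors' pipeline: the read counts, the two unnamed taxa, the Gaussian form of the naïve Bayes likelihood, and the variance floor are all assumptions chosen for illustration, and the model is scored on its own training samples, which overstates performance. It shows only why presence/absence carries no signal when every taxon is detected in every sample, while relative abundance can still separate the classes.

```python
from math import log, pi

def to_membership(counts):
    """Presence/absence (1/0) for each taxon."""
    return [1 if c > 0 else 0 for c in counts]

def to_structure(counts):
    """Relative abundance for each taxon."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def fit_gaussian_nb(X, y, var_floor=1e-4):
    """Store the prior, per-feature means and per-feature variances of each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + var_floor
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(y), means, variances)
    return model

def log_odds(model, x, positive=1, negative=0):
    """Log posterior odds of the positive class (features treated as independent)."""
    def log_joint(label):
        prior, means, variances = model[label]
        total = log(prior)
        for v, m, s2 in zip(x, means, variances):
            total += -0.5 * log(2 * pi * s2) - (v - m) ** 2 / (2 * s2)
        return total
    return log_joint(positive) - log_joint(negative)

def roc_auc(scores, labels):
    """Chance that a random positive outscores a random negative; ties count 0.5."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical read counts for two taxa; label 0 = stable, 1 = nitrification.
counts = [[90, 10], [80, 20], [85, 15], [20, 80], [30, 70], [10, 90]]
labels = [0, 0, 0, 1, 1, 1]

def training_auc(transform):
    X = [transform(c) for c in counts]
    model = fit_gaussian_nb(X, labels)
    return roc_auc([log_odds(model, x) for x in X], labels)

membership_auc = training_auc(to_membership)  # every sample maps to [1, 1]
structure_auc = training_auc(to_structure)    # abundances differ by class
print(membership_auc, structure_auc)
```

In this toy dataset both taxa are present in every sample, so the membership model scores all samples identically and its ROC-AUC equals the null value of 0.5, whereas the structure model ranks every nitrification sample above every stable one.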

Keywords: bioindicators; machine learning; microbiome; nitrification; receiver-operating characteristic.

Figures

FIGURE 1
Machine learning classification for the prediction of drinking water distribution systems (DWDS). Schematic of a supervised machine learning approach to classify operational schemes. Arrows of different colors indicate the training and test datasets.
FIGURE 2
Water parameters distinguish between operational schemes. (A) Principal component (PC) analysis representing the relationship of drinking water distribution systems (DWDS) simulator samples. Values in parentheses indicate the percentage of total variation explained by the first two axes. Dashed arrows indicate the orientation and contribution of water parameters to the ordination plot. Labeled samples represent transition points. Bulk water (BW) source: SS, SF, SR. Water quality values are listed in Supplementary Table S1. (B) The simulator was operated through four successive operational schemes: a stable period (SI) where chloramine residual is maintained, a failure period (SF) where no chloramine residual is maintained as a result of nitrification, a ‘chlorine burn’ (SR) in which the disinfectant is switched from chloramine to free chlorine, and a switch back to chloramine to resume normal operation (SII).
FIGURE 3
Microbial assemblages and bioindicator discovery. (A) Principal coordinate analysis (PCoA) ordination plot based on Jensen–Shannon dissimilarity of 16S rRNA operational taxonomic unit (OTU)-level bacterial profiles (cutoff = 0.03). Values in parentheses indicate the percentage of total variation explained by the first two axes. Samples: Stable (SS), Failure (SF). (B) Identification of statistically significant genus-level assigned OTU bioindicators using linear discriminant analysis (LDA) effect size (LEfSe) analyses (LDA score > 4.0, p < 0.0001). Negative LDA scores are enriched in SF while positive LDA scores are enriched in SS events. (C) Receiver operating characteristic (ROC) and (D) precision-recall (PR) curves with area under the curve (AUC) values and 95% confidence intervals in parentheses for the predictive model comparing biomass (ATP) and microbial bioindicators based on community membership [OTU (M)] and structure [OTU (S)] data. Dashed lines indicate the null model. Samples: biomass, n = 32; OTU, n = 48.
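The Jensen–Shannon dissimilarity used for the ordination in panel (A) can be sketched as follows. The caption does not say whether the divergence or its square root was used, nor which logarithm base, so this example assumes base 2 (which bounds the divergence between 0 and 1) and returns both forms. The OTU counts are hypothetical.

```python
from math import log2, sqrt

def relative_abundance(counts):
    """Convert raw OTU counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; zero-probability terms of p contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_divergence(p, q):
    """Symmetric divergence of p and q from their midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def jensen_shannon_distance(p, q):
    """Square root of the divergence, which is a proper metric."""
    return sqrt(jensen_shannon_divergence(p, q))

# Hypothetical OTU count profiles for two water samples.
sample_a = relative_abundance([120, 30, 0, 50])
sample_b = relative_abundance([10, 5, 160, 25])

jsd_ab = jensen_shannon_divergence(sample_a, sample_b)
jsd_ba = jensen_shannon_divergence(sample_b, sample_a)
print(jsd_ab, jensen_shannon_distance(sample_a, sample_b))
```

A matrix of such pairwise values between all samples is the kind of input a PCoA takes; identical profiles give 0 and profiles with no shared OTUs give 1 under the base-2 assumption.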
FIGURE 4
Classification performance on full-scale drinking water distribution systems (DWDS). (A) Receiver operating characteristic (ROC) and (B) precision-recall (PR) curves with area under the curve (AUC) values and 95% confidence intervals in parentheses for the predictive model comparing microbial bioindicators based on community structure (genus-level taxonomy: RDP taxonomic database) and water quality (parameters: NH2Cl + NH3; NH2Cl + NH3 + NO2) data. Dashed lines indicate the null model. Samples: Failure, n = 48; Stable, n = 113.
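The sample sizes in this caption allow a back-of-the-envelope check of the sensitivity and specificity reported in the abstract for the eight-bioindicator model. The confusion-matrix counts below are not reported in the source. They are inferred here as the integers that reproduce a sensitivity of 0.729 with 48 failure samples and a specificity of 0.894 with 113 stable samples, and they should be read as an assumption.

```python
# Sample sizes from the Figure 4 caption.
n_failure = 48   # nitrification (positive class)
n_stable = 113   # stable operation (negative class)

# Inferred counts (assumption, not reported in the source).
true_positives = 35
true_negatives = 101
false_negatives = n_failure - true_positives
false_positives = n_stable - true_negatives

sensitivity = true_positives / n_failure    # recall for nitrification
specificity = true_negatives / n_stable
accuracy = (true_positives + true_negatives) / (n_failure + n_stable)
precision = true_positives / (true_positives + false_positives)

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f}")
```

Under these inferred counts the overall accuracy comes out near 0.845, close to but not exactly the 85% stated in the abstract, so the reported figures may rest on a slightly different threshold or rounding.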
