Novel application of a statistical technique, Random Forests, in a bacterial source tracking study
- PMID: 20566209
- DOI: 10.1016/j.watres.2010.05.019
Abstract
In this study, data from bacterial source tracking (BST) analysis using antibiotic resistance profiles were examined with two statistical techniques, Random Forests (RF) and discriminant analysis (DA), to determine sources of fecal contamination of a Texas water body. Cow Trap and Cedar Lakes are potential oyster harvesting waters located in Brazoria County, Texas, that have been listed as impaired for bacteria on the 2004 Texas 303(d) list. Unknown source Escherichia coli were isolated from water samples collected in the study area during two sampling events. Isolates were confirmed as E. coli using carbon source utilization profiles and then analyzed via antibiotic resistance analysis (ARA), following the Kirby-Bauer disk diffusion method. Zone diameters from ARA profiles were analyzed with both DA and RF. Using a two-way classification (human vs nonhuman), both DA and RF categorized over 90% of the 299 unknown source isolates as a nonhuman source. The average rates of correct classification (ARCCs) for the library of 1172 isolates using DA and RF were 74.6% and 82.3%, respectively. ARCCs from RF ranged from 7.7 to 12.0% higher than those from DA. Rates of correct classification (RCCs) for individual sources classified with RF ranged from 0.2 to 23.2% higher than those of DA, with a mean difference of 9.0%. Additional evidence that RF outperformed DA was found in the comparison of training and test set ARCCs and in the examination of specific disputed isolates; RF produced higher ARCCs (ranging from 8 to 13% higher) than DA for all 1000 trials (excluding the two-way classification, in which RF outperformed DA 999 out of 1000 times). This is of practical significance for analysis of bacterial source tracking data. Overall, based on both DA and RF results, migratory birds were found to be the source of the largest portion of the unknown E. coli isolates. This study is the first known published application of Random Forests in the field of BST.
Copyright 2010 Elsevier Ltd. All rights reserved.
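The RCC and ARCC figures reported in the abstract can be illustrated with a minimal sketch. It assumes that an RCC is the fraction of library isolates from one source that a classifier assigns back to that source, and that the ARCC is the unweighted mean of the per-source RCCs; the abstract does not state which averaging the authors used, and some studies instead report the overall fraction correct. The function names and the toy labels are hypothetical and are not data from the study.

```python
from collections import defaultdict


def rates_of_correct_classification(true_sources, predicted_sources):
    """Return {source: RCC}, the fraction of each source's isolates classified correctly."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for truth, predicted in zip(true_sources, predicted_sources):
        totals[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    return {source: correct[source] / totals[source] for source in totals}


def average_rate_of_correct_classification(rccs):
    """ARCC, taken here (an assumption) as the unweighted mean of per-source RCCs."""
    return sum(rccs.values()) / len(rccs)


# Hypothetical toy library: known source of each isolate vs. the classifier's call.
true_sources = ["human", "human", "cattle", "cattle", "bird", "bird", "bird", "bird"]
predicted_sources = ["human", "cattle", "cattle", "cattle", "bird", "bird", "bird", "human"]

rccs = rates_of_correct_classification(true_sources, predicted_sources)
arcc = average_rate_of_correct_classification(rccs)
print(rccs)
print(f"ARCC = {arcc:.3f}")
```

Running the same two functions on the predictions of two classifiers over the same library gives the kind of per-source and average comparison the abstract describes for DA and RF.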