Environ Sci Technol. 2025 Sep 16;59(36):19307-19317.
doi: 10.1021/acs.est.5c07541. Epub 2025 Sep 5.

Deconvoluting and Interpreting Nontargeted Chemical Data: A Data-Driven Forensic Workflow for Identifying the Most Prominent Chemical Sources in Receiving Waters

Cheng Shi et al. Environ Sci Technol.

Abstract

Chemical forensics aims to identify major contamination sources, but existing workflows often rely on predefined targets and known sources, introducing bias. Here, we present a data-driven workflow that reduces this bias by applying an unsupervised machine learning technique. We applied both nonmetric multidimensional scaling (NMDS) and non-negative matrix factorization (NMF) on the same nontargeted chemical data set to compare their different interpretations of environmental sources. Weekly nontargeted data was collected from the Fall Creek Monitoring Station (Ithaca, NY), where daily samples were previously analyzed using source-defined models. NMF was first used to decompose the full nontargeted chemical data set into a small set of chemical factors representing distinct composition profiles. Each factor was then interpreted through (1) Spearman correlations with watershed characteristics (e.g., temperature, flow) and (2) suspect screening of high-weighted nontargeted features. In addition to confirming known anthropogenic inputs, our analysis revealed potential novel sources associated with snowmelt, groundwater seepage, and seasonal hydrological dynamics. We also detected an annual shift in the chemical composition, highlighting the evolving influence of these sources. This workflow enables watershed managers to move beyond predefined sources, detect both known and emerging chemical contributors, and apply adaptive, evidence-based strategies to protect water quality under changing conditions.
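The two interpretation steps described above (factorizing the feature table, then correlating each factor with watershed characteristics and inspecting its high-weighted features) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the data are synthetic, the names `nmf`, `spearman`, `temperature`, and `flow` are hypothetical, and the factorization uses plain multiplicative updates, which is one common NMF solver and not necessarily the one used in the study.

```python
import numpy as np

def nmf(X, k, n_iter=1000, seed=0, eps=1e-9):
    """Approximate non-negative X (samples x features) as W @ H.

    W (samples x k) holds each factor's contribution per sample;
    H (k x features) holds each factor's chemical composition profile.
    Uses multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, k)) + eps
    H = rng.random((k, n_features)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ordinal ranks.

    Ties are not averaged, which is adequate for the continuous
    synthetic values used here.
    """
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

# Synthetic stand-in for one year of weekly samples (assumed, not real data).
rng = np.random.default_rng(42)
weeks = np.arange(52)
temperature = 10.0 + 12.0 * np.sin(2 * np.pi * (weeks - 15) / 52)
temperature += rng.normal(0.0, 1.0, size=52)
flow = rng.lognormal(mean=1.0, sigma=0.6, size=52)

# Two hidden sources: one tracks temperature, the other tracks flow.
W_true = np.column_stack([temperature - temperature.min() + 0.1, flow])
H_true = rng.random((2, 30)) * 0.1
H_true[0, :15] += 1.0   # source 1 dominates features 0-14
H_true[1, 15:] += 1.0   # source 2 dominates features 15-29
X = W_true @ H_true     # samples x features intensity table

# Step 1: decompose the feature table into a small set of factors.
W, H = nmf(X, k=2)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# Step 2a: Spearman correlation of each factor with each covariate.
covariates = {"temperature": temperature, "flow": flow}
corr = {
    name: [spearman(W[:, j], values) for j in range(W.shape[1])]
    for name, values in covariates.items()
}

# Step 2b: highest-weighted features per factor, as candidates for
# suspect screening.
top_features = [np.argsort(H[j])[::-1][:5] for j in range(H.shape[0])]

print("relative reconstruction error:", rel_error)
for name, values in corr.items():
    print(name, [round(v, 2) for v in values])
for j, idx in enumerate(top_features):
    print("factor", j, "top feature indices:", idx.tolist())
```

In this sketch the number of factors is fixed at two because the synthetic data were built from two sources; with real nontargeted data the number of factors is a modeling choice that would need its own justification.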

Keywords: chemical forensics; non-negative matrix factorization; nontargeted analysis; unsupervised learning; watershed health.
