Sci Adv. 2024 May 24;10(21):eadn1039.
doi: 10.1126/sciadv.adn1039. Epub 2024 May 23.

Machine learning-enhanced molecular network reveals global exposure to hundreds of unknown PFAS

Xuebing Wang et al. Sci Adv. 2024.

Abstract

Unknown "forever chemicals" such as per- and polyfluoroalkyl substances (PFASs) are difficult to identify. Current platforms designed for metabolites and natural products cannot capture the diverse structural characteristics of PFASs. Here, we report an automatic PFAS identification platform (APP-ID) that screens for PFASs in environmental samples using an enhanced molecular network and identifies unknown PFAS structures using machine learning. Our networking algorithm, which enhances characteristic fragment matches, has a lower false-positive rate (0.7%) than current algorithms (2.4 to 46%). Our support vector machine model identified unknown PFASs in the test set with 58.3% accuracy, surpassing current software. Further, APP-ID detected 733 PFASs in real fluorochemical wastewater, 39 of which were previously unreported in environmental media. Retrospective screening of 126 PFASs against public repository data from 20 countries shows that PFAS substitutes are prevalent worldwide.

Figures

Fig. 1. The workflow diagram for the APP-ID method.
The APP-ID workflow mainly includes (i) import of sample MS data, (ii) seed PFAS annotation, (iii) the PFAS_link module, which searches for neighbor PFAS, (iv) the PFAS_ID module, which identifies the structure of PFAS, (v) iterative identification, and (vi) redundancy removal and confidence assignment.
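The iterative identification in steps (iii) to (v) can be read as a round-by-round expansion from seed PFAS through the molecular network. The following is a minimal sketch under that reading, not the APP-ID implementation: `neighbors_of` and `identify` are hypothetical stand-ins for the PFAS_link and PFAS_ID modules, and `max_rounds` is an assumed parameter.

```python
from typing import Callable, Dict, Set


def iterative_identification(
    seeds: Set[str],
    neighbors_of: Callable[[str], Set[str]],  # hypothetical stand-in for PFAS_link
    identify: Callable[[str], bool],          # hypothetical stand-in for PFAS_ID
    max_rounds: int = 10,                     # assumed cap on the number of rounds
) -> Dict[str, int]:
    """Map each accepted feature ID to the round in which it was accepted."""
    round_of = {seed: 0 for seed in seeds}    # round 0 holds the annotated seeds
    frontier = set(seeds)
    for rnd in range(1, max_rounds + 1):
        next_frontier = set()
        for feature in frontier:
            for neighbor in neighbors_of(feature):
                # Newly identified features become seeds for the next round.
                if neighbor not in round_of and identify(neighbor):
                    round_of[neighbor] = rnd
                    next_frontier.add(neighbor)
        if not next_frontier:                 # stop when a round finds nothing new
            break
        frontier = next_frontier
    return round_of


# Toy network: X is linked to a seed but is rejected, so Y is never reached.
toy_edges = {"A": {"B", "X"}, "B": {"C"}, "C": set(), "X": {"Y"}, "Y": set()}
toy_result = iterative_identification(
    seeds={"A"},
    neighbors_of=lambda f: toy_edges[f],
    identify=lambda f: f != "X",
)
```

In this toy run, `A` is accepted in round 0, `B` in round 1, and `C` in round 2.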
Fig. 2. The Flink algorithm outperforms traditional algorithms in PFAS molecular networking.
(A) An example of the Flink algorithm distinguishing PFAS from non-PFAS under three scenarios. Scenario 1 represents the baseline without any treatment, while scenario 2 involves the exclusion of common neutral losses, including 0, CO2, and SO3. In scenario 3, common neutral losses and common fragments below 100, except for the PFAS characteristic ions SO3F and SO2F, are excluded. (B) False-positive rate and positive rate of the Flink algorithm at similarity thresholds from 0 to 1 across the three scenarios, using 60 PFAS spectra and 700 non-PFAS spectra. (C) False-positive rate and positive rate of seven spectral similarity algorithms at a similarity threshold of 0.5, using 60 PFAS spectra and 700 non-PFAS spectra. Compound pairs with a spectral similarity higher than the similarity threshold were linked. The positive rate is the percentage of PFAS that could be linked with other PFAS in the molecular network; the false-positive rate is the percentage of non-PFAS that could be linked with PFAS in the molecular network.
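The two rates defined in this caption can be computed directly from a pairwise similarity matrix. The sketch below is illustrative only: the function name `network_rates`, the matrix layout, and the toy values are assumptions, and it takes the similarity scores as given rather than implementing Flink or any other scoring algorithm.

```python
def network_rates(sim, is_pfas, threshold=0.5):
    """Return (positive_rate, false_positive_rate) for a thresholded network.

    sim      -- square matrix of pairwise spectral similarities (assumed input)
    is_pfas  -- list of booleans, True where the spectrum belongs to a PFAS
    """
    n = len(sim)

    def linked_to_pfas(i):
        # A pair is linked when its similarity is higher than the threshold.
        return any(
            j != i and is_pfas[j] and sim[i][j] > threshold for j in range(n)
        )

    pfas = [i for i in range(n) if is_pfas[i]]
    non_pfas = [i for i in range(n) if not is_pfas[i]]
    # Positive rate: share of PFAS linked with at least one other PFAS.
    positive_rate = sum(linked_to_pfas(i) for i in pfas) / len(pfas)
    # False-positive rate: share of non-PFAS linked with at least one PFAS.
    false_positive_rate = sum(linked_to_pfas(i) for i in non_pfas) / len(non_pfas)
    return positive_rate, false_positive_rate


# Toy data: spectra 0 and 1 are PFAS, spectra 2 and 3 are not.
toy_sim = [
    [1.0, 0.8, 0.6, 0.1],
    [0.8, 1.0, 0.2, 0.1],
    [0.6, 0.2, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
toy_rates = network_rates(toy_sim, [True, True, False, False], threshold=0.5)
```

Here both PFAS link to each other (positive rate 1.0), and one of the two non-PFAS spectra links to a PFAS (false-positive rate 0.5).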
Fig. 3. Example and performance of the PFAS_ID module.
(A) Candidate search workflow example. (B) Correct formula rate of the transformation search using 409 PFAS; "random" denotes a formula randomly selected from the formula candidates. (C) Top 20 transformation frequencies for the transformation search on 409 PFAS. (D) Fingerprint prediction model training process. (E) Model accuracy, precision, and recall F1 score comparison in predicting fingerprints on the test dataset. SVM, support vector machine model; LOG, logistic model; BAY, Bayesian model; DEC, decision tree model; RAN, random forest model; KNN, K-nearest neighbor model; ANN, artificial neural network. (F) Model F1 score comparison in predicting fingerprints on the test dataset. (G) Correct identification rate of PFAS_ID, MetFrag 2.0, CMF-ID 3.0, and SIRIUS 5.5 when the formula is known in the test set. The correct identification rate is the percentage of correct structures in the top one, top two, and top three ranks for 48 test PFAS. (H) Correct identification rate of PFAS_ID, MetFrag 2.0, CMF-ID 3.0, and SIRIUS 5.5 when the formula is unknown.
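The correct identification rate in panels (G) and (H) is a top-k metric over ranked candidate lists. A minimal sketch of that calculation follows; the function name `top_k_rate`, the dictionary layout, and the toy candidates are assumptions for illustration, not output from any of the tools named above.

```python
def top_k_rate(ranked_candidates, truth, k):
    """Fraction of compounds whose true structure is among the top k candidates.

    ranked_candidates -- dict: compound ID -> candidate list, best rank first
    truth             -- dict: compound ID -> correct structure label
    """
    hits = sum(
        1
        for compound, candidates in ranked_candidates.items()
        if truth[compound] in candidates[:k]
    )
    return hits / len(ranked_candidates)


# Toy rankings for four hypothetical compounds.
toy_ranked = {
    "c1": ["A", "B", "C"],  # correct structure at rank 1
    "c2": ["D", "E", "F"],  # correct structure at rank 2
    "c3": ["G", "H", "I"],  # correct structure at rank 3
    "c4": ["J", "K"],       # correct structure not among the candidates
}
toy_truth = {"c1": "A", "c2": "E", "c3": "I", "c4": "Z"}
toy_top = [top_k_rate(toy_ranked, toy_truth, k) for k in (1, 2, 3)]
```

For the toy data the top-one, top-two, and top-three rates are 0.25, 0.5, and 0.75.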
Fig. 4. Identification of PFAS in environmental samples.
(A) Seed PFAS identified in each round from the influent wastewater sample. (B) Seed PFAS identified in each round from the effluent wastewater sample. (C) Network diagram of PFAS in effluent, as an example showing the distribution of PFAS in round 0, rounds 1 to 4, and rounds 5 to 9. (D) An example demonstrating the identification of 20 PFAS from C2917 in the influent network, where combinations of "C" and numbers represent compound peak IDs. (E) The MS/MS spectra of C4432 and C28756. (F) The ranked candidates for C28756 given by the PFAS_ID model. The formula, parent ion mass error, and rank score of each candidate are provided.
Fig. 5. Retrospective screening of PFAS using MASST.
(A) Structures of 31 PFAS classes identified in fluorochemical wastewater. (B) Retrospective screening statistics for detected PFAS features in MassIVE datasets or files. (C) Distribution of sample types for PFAS features found by MASST retrospective screening. (D) Worldwide distribution of PFAS features found by MASST retrospective screening.
