Ecotoxicol Environ Saf. 2019 Sep 15;179:71-78.
doi: 10.1016/j.ecoenv.2019.04.035. Epub 2019 Apr 23.

QSAR modelling study of the bioconcentration factor and toxicity of organic compounds to aquatic organisms using machine learning and ensemble methods

Haixin Ai et al. Ecotoxicol Environ Saf. 2019.

Abstract

Bioconcentration factors and median lethal concentrations (LC50s) are important when assessing the risks that organic pollutants pose to aquatic ecosystems. Various quantitative structure-activity relationship (QSAR) models have been developed to predict bioconcentration factors and to classify acute toxicity. In this study, we developed a regression model using the Recursive Feature Elimination (RFE) method combined with the Support Vector Machine (SVM) algorithm. For the regression model, we calculated 2D molecular descriptors for a dataset of 450 diverse chemicals. For the classification models, we calculated 12 molecular fingerprints for a dataset of 400 diverse chemicals and built three ensemble models using three machine learning algorithms. The R² and R²pred of the regression model were 0.860 and 0.757, respectively. Other validation parameters, assessed against the standards set by Golbraikh, Tropsha, and Roy, indicated that the regression model made good predictions and could efficiently predict a new set of compounds. Among the classification models, the ensemble-SVM model gave an overall accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve of 92.2%, 95.1%, 86.0%, and 0.965, respectively, in five-fold cross-validation, and of 87.3%, 92.6%, 76.0%, and 0.940, respectively, in external validation. These parameters indicated that our ensemble-SVM model was more stable and gave more accurate predictions than previous models. The model could therefore be used to predict aquatic toxicity effectively and to assess risks posed to aquatic ecosystems. We identified several structures most relevant to acute aquatic toxicity through predictions made by the two types of models, and this information may be important for aquatic toxicology experiments and aquatic system management.

Keywords: Acute aquatic toxicity; Aquatic toxicology; Assessing risks; Bioconcentration factors; Ensemble-SVM.
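
The validation statistics quoted in the abstract (R² for regression; accuracy, sensitivity, specificity, and area under the ROC curve for classification) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the toy data are hypothetical, the positive class is assumed to be labelled 1, and `r_squared` uses the mean of the supplied observed values, which may differ from the exact R²pred definition used in the paper.

```python
from typing import Sequence, Tuple


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: SS_tot is taken around the mean of y_true itself.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    labels = [1, 1, 1, 0, 0]
    predicted = [1, 1, 0, 0, 1]
    scores = [0.9, 0.8, 0.3, 0.2, 0.6]
    print(classification_metrics(labels, predicted))
    print(roc_auc(labels, scores))
```

In a five-fold cross-validation such as the one reported, these statistics would be computed on the held-out fold of each split and then summarised; in external validation they would be computed once on compounds never used for training.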
