Prediction reliability of QSAR models: an overview of various validation tools
- PMID: 35267067
- DOI: 10.1007/s00204-022-03252-y
Abstract
The reliability of any quantitative structure-activity relationship (QSAR) model depends on multiple aspects, such as the accuracy of the input dataset, the selection of significant descriptors, an appropriate splitting of the dataset, the statistical tools used, and, most notably, the measures of validation. Validation, the most crucial step in QSAR model development, confirms the reliability of the developed models and the acceptability of each step in their development. The present review deals with various validation tools and the techniques they employ to improve model quality and robustness. The double cross-validation tool helps build better models by using different combinations of the same training set in an inner cross-validation loop. This exhaustive method is also integrated, for small datasets (< 40 compounds), in another tool, the small dataset modeler. The main aim of QSAR researchers is to improve prediction quality by lowering the prediction errors for query compounds. 'Intelligent' selection of multiple models and the consensus predictions integrated in the intelligent consensus predictor tool were found to be more externally predictive than individual models. Furthermore, another tool, the Prediction Reliability Indicator, is explained as a means of assessing the quality of predictions for a true external set. This tool uses a composite scoring technique to classify the predictions for query compounds as 'good', 'moderate', or 'bad'. We also discuss a quantitative read-across tool, which predicts a chemical response based on similarity to structural analogues. The discussed tools are freely available from https://dtclab.webs.com/software-tools or http://teqip.jdvu.ac.in/QSAR_Tools/DTCLab/ and https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home (for read-across).
Keywords: Double cross-validation; Intelligent consensus prediction; QSAR; Read across; Small dataset modeling; Validation.
© 2022. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
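The double cross-validation idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the DTC Lab tool's implementation: the function names (`fit_line`, `folds`, `cv_mse`, `double_cv`), the single-descriptor linear model, the interleaved fold scheme, and the toy data are all assumptions made for illustration. The point it shows is the separation of roles: the inner loop selects a model using training compounds only, and the outer loop estimates prediction error on compounds that took no part in that selection.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope

def folds(n, k):
    """Split positions 0..n-1 into k interleaved folds (deterministic)."""
    return [list(range(i, n, k)) for i in range(k)]

def cv_mse(X, y, idx, col, k):
    """k-fold cross-validated MSE of descriptor `col`, using rows `idx` only."""
    errors = []
    for fold in folds(len(idx), k):
        held = {idx[j] for j in fold}
        train = [i for i in idx if i not in held]
        a, b = fit_line([X[i][col] for i in train], [y[i] for i in train])
        errors += [(y[i] - (a + b * X[i][col])) ** 2 for i in held]
    return mean(errors)

def double_cv(X, y, k_outer=4, k_inner=3):
    """Return (outer-loop MSE, descriptor index chosen in each outer split)."""
    n, n_desc = len(y), len(X[0])
    outer_errors, chosen = [], []
    for test in folds(n, k_outer):
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        # Inner loop: model selection sees the training rows only.
        best = min(range(n_desc), key=lambda c: cv_mse(X, y, train, c, k_inner))
        a, b = fit_line([X[i][best] for i in train], [y[i] for i in train])
        # Outer loop: test rows were never used for selection or fitting.
        outer_errors += [(y[i] - (a + b * X[i][best])) ** 2 for i in test]
        chosen.append(best)
    return mean(outer_errors), chosen

# Toy data (made up for illustration): 12 "compounds", 2 descriptors.
# Descriptor 0 determines the response exactly; descriptor 1 is unrelated.
X = [[float(i), float((i * 7) % 5)] for i in range(12)]
y = [2.0 * i + 1.0 for i in range(12)]

mse, chosen = double_cv(X, y)
print("outer-loop MSE:", mse)
print("descriptor chosen per outer split:", chosen)
```

In practice the candidate models would be descriptor subsets or different learning algorithms rather than single descriptors, and the outer-loop error is the quantity to report, since the inner-loop error is biased by the selection step.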
Similar articles
- How Precise Are Our Quantitative Structure-Activity Relationship Derived Predictions for New Query Chemicals? ACS Omega. 2018 Sep 19;3(9):11392-11406. doi: 10.1021/acsomega.8b01647. eCollection 2018 Sep 30. PMID: 31459245. Free PMC article.
- New Workflow for QSAR Model Development from Small Data Sets: Small Dataset Curator and Small Dataset Modeler. Integration of Data Curation, Exhaustive Double Cross-Validation, and a Set of Optimal Model Selection Techniques. J Chem Inf Model. 2019 Oct 28;59(10):4070-4076. doi: 10.1021/acs.jcim.9b00476. Epub 2019 Sep 26. PMID: 31525295.
- Quick and efficient quantitative predictions of androgen receptor binding affinity for screening Endocrine Disruptor Chemicals using 2D-QSAR and Chemical Read-Across. Chemosphere. 2022 Dec;309(Pt 1):136579. doi: 10.1016/j.chemosphere.2022.136579. Epub 2022 Sep 26. PMID: 36174732.
- On various metrics used for validation of predictive QSAR models with applications in virtual screening and focused library design. Comb Chem High Throughput Screen. 2011 Jul;14(6):450-74. doi: 10.2174/138620711795767893. PMID: 21521150. Review.
- Current approaches for choosing feature selection and learning algorithms in quantitative structure-activity relationships (QSAR). Expert Opin Drug Discov. 2018 Dec;13(12):1075-1089. doi: 10.1080/17460441.2018.1542428. Epub 2018 Nov 3. PMID: 30372648. Review.
Cited by
- Discovery of potential FGFR3 inhibitors via QSAR, pharmacophore modeling, virtual screening and molecular docking studies against bladder cancer. J Transl Med. 2023 Feb 10;21(1):111. doi: 10.1186/s12967-023-03955-5. PMID: 36765337. Free PMC article.
- Unveiling the antiviral inhibitory activity of ebselen and ebsulfur derivatives on SARS-CoV-2 using machine learning-based QSAR, LB-PaCS-MD, and experimental assay. Sci Rep. 2025 Feb 26;15(1):6956. doi: 10.1038/s41598-025-91235-1. PMID: 40011571. Free PMC article.
- Computational Tools to Facilitate Early Warning of New Emerging Risk Chemicals. Toxics. 2024 Oct 12;12(10):736. doi: 10.3390/toxics12100736. PMID: 39453156. Free PMC article.
- Toxicological evaluation of microbial secondary metabolites in the context of European active substance approval for plant protection products. Environ Health. 2024 Jun 4;23(1):52. doi: 10.1186/s12940-024-01092-0. PMID: 38835048. Free PMC article. Review.
- ToxACoL: an endpoint-aware and task-focused compound representation learning paradigm for acute toxicity assessment. Nat Commun. 2025 Jul 1;16(1):5992. doi: 10.1038/s41467-025-60989-7. PMID: 40593807. Free PMC article.