Prediction reliability of QSAR models: an overview of various validation tools

Priyanka De et al. Review. Arch Toxicol. 2022 May;96(5):1279-1295.
doi: 10.1007/s00204-022-03252-y. Epub 2022 Mar 10.

Abstract

The reliability of any quantitative structure-activity relationship (QSAR) model depends on multiple aspects, such as the accuracy of the input dataset, the selection of significant descriptors, an appropriate splitting of the dataset, the statistical tools used and, most notably, the validation measures applied. Validation, the most crucial step in QSAR model development, confirms the reliability of the developed models and the acceptability of each step in their development. The present review covers various validation tools that implement multiple techniques to improve model quality and robustness. The double cross-validation tool helps build better-quality models by using different combinations of the same training set in an inner cross-validation loop. This exhaustive approach is also implemented for small datasets (fewer than 40 compounds) in another tool, the small dataset modeler tool. The main aim of QSAR researchers is to improve prediction quality by lowering the prediction errors for query compounds. Consensus predictions from an 'intelligent' selection of multiple models, as implemented in the intelligent consensus predictor tool, were found to be more externally predictive than individual models. We also explain another tool, the Prediction Reliability Indicator, which helps assess the quality of predictions for a true external set. This tool uses a composite scoring technique to classify the predictions for query compounds as 'good', 'moderate' or 'bad'. Finally, we discuss a quantitative read-across tool that predicts a chemical response based on similarity to structural analogues. The discussed tools are freely available from https://dtclab.webs.com/software-tools or http://teqip.jdvu.ac.in/QSAR_Tools/DTCLab/ and https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home (for read-across).

Keywords: Double cross-validation; Intelligent consensus prediction; QSAR; Read across; Small dataset modeling; Validation.
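The double cross-validation idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the DTC Lab tool's implementation. As an assumption made purely for illustration, the "model" is an ordinary least-squares fit on a single descriptor, and model selection in the inner loop simply picks the descriptor with the lowest inner cross-validation error. All function names (`fit_simple_linear`, `fold_mse`, `make_folds`, `double_cross_validation`) and the fold counts are hypothetical. The point the sketch tries to show is that the outer test fold never influences descriptor selection, so the outer error estimates how the whole selection procedure generalizes.

```python
import random
from statistics import mean


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = a + b * x on a single descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def fold_mse(X, y, fit_idx, eval_idx, d):
    """Fit on fit_idx using descriptor d; return mean squared error on eval_idx."""
    a, b = fit_simple_linear([X[i][d] for i in fit_idx], [y[i] for i in fit_idx])
    return mean((y[i] - (a + b * X[i][d])) ** 2 for i in eval_idx)


def make_folds(indices, k, rng):
    """Shuffle indices and deal them into k roughly equal folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def double_cross_validation(X, y, k_outer=5, k_inner=4, seed=0):
    """Return (mean outer-fold MSE, descriptor chosen in each outer fold)."""
    rng = random.Random(seed)
    n = len(y)
    outer_errors, chosen = [], []
    for test_fold in make_folds(range(n), k_outer, rng):
        held_out = set(test_fold)
        train = [i for i in range(n) if i not in held_out]

        # Inner loop: model selection sees only the outer training set.
        inner_folds = make_folds(train, k_inner, rng)

        def inner_error(d):
            errors = []
            for val_fold in inner_folds:
                val = set(val_fold)
                fit_idx = [i for i in train if i not in val]
                errors.append(fold_mse(X, y, fit_idx, val_fold, d))
            return mean(errors)

        best_d = min(range(len(X[0])), key=inner_error)
        chosen.append(best_d)

        # Outer loop: refit the selected model on the full outer training
        # set and score it on the untouched test fold.
        outer_errors.append(fold_mse(X, y, train, test_fold, best_d))
    return mean(outer_errors), chosen


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data: the response depends on descriptor 0 only.
    X = [[rng.random(), rng.random()] for _ in range(40)]
    y = [2.0 * row[0] + rng.gauss(0.0, 0.1) for row in X]
    mse, chosen = double_cross_validation(X, y)
    print(f"outer MSE = {mse:.4f}, descriptors chosen = {chosen}")
```

With this synthetic data the inner loop should pick descriptor 0 in every outer fold. A practical QSAR workflow would typically replace the single-descriptor search with a richer variable-selection method inside the inner loop; the nesting structure stays the same.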
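The quantitative read-across idea in the abstract, predicting a response from structurally similar analogues, can be sketched as a similarity-weighted average. This is an illustration under stated assumptions, not the algorithm of the DTC Lab read-across tool. Here each compound is assumed to be a set of "on" fingerprint bits, similarity is assumed to be the Tanimoto coefficient, and the prediction is the similarity-weighted mean response of the `k` most similar source compounds. The tool itself may use other descriptors, similarity measures and weighting schemes. The names `tanimoto` and `read_across` and the example data are hypothetical.

```python
from typing import AbstractSet, Sequence, Tuple


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def read_across(
    query: AbstractSet[int],
    sources: Sequence[Tuple[AbstractSet[int], float]],
    k: int = 3,
) -> float:
    """Similarity-weighted mean response of the k most similar source compounds."""
    if k < 1 or not sources:
        raise ValueError("need k >= 1 and at least one source compound")
    neighbours = sorted(
        ((tanimoto(query, fp), response) for fp, response in sources),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0.0:
        # No structural overlap at all: fall back to an unweighted mean.
        return sum(resp for _, resp in neighbours) / len(neighbours)
    return sum(sim * resp for sim, resp in neighbours) / total


if __name__ == "__main__":
    # Hypothetical analogues: (fingerprint bits, measured response).
    sources = [
        ({1, 2, 3, 4}, 5.0),
        ({1, 2, 3}, 4.0),
        ({7, 8}, 0.0),
        ({1, 2, 5, 6}, 3.0),
    ]
    pred = read_across({1, 2, 3, 4}, sources, k=3)
    print(round(pred, 2))  # 4.32
```

In the example, the three nearest analogues have similarities 1.0, 0.75 and 1/3, so the prediction is (5.0 + 3.0 + 1.0) / (25/12) = 4.32. Weighting by similarity lets closer analogues dominate, while the fallback keeps the function defined when no analogue shares any bits with the query.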
