Beware of R²: Simple, Unambiguous Assessment of the Prediction Accuracy of QSAR and QSPR Models
- PMID: 26099013
- PMCID: PMC4530125
- DOI: 10.1021/acs.jcim.5b00206
Abstract
The statistical metrics used to characterize the external predictivity of a model, i.e., how well it predicts the properties of an independent test set, have proliferated over the past decade. This paper clarifies some apparent confusion over the use of the coefficient of determination, R², as a measure of model fit and predictive power in QSAR and QSPR modeling. R² (or r²) has been used in various contexts in the literature in conjunction with training and test data, for both ordinary linear regression and regression through the origin, as well as with linear and nonlinear regression models. We analyze the widely adopted model fit criteria suggested by Golbraikh and Tropsha (J. Mol. Graphics Modell. 2002, 20, 269-276) in a strict statistical manner. Shortcomings in these criteria are identified, and a clearer and simpler alternative method to characterize model predictivity is provided. The intent is not to repeat the well-documented arguments for model validation using test data but rather to guide the application of R² as a model fit statistic. Examples are used to illustrate both correct and incorrect uses of R². Reporting the root-mean-square error or equivalent measures of dispersion, which are typically of more practical importance than R², is also encouraged, and important challenges in addressing the needs of different categories of users such as computational chemists, experimental scientists, and regulatory decision support specialists are outlined.
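The distinction the abstract draws can be illustrated with a minimal sketch. It is not taken from the paper: the function names and the toy data are hypothetical, and R² is assumed here to be the standard coefficient of determination, 1 - SS_res/SS_tot, with SS_tot taken about the mean of the observed test values. The sketch contrasts that quantity with the squared Pearson correlation between observed and predicted values, and reports the root-mean-square error alongside both.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: SS_tot is computed about the mean of the observed values
    in the set being evaluated (here, the test set).
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    It is unaffected by a constant offset or rescaling of the predictions,
    so it can stay high even when the predictions are systematically wrong.
    """
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    cov = sum((y - mean_obs) * (p - mean_pred) for y, p in zip(observed, predicted))
    var_obs = sum((y - mean_obs) ** 2 for y in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov ** 2 / (var_obs * var_pred)


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the modeled property."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)


# Hypothetical test set: every prediction is too high by exactly one unit.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0, 3.0, 4.0, 5.0, 6.0]

print(pearson_r_squared(observed, predicted))  # 1.0
print(r_squared(observed, predicted))          # 0.5
print(rmse(observed, predicted))               # 1.0
```

In this toy case the squared correlation is perfect even though every prediction is off by one unit. The coefficient of determination and the RMSE both expose the bias, and the RMSE does so directly in the units of the property being predicted.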
References
- Golbraikh A, Tropsha A. Beware of q²! J. Mol. Graph. Model. 2002;20:269–276.
- Le T, Epa VC, Burden FR, Winkler DA. Quantitative Structure-Property Relationship Modeling of Diverse Materials Properties. Chem. Rev. 2012;112:2889–2919.
- Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. 2nd ed. Springer; New York, USA: 2009. p. 745.
- Burden F, Winkler D. Bayesian Regularization of Neural Networks. Meth. Mol. Biol. 2008;458:25–44.
- Burden FR, Winkler DA. Robust QSAR Models Using Bayesian Regularized Neural Networks. J. Med. Chem. 1999;42:3183–3187.