Review

Brief Bioinform. 2020 May 21;21(3):791-802.

doi: 10.1093/bib/bbz026.

Validation strategies for target prediction methods

Neann Mathai¹ ² ³, Ya Chen³, Johannes Kirchmair¹ ² ³

Affiliations

¹ Department of Chemistry, University of Bergen, Bergen, Norway.
² Computational Biology Unit (CBU), University of Bergen, Bergen, Norway.
³ Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany.

PMID: 31220208
PMCID: PMC7299289
DOI: 10.1093/bib/bbz026

Neann Mathai et al. Brief Bioinform. 2020.

Abstract

Computational methods for target prediction, based on molecular similarity and network-based approaches, machine learning, docking and others, have evolved as valuable and powerful tools to aid the challenging task of mode of action identification for bioactive small molecules such as drugs and drug-like compounds. Critical to discerning the scope and limitations of a target prediction method is understanding how its performance was evaluated and reported. Ideally, large-scale prospective experiments are conducted to validate the performance of a model; however, this expensive and time-consuming endeavor is often not feasible. Therefore, to estimate the predictive power of a method, statistical validation based on retrospective knowledge is commonly used. There are multiple statistical validation techniques that vary in rigor. In this review we discuss the validation strategies employed, highlighting the usefulness and constraints of the validation schemes and metrics that are employed to measure and describe performance. We address the limitations of measuring only generalized performance, given that the underlying bioactivity and structural data are biased towards certain small-molecule scaffolds and target families, and suggest additional aspects of performance to consider in order to produce more detailed and realistic estimates of predictive power. Finally, we describe the validation strategies that were employed by some of the most thoroughly validated and accessible target prediction methods.

Keywords: classification; data bias; model validation; performance metrics; polypharmacology; target prediction.

Figures

**Figure 1**
Illustrations of example data partitioning schemes: (A) a single train–test split, (B) a single train–test split of chronological data, (C) a 5-fold CV scheme, (D) a single train–test split into construction and validation sets for internal validation and an external testing set for external validation, (E) a 4-fold CV scheme used for internal validation with a testing set reserved for external validation and (F) a nested CV scheme with a 2-fold loop for internal validation and a 3-fold loop for external validation.
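
As a rough illustration of scheme (E), the following sketch first reserves an external testing set and then builds CV folds from the remaining data for internal validation. The function name, the 20% external fraction and the random shuffling are assumptions made for this example and are not taken from the review.

```python
import random


def split_external_then_kfold(n_samples, n_folds=4, external_fraction=0.2, seed=0):
    """Reserve an external test set, then build k CV folds from the rest.

    Hypothetical helper: the fraction, fold count and shuffling are
    illustrative choices, not values prescribed by the review.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # External testing set: never touched during internal validation.
    n_external = int(round(n_samples * external_fraction))
    external = indices[:n_external]
    internal = indices[n_external:]

    # Deal the internal indices round-robin into n_folds validation folds.
    folds = [internal[i::n_folds] for i in range(n_folds)]

    cv_rounds = []
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        cv_rounds.append((training, validation))
    return cv_rounds, external


cv_rounds, external = split_external_then_kfold(100)
```

Each CV round uses three folds for training and one for validation, while the external set stays untouched until the final evaluation.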

**Figure 2**
Examples of CV-testing folds designed to have (A) all data points involving specific queries within one fold (points inside the purple box), (B) all data points involving specific targets within one fold (points inside the purple box) and (C) all data points involving the components of query compound–target pairs within one testing fold (points inside the purple boxes). The data points covered by the blue boxes are omitted from both training and testing data during the CV round involving the purple boxed data as the testing set, and the remaining data points are used as the training set. Interacting pairs are shown in green while (putative) non-interacting pairs are shown in white (adapted from Pahikkala *et al.* [25]).
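
One way to read scheme (C) is sketched below: a pair goes to the testing fold only if both its compound and its target are held out, pairs sharing just one component with the testing fold are omitted, and everything else is used for training. The function name and the toy compound and target identifiers are hypothetical and serve only to illustrate the partitioning.

```python
def partition_pairs(pairs, test_compounds, test_targets):
    """Split compound-target pairs for one CV round as in scheme (C).

    Hypothetical helper, written for illustration only.
    """
    train, test, omitted = [], [], []
    for compound, target in pairs:
        compound_held_out = compound in test_compounds
        target_held_out = target in test_targets
        if compound_held_out and target_held_out:
            test.append((compound, target))      # purple boxes
        elif compound_held_out or target_held_out:
            omitted.append((compound, target))   # blue boxes
        else:
            train.append((compound, target))     # remaining data points
    return train, test, omitted


# Toy data: every combination of four compounds and three targets.
pairs = [(c, t) for c in ("c1", "c2", "c3", "c4") for t in ("t1", "t2", "t3")]
train, test, omitted = partition_pairs(pairs, {"c1", "c2"}, {"t1"})
```

With this partitioning, neither the compounds nor the targets in the testing fold appear anywhere in the training data for that round.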

**Figure 3**
(A) A binary classification confusion matrix with the four categories of prediction (FPs may include putative false positives); (B) ROC curves: the closer the curves are to the top left-hand corner, the better. AUC values alone may be deceptive as a lack of correct early predictions may be offset by an increased number of correct predictions later, leading to high AUC values. This scenario is shown by the green and purple curves. (C) Precision-recall curve: the closer the curve is to the top right corner, the better the model’s performance.
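
The caveat about AUC values in (B) can be illustrated with a small sketch: two rankings that share the same ROC AUC but differ in whether the top-ranked prediction is correct. The scores and labels are invented for this example, and precision among the top k predictions is used here as one possible early-recognition measure, which is an assumption rather than a metric prescribed by the caption.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(positives) * len(negatives))


def precision_at_k(labels, scores, k):
    """Fraction of true positives among the k highest-scoring predictions."""
    ranked = sorted(zip(scores, labels), reverse=True)
    return sum(label for _, label in ranked[:k]) / k


# Invented toy rankings (1 = known interaction, 0 = putative non-interaction).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels_a = [1, 0, 0, 1, 0, 0]  # one correct prediction at the very top
labels_b = [0, 1, 1, 0, 0, 0]  # no correct prediction at rank 1

auc_a, auc_b = roc_auc(labels_a, scores), roc_auc(labels_b, scores)
top1_a = precision_at_k(labels_a, scores, 1)
top1_b = precision_at_k(labels_b, scores, 1)
```

Both rankings have the same AUC, yet only the first places a correct prediction at rank 1, which is why AUC is usually read alongside a measure of early performance.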

**Figure 4**
Success rates for a target prediction model (e.g. percentage of compounds for which at least one known target was ranked among the top 1, top 3 and top 5 positions) versus the maximum similarity between the individual query compounds and their closest related compounds in the reference data. Such plots are powerful tools to visualize a method’s capacity for inter- and extrapolation and help with the definition of the applicability domain.
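
The data behind such a plot could be computed roughly as follows. This sketch assumes each query is summarized by its maximum similarity to the reference data and by the rank of its first known target (or `None` if no known target was retrieved). The record layout, bin edges and function name are assumptions made for illustration.

```python
def success_rates_by_similarity(records, bin_edges, ks=(1, 3, 5)):
    """Top-k success rates per similarity bin.

    records: list of (max_similarity, rank_of_first_known_target or None).
    Bins are half-open [low, high), except the last one, which also
    includes its upper edge. Hypothetical helper for illustration.
    """
    results = {}
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        is_last_bin = high == bin_edges[-1]
        ranks = [
            rank
            for sim, rank in records
            if low <= sim < high or (is_last_bin and sim == high)
        ]
        if not ranks:
            continue  # no queries fall into this similarity range
        results[(low, high)] = {
            k: sum(r is not None and r <= k for r in ranks) / len(ranks)
            for k in ks
        }
    return results


# Invented toy records: (maximum similarity, rank of first known target).
records = [(0.2, None), (0.3, 4), (0.6, 1), (0.7, 2), (0.9, 1), (1.0, 1)]
rates = success_rates_by_similarity(records, [0.0, 0.5, 1.0])
```

In this toy data, success rates drop sharply for queries with low similarity to the reference compounds, which is the pattern such plots are meant to expose.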

Cited by

Integrating Artificial Intelligence for Drug Discovery in the Context of Revolutionizing Drug Delivery.
Visan AI, Negut I. Life (Basel). 2024 Feb 7;14(2):233. doi: 10.3390/life14020233. PMID: 38398742. Free PMC article. Review.

In silico proof of principle of machine learning-based antibody design at unconstrained scale.
Akbar R, Robert PA, Weber CR, Widrich M, Frank R, Pavlović M, Scheffer L, Chernigovskaya M, Snapkov I, Slabodkin A, Mehta BB, Miho E, Lund-Johansen F, Andersen JT, Hochreiter S, Hobæk Haff I, Klambauer G, Sandve GK, Greiff V. MAbs. 2022 Jan-Dec;14(1):2031482. doi: 10.1080/19420862.2022.2031482. PMID: 35377271. Free PMC article.

Scope of 3D Shape-Based Approaches in Predicting the Macromolecular Targets of Structurally Complex Small Molecules Including Natural Products and Macrocyclic Ligands.
Chen Y, Mathai N, Kirchmair J. J Chem Inf Model. 2020 Jun 22;60(6):2858-2875. doi: 10.1021/acs.jcim.0c00161. Epub 2020 May 5. PMID: 32368908. Free PMC article.

Identification and Validation of Carbonic Anhydrase II as the First Target of the Anti-Inflammatory Drug Actarit.
Ghislat G, Rahman T, Ballester PJ. Biomolecules. 2020 Nov 19;10(11):1570. doi: 10.3390/biom10111570. PMID: 33227945. Free PMC article.

Novel drug-target interactions via link prediction and network embedding.
Amiri Souri E, Laddach R, Karagiannis SN, Papageorgiou LG, Tsoka S. BMC Bioinformatics. 2022 Apr 4;23(1):121. doi: 10.1186/s12859-022-04650-w. PMID: 35379165. Free PMC article.

References

1. Moffat JG, Vincent F, Lee JA, et al. Opportunities and challenges in phenotypic drug discovery: an industry perspective. Nat Rev Drug Discov 2017;16:531–543.
2. Chaudhari R, Tan Z, Huang B, et al. Computational polypharmacology: a new paradigm for drug discovery. Expert Opin Drug Discov 2017;12:279–291.
3. Reddy AS, Zhang S. Polypharmacology: drug discovery for the future. Expert Rev Clin Pharmacol 2013;6:41–47.
4. Anighoro A, Bajorath J, Rastelli G. Polypharmacology: challenges and opportunities in drug discovery. J Med Chem 2014;57:7874–7887.
5. Proschak E, Stark H, Merk D. Polypharmacology by design: a medicinal chemist’s perspective on multitargeting compounds. J Med Chem 2019;62:420–444.
