J Cheminform. 2019 Jan 10;11(1):4.
doi: 10.1186/s13321-018-0325-4.

Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery

Nicolas Bosc et al. J Cheminform.

Abstract

Structure-activity relationship modelling is frequently used in the early stage of drug discovery to assess the activity of a compound on one or several targets, and can also be used to assess the interaction of compounds with liability targets. QSAR models have been used for these and related applications over many years, with good success. Conformal prediction is a relatively new QSAR approach that provides information on the certainty of a prediction, and so helps in decision-making. However, it is not always clear how best to make use of this additional information. In this article, we describe a case study that directly compares conformal prediction with traditional QSAR methods for large-scale predictions of target-ligand binding. The ChEMBL database was used to extract a data set comprising data from 550 human protein targets with different bioactivity profiles. For each target, a QSAR model and a conformal predictor were trained and their results compared. The models were then evaluated on new data published since the original models were built to simulate a "real world" application. The comparative study highlights the similarities between the two techniques but also some differences that it is important to bear in mind when the methods are used in practical drug discovery applications.

Keywords: ChEMBL; Cheminformatics; Classification models; Mondrian conformal prediction; QSAR.
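
As a rough illustration of how a Mondrian (class-conditional) conformal predictor turns nonconformity scores into a prediction set, the sketch below uses the generic textbook formulation. It is not the authors' pipeline: the function names, the toy calibration scores and the choice of nonconformity measure are all assumptions made for this example.

```python
from typing import Dict, List, Set


def mondrian_p_values(
    calibration_scores: Dict[str, List[float]],
    test_scores: Dict[str, float],
) -> Dict[str, float]:
    """Return one p-value per class for a single test compound.

    calibration_scores: nonconformity scores of calibration compounds,
        grouped by their true class (the "Mondrian" part).
    test_scores: nonconformity score of the test compound under each
        candidate class label.
    """
    p_values = {}
    for label, alpha in test_scores.items():
        same_class = calibration_scores[label]
        # Count calibration compounds of this class that look at least
        # as "strange" as the test compound does under this label.
        at_least_as_strange = sum(1 for a in same_class if a >= alpha)
        p_values[label] = (at_least_as_strange + 1) / (len(same_class) + 1)
    return p_values


def prediction_set(p_values: Dict[str, float], confidence: float) -> Set[str]:
    """Keep every label whose p-value exceeds the significance level."""
    significance = 1.0 - confidence
    return {label for label, p in p_values.items() if p > significance}


# Hypothetical calibration scores (higher = less typical of the class).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
calibration = {"active": scores, "inactive": scores}

# A compound that fits "active" well and "inactive" poorly.
clear_case = mondrian_p_values(calibration, {"active": 0.05, "inactive": 0.95})
# A compound that fits both classes reasonably well.
ambiguous_case = mondrian_p_values(calibration, {"active": 0.25, "inactive": 0.35})

clear_set = prediction_set(clear_case, confidence=0.8)
ambiguous_set = prediction_set(ambiguous_case, confidence=0.8)
```

In this toy setting the first compound receives a single-label set, while the second receives both labels, which corresponds to the 'both' prediction discussed in the figure captions. A set can also come out empty when no label passes the threshold.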

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Schema of the data collection from ChEMBL
Fig. 2
Percentage of the 550 selected targets by protein family. The protein family colours are the same in all figures
Fig. 3
Mean CCR of the 550 QSAR models grouped by protein family
Fig. 4
Overall sensitivity, specificity and CCR for the 550 conformal predictors at different confidence levels. Results show the performance according to whether the ‘both’ predictions are included in or excluded from the calculation
Fig. 5
Sensitivity (a) and specificity (b) versus the ratio of active to inactive compounds for each QSAR model. Colours represent the protein families as described in the legend of Fig. 3
Fig. 6
CCR comparison between results of QSAR and MCP models at 80% (a, b) and 90% (c, d) confidence. In (a, c) the ‘both’ class prediction is included for model evaluation, while it is left out in (b, d). The targets are divided into four quadrants depending on whether they have good results for both MCP and QSAR (upper-right), only MCP (upper-left), only QSAR (bottom-right), or neither (bottom-left)
Fig. 7
Evolution of the MCP performance depending on the confidence level for hERG
Fig. 8
Performance of the MCP models on the temporal validation set at different confidence levels. The results show the performance according to whether the ‘both’ predictions are included in or excluded from the calculation
Fig. 9
Comparison of the compound assignments in the uncertain class for MCP (at 80% confidence level) with QSAR for (a) the inactive and (b) the active compounds. The pink set represents the molecules (active or inactive) that are correctly predicted by QSAR, the green set represents the uncertain predictions from MCP, and the brown set is the intersection between the two sets, that is, the molecules predicted as uncertain by MCP but correctly predicted by QSAR
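
The captions above report sensitivity, specificity and CCR with the ‘both’ predictions either included or excluded. The sketch below shows one plausible way to compute these under stated assumptions: CCR is taken as the mean of sensitivity and specificity, a prediction set counts as correct when it contains the true label (so a ‘both’ set is correct when included and an empty set is always an error), and the function name and toy records are hypothetical. The paper's exact scoring rules may differ.

```python
from typing import Dict, Iterable, Set, Tuple


def conformal_metrics(
    records: Iterable[Tuple[str, Set[str]]],
    include_both: bool,
) -> Dict[str, float]:
    """Compute sensitivity, specificity and CCR from prediction sets.

    records: (true_label, prediction_set) pairs, labels being
        "active" or "inactive".
    include_both: if False, compounds predicted as both classes are
        dropped before scoring (an assumption of this sketch).
    """
    totals = {"active": 0, "inactive": 0}
    correct = {"active": 0, "inactive": 0}
    for true_label, predicted in records:
        is_both = predicted == {"active", "inactive"}
        if is_both and not include_both:
            continue
        totals[true_label] += 1
        # Assumption: a set is correct when it contains the true label.
        if true_label in predicted:
            correct[true_label] += 1

    def rate(label: str) -> float:
        # Undefined when no compound of this class was scored.
        if totals[label] == 0:
            return float("nan")
        return correct[label] / totals[label]

    sensitivity = rate("active")
    specificity = rate("inactive")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ccr": (sensitivity + specificity) / 2,
    }


# Hypothetical predictions for five compounds.
toy_records = [
    ("active", {"active"}),
    ("active", {"active", "inactive"}),
    ("active", {"inactive"}),
    ("inactive", {"inactive"}),
    ("inactive", {"active", "inactive"}),
]

with_both = conformal_metrics(toy_records, include_both=True)
without_both = conformal_metrics(toy_records, include_both=False)
```

On this toy data, including the ‘both’ sets raises sensitivity because each such set trivially contains the true label, which is why the two modes are worth reporting side by side.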
