J Cheminform. 2017 Nov 28;9(1):60.

doi: 10.1186/s13321-017-0248-5.

Consensus queries in ligand-based virtual screening experiments

Francois Berenger¹ ², Oanh Vu³, Jens Meiler³

Affiliations

¹ Department of Chemistry, Vanderbilt University, Nashville, TN, USA. berenger@bioreg.kyushu-u.ac.jp.
² Division of System Cohort, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan. berenger@bioreg.kyushu-u.ac.jp.
³ Department of Chemistry, Vanderbilt University, Nashville, TN, USA.

PMID: 29185065
PMCID: PMC5705545
DOI: 10.1186/s13321-017-0248-5

Francois Berenger et al. J Cheminform. 2017.

Abstract

Background: In ligand-based virtual screening experiments, a known active ligand is used in similarity searches to find putative active compounds for the same protein target. When there are several known active molecules, screening using all of them is more powerful than screening using a single ligand. A consensus query can be created by either screening serially with different ligands before merging the obtained similarity scores, or by combining the molecular descriptors (i.e. chemical fingerprints) of those ligands.

Results: We report the discriminative power and speed of several consensus methods on two datasets composed solely of experimentally verified molecules. Together, the two datasets contain 19 protein targets, 3776 known active and ~2 × 10⁶ inactive molecules. Three chemical fingerprints are investigated: MACCS (166 bits), ECFP4 (2048 bits) and an unfolded version of MOLPRINT2D. Four consensus policies and five consensus sizes were benchmarked.

Conclusions: The best consensus method is to rank candidate molecules by the maximum score that each candidate obtains against all known actives. When the number of actives used is small, a consensus fingerprint can approach the same screening performance. However, if the computational exploration of chemical space is limited by speed (i.e. throughput), a consensus fingerprint can outperform this consensus of scores.
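The two families of consensus query described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: fingerprints are modelled as sets of "on" bit indices, the function names are hypothetical, and the bitwise union used to build the consensus fingerprint is only one plausible policy, chosen here as an assumption because the abstract does not define the four policies.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

# Assumption: a fingerprint is the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto score between two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_score(candidate: Fingerprint, actives: Sequence[Fingerprint]) -> float:
    """Consensus of scores: best score of the candidate versus all known actives."""
    return max(tanimoto(candidate, active) for active in actives)


def union_fingerprint(actives: Sequence[Fingerprint]) -> Fingerprint:
    """Consensus fingerprint (assumed policy): a bit is on if any active has it."""
    return frozenset().union(*actives)


def rank(candidates: Dict[str, Fingerprint],
         score: Callable[[Fingerprint], float]) -> List[str]:
    """Return candidate names sorted from highest to lowest score."""
    return sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
```

The speed trade-off mentioned in the conclusions is visible in the shape of the code: `max_score` computes one Tanimoto score per known active for every candidate, whereas scoring against `union_fingerprint(actives)` costs a single comparison per candidate regardless of the number of actives.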

Keywords: Chemical fingerprint; Consensus query; ECFP4; Ligand-based virtual screening (LBVS); MACCS; MOLPRINT2D; Potency scaling; Several bioactives; Similarity search; Tanimoto score.

Figures

**Fig. 1**
A consensus fingerprint is created by combining the fingerprints of several known active molecules. The way to combine fingerprints is controlled by the consensus policy

**Fig. 2**
Effect of the consensus size on the consensus query global classification performance (AUC) and early recovery capability ($\mathrm{PM}_{1\%}$). Experiment: HTS dataset PubChem SAID 463087, ECFP4 fingerprint and optimist consensus. Values shown are medians over 100 experiments ± 1 median absolute deviation [37]

**Fig. 3**
Speed comparison between the opportunist (oppo) and optimist (opti) consensus policies on ECFP4 fingerprints as a function of the consensus size

**Fig. 4**
Cumulative distribution functions of AUC values for consensus of sizes two, five and ten. The consensus was built using MACCS fingerprints in the left column, ECFP4 fingerprints in the middle and UMOP2D fingerprints on the right. The lower a curve is, the better the corresponding method. The vertical gray bar at AUC 0.5 helps identify the least random method (lowest curve)

**Fig. 5**
Cumulative distribution functions of $\mathrm{PM}_{10\%}$ values for consensus of sizes two, five and ten. The consensus was built using MACCS fingerprints in the left column, ECFP4 fingerprints in the middle and UMOP2D fingerprints on the right. The lower a curve is, the better the corresponding method

**Fig. 6**
Cumulative distribution functions of AUC values on the MLQSAR dataset. Cf. Fig. 4 for legend details

**Fig. 7**
Cumulative distribution functions of $\mathrm{PM}_{10\%}$ values on the MLQSAR dataset. Cf. Fig. 5 for legend details

**Fig. 8**
Effect of applying potency scaling (knowledgeable policy) or not (realist policy) to build the consensus. The median change in rank is shown over 1000 experiments for all actives of the NRLIST PR-target (NRLIST target with the most active ligands) and MACCS fingerprints. Active molecules are rank-ordered from most to least potent (left to right)

**Fig. 9**
CPU-bounded experiment on the NRLIST AR- target (left) and PubChem SAID 485290 (right)
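
The captions above report AUC values and medians ± 1 median absolute deviation. As a rough illustration of those two statistics (not the authors' code), the sketch below computes a pairwise AUC and an unscaled median absolute deviation. The function names are hypothetical, and the absence of a normal-consistency scaling factor on the deviation is an assumption.

```python
from statistics import median
from typing import Sequence


def auc(active_scores: Sequence[float], inactive_scores: Sequence[float]) -> float:
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


def median_abs_deviation(values: Sequence[float]) -> float:
    """Median of absolute deviations from the median (assumption: no scaling factor)."""
    center = median(values)
    return median([abs(v - center) for v in values])
```

An AUC of 0.5 corresponds to a random ranking, which is why the figures mark that value. The pairwise loop is quadratic in the number of molecules, so a rank-based computation would be preferable on datasets of the size reported in the abstract.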

References

1. Lake BM, Salakhutdinov R, Tenenbaum JB. Human-level concept learning through probabilistic program induction. Science. 2015;350(6266):1332–1338. doi: 10.1126/science.aab3050.
2. Altae-Tran H, Ramsundar B, Pappu AS, Pande V. Low data drug discovery with one-shot learning. ACS Cent Sci. 2017;3(4):283–293. doi: 10.1021/acscentsci.6b00367.
3. Johnson MA, Maggiora GM. Concepts and applications of molecular similarity. New York: Wiley; 1990.
4. Bender A, Glen RC. Molecular similarity: a key technique in molecular informatics. Org Biomol Chem. 2004;2:3204–3218. doi: 10.1039/b409813g.
5. Willett P. Similarity-based virtual screening using 2D fingerprints. Drug Discov Today. 2006;11(23):1046–1053. doi: 10.1016/j.drudis.2006.10.005.

Grants and funding

R01 GM099842/NH/NIH HHS/United States
