J Cheminform. 2017 Nov 28;9(1):60.
doi: 10.1186/s13321-017-0248-5.

Consensus queries in ligand-based virtual screening experiments

Francois Berenger et al. J Cheminform. 2017.

Abstract

Background: In ligand-based virtual screening experiments, a known active ligand is used in similarity searches to find putative active compounds for the same protein target. When there are several known active molecules, screening using all of them is more powerful than screening using a single ligand. A consensus query can be created by either screening serially with different ligands before merging the obtained similarity scores, or by combining the molecular descriptors (i.e. chemical fingerprints) of those ligands.
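The second route described above, combining the fingerprints of several actives into a single query, can be sketched as follows. This is a minimal illustration, not the authors' implementation: fingerprints are modeled as sets of on-bit indices, and the two combination rules shown (bitwise union and majority vote) are hypothetical examples of a consensus policy, not necessarily the policies benchmarked in the paper. All function names and the toy data are assumptions.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def union_consensus(fingerprints):
    """Illustrative policy: a bit is on if it is on in any known active."""
    return frozenset().union(*fingerprints)


def majority_consensus(fingerprints):
    """Illustrative policy: a bit is on if at least half of the actives set it."""
    counts = Counter(bit for fp in fingerprints for bit in fp)
    return frozenset(bit for bit, n in counts.items()
                     if 2 * n >= len(fingerprints))


# Toy fingerprints (hypothetical on-bit indices, not real MACCS or ECFP4 data).
actives = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 5})]
candidate = frozenset({2, 3})

union_fp = union_consensus(actives)
majority_fp = majority_consensus(actives)

# A single similarity computation per candidate, whatever the number of actives.
score_union = tanimoto(candidate, union_fp)
score_majority = tanimoto(candidate, majority_fp)
```

Whichever rule is used, the database is then screened once against the single consensus fingerprint.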

Results: We report on the discriminative power and speed of several consensus methods on two datasets made only of experimentally verified molecules. The two datasets contain a total of 19 protein targets, 3776 known active molecules and ~2 × 10^6 inactive molecules. Three chemical fingerprints are investigated: MACCS 166 bits, ECFP4 2048 bits and an unfolded version of MOLPRINT2D. Four different consensus policies and five consensus sizes were benchmarked.

Conclusions: The best consensus method is to rank candidate molecules using the maximum score obtained by each candidate molecule versus all known actives. When the number of actives used is small, the same screening performance can be approached by a consensus fingerprint. However, if the computational exploration of the chemical space is limited by speed (i.e. throughput), a consensus fingerprint can outperform this consensus of scores.
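The maximum-score ranking described in the conclusions can be sketched as below. This is a minimal illustration under stated assumptions: fingerprints are modeled as sets of on-bit indices, the Tanimoto score is used as the similarity (it appears in the keywords), and the function names and toy molecules are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_max_score(candidates, actives):
    """Score each candidate by its best similarity to any known active,
    then sort candidates from highest to lowest score."""
    scored = []
    for name, fp in candidates.items():
        best = max(tanimoto(fp, active) for active in actives)
        scored.append((name, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical on-bit indices, not real molecules).
actives = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
candidates = {
    "cand_a": frozenset({1, 2, 3}),
    "cand_b": frozenset({10, 11, 12}),
    "cand_c": frozenset({4, 20}),
}

ranking = rank_by_max_score(candidates, actives)
```

This approach needs one similarity computation per candidate per active, whereas a single consensus fingerprint needs only one per candidate, which is the throughput trade-off the conclusions refer to.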

Keywords: Chemical fingerprint; Consensus query; ECFP4; Ligand-based virtual screening (LBVS); MACCS; MOLPRINT2D; Potency scaling; Several bioactives; Similarity search; Tanimoto score.

Figures

Fig. 1
A consensus fingerprint is created by combining the fingerprints of several known active molecules. The way to combine fingerprints is controlled by the consensus policy
Fig. 2
Effect of the consensus size on the consensus query global classification performance (AUC) and early recovery capability (PM1%). Experiment: HTS dataset PubChem SAID 463087, ECFP4 fingerprint and optimist consensus. Values shown are medians over 100 experiments ± 1 median absolute deviation [37]
Fig. 3
Speed comparison between the opportunist (oppo) and optimist (opti) consensus policies on ECFP4 fingerprints as a function of the consensus size
Fig. 4
Cumulative distribution functions of AUC values for consensus of sizes two, five and ten. The consensus was built using MACCS fingerprints in the left column, ECFP4 fingerprints in the middle and UMOP2D fingerprints on the right. The lower a curve is, the better the corresponding method. The vertical gray bar at AUC 0.5 helps identify the least random method (lowest curve)
Fig. 5
Cumulative distribution functions of PM10% values for consensus of sizes two, five and ten. The consensus was built using MACCS fingerprints in the left column, ECFP4 fingerprints in the middle and UMOP2D fingerprints on the right. The lower a curve is, the better the corresponding method
Fig. 6
Cumulative distribution functions of AUC values on the MLQSAR dataset. Cf. Fig. 4 for legend details
Fig. 7
Cumulative distribution functions of PM10% values on the MLQSAR dataset. Cf. Fig. 5 for legend details
Fig. 8
Effect of applying potency scaling (knowledgeable policy) or not (realist policy) to build the consensus. The median change in rank is shown over 1000 experiments for all actives of the NRLIST PR-target (NRLIST target with the most active ligands) and MACCS fingerprints. Active molecules are rank-ordered from most to least potent (left to right)
Fig. 9
CPU-bounded experiment on the NRLIST AR- target (left) and PubChem SAID 485290 (right)
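
The Fig. 2 caption reports medians over repeated experiments ± 1 median absolute deviation (MAD). A minimal sketch of that summary statistic follows; the function name and the AUC values are made up for illustration and are not results from the paper.

```python
from statistics import median


def median_and_mad(values):
    """Return the median and the median absolute deviation of a sample."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med, mad


# Hypothetical AUC values from five repeated experiments (illustration only).
auc_values = [0.70, 0.72, 0.68, 0.75, 0.71]
med, mad = median_and_mad(auc_values)
```

Both the median and the MAD are robust to outliers, so a few atypical runs have little effect on the reported values.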

References

    1. Lake BM, Salakhutdinov R, Tenenbaum JB. Human-level concept learning through probabilistic program induction. Science. 2015;350(6266):1332–1338. doi: 10.1126/science.aab3050.
    2. Altae-Tran H, Ramsundar B, Pappu AS, Pande V. Low data drug discovery with one-shot learning. ACS Cent Sci. 2017;3(4):283–293. doi: 10.1021/acscentsci.6b00367.
    3. Johnson MA, Maggiora GM. Concepts and applications of molecular similarity. New York: Wiley; 1990.
    4. Bender A, Glen RC. Molecular similarity: a key technique in molecular informatics. Org Biomol Chem. 2004;2:3204–3218. doi: 10.1039/b409813g.
    5. Willett P. Similarity-based virtual screening using 2D fingerprints. Drug Discov Today. 2006;11(23):1046–1053. doi: 10.1016/j.drudis.2006.10.005.
