The chemical space project
- PMID: 25687211
- DOI: 10.1021/ar500432k
Abstract
One of the simplest questions that can be asked about molecular diversity is how many organic molecules are possible in total. To answer this question, my research group has computationally enumerated all possible organic molecules up to a certain size to gain an unbiased insight into the entire chemical space. Our latest database, GDB-17, contains 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens, by far the largest small molecule database reported to date. Molecules allowed by valency rules but unstable or nonsynthesizable due to strained topologies or reactive functional groups were not considered, which reduced the enumeration by at least 10 orders of magnitude and was essential to arrive at a manageable database size. Despite these restrictions, GDB-17 is highly relevant with respect to known molecules. Beyond enumeration, understanding and exploiting GDBs (generated databases) led us to develop methods for virtual screening and visualization of very large databases in the form of a "periodic system of molecules" comprising six different fingerprint spaces, with web browsers for nearest-neighbor searches, and the MQN- and SMIfp-Mapplet applications for exploring color-coded principal component maps of GDB and other large databases. Proof-of-concept applications of GDB for drug discovery were realized by combining virtual screening with chemical synthesis and activity testing for neurotransmitter receptor and transporter ligands. One surprising lesson from using GDB for drug analog searches is the incredible depth of chemical space, that is, the fact that millions of very close analogs of any molecule can be readily identified by nearest-neighbor searches in the MQN-space of the various GDBs. The chemical space project has opened an unprecedented door on chemical diversity. Ongoing, as yet unmet challenges include enumerating molecules beyond 17 atoms and synthesizing GDB molecules with innovative scaffolds and pharmacophores.
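The nearest-neighbor searches mentioned above can be illustrated with a minimal sketch. The abstract does not say which distance metric is used, so the city-block (Manhattan) distance here is an assumption, as are the function names and the toy 4-dimensional integer descriptors, which merely stand in for a real fingerprint such as MQN.

```python
from heapq import nsmallest


def city_block(a, b):
    """City-block (Manhattan) distance between two equal-length integer vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbors(query, database, k=3):
    """Return the k (distance, name) pairs closest to the query, nearest first."""
    scored = ((city_block(query, vec), name) for name, vec in database.items())
    return nsmallest(k, scored)


# Hypothetical descriptors (illustrative only, not real MQN values),
# e.g. counts of C, N, O atoms and rings for four made-up molecules.
toy_db = {
    "mol_a": (6, 0, 1, 1),
    "mol_b": (6, 1, 0, 1),
    "mol_c": (5, 0, 2, 0),
    "mol_d": (10, 2, 2, 2),
}

query = (6, 0, 1, 1)
hits = nearest_neighbors(query, toy_db, k=2)
print(hits)
```

A linear scan like this is only workable for small collections; searching billions of molecules would presumably need an indexed or pre-sorted representation, which this sketch does not attempt.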