ChemBank: a small-molecule screening and cheminformatics resource database

Kathleen Petri Seiler¹, Gregory A George, Mary Pat Happ, Nicole E Bodycombe, Hyman A Carrinski, Stephanie Norton, Steve Brudz, John P Sullivan, Jeremy Muhlich, Martin Serrano, Paul Ferraiolo, Nicola J Tolliday, Stuart L Schreiber, Paul A Clemons

PMID: 17947324
PMCID: PMC2238881
DOI: 10.1093/nar/gkm843

Kathleen Petri Seiler et al. Nucleic Acids Res. 2008 Jan.

Nucleic Acids Res. 2008 Jan;36(Database issue):D351-9.

doi: 10.1093/nar/gkm843. Epub 2007 Oct 18.

Affiliation

¹ Chemical Biology Program and Platform, Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA.

Abstract

ChemBank (http://chembank.broad.harvard.edu/) is a public, web-based informatics environment developed through a collaboration between the Chemical Biology Program and Platform at the Broad Institute of Harvard and MIT. This knowledge environment includes freely available data derived from small molecules and small-molecule screens and resources for studying these data. ChemBank is unique among small-molecule databases in its dedication to the storage of raw screening data, its rigorous definition of screening experiments in terms of statistical hypothesis testing, and its metadata-based organization of screening experiments into projects involving collections of related assays. ChemBank stores an increasingly varied set of measurements derived from cells and other biological assay systems treated with small molecules. Analysis tools are available and are continuously being developed that allow the relationships between small molecules, cell measurements, and cell states to be studied. Currently, ChemBank stores information on hundreds of thousands of small molecules and hundreds of biomedically relevant assays that have been performed at the Broad Institute by collaborators from the worldwide research community. The goal of ChemBank is to provide life scientists unfettered access to biomedically relevant data and tools heretofore available primarily in the private sector.

Figures

**Figure 1.**
Conceptual summary of ChemBank schema. Logical illustration of ChemBank data model, in which 95 tables are organized into groups representing components of the chemical biology research enterprise. Each box represents several actual database tables, as indicated, and pseudocardinality relationships between boxes are meant to convey conceptual relationships, rather than the more complex cardinality relationships that relate the actual tables.

**Figure 2.**
ChemBank offers multiple routes to find chemical information. Search tools allowing structure drawing (28) for substructure or similarity searches (a), selection of calculated molecular descriptors (b) and selection of term-based bioactivity annotations (c), each provide avenues to find individual molecules or sets of molecules in ChemBank. The ChemBank ‘Molecule Display’ webpage (background) provides detailed information about each molecule, including structure, names, molecular descriptors, biological annotations, sample information and screening instances.

**Figure 3.**
Relationship of ChemBank ‘View Project’ and ‘View Assay’ webpages. Screenshots of representative screening project and assay (inset) webpages. Emphasis (red boxes, arrow) has been added to highlight key information, including project description and motivation (a), individual assays within project (b), detailed description (shared by both webpages) of assay protocol (c) and individual screening plates within assay (d).

**Figure 4.**
ChemBank standard data-analysis model for high-throughput small-molecule screens. All raw small-molecule assay results in ChemBank are further processed by comparing each measurement with the collection of mock-treatment well measurements performed in the same screening experiment. Median values from mock-treatment wells on the same plate are used in an initial zero-centering step (a), after which the distribution of mock-treatment measurements for the entire experiment is trimmed to eliminate systematic artifacts (b). Trimmed mock-treatment measurements are used to normalize assay performance by first subtracting the mean of trimmed mock-treatment measurements on the same plate to give ‘background-subtracted values’ (c), then dividing by twice the standard deviation of trimmed mock-treatment measurements for the entire experiment to give ‘dimensionless Z-score values’ (d). Replicate handling is performed by cosine correlation of the replicate pair (for screens done in duplicate) of ‘dimensionless Z-score values’ for each compound with a simple prior model of ‘perfect reproducibility’, to yield a ‘Composite Z-score value’ (e) that represents the final primary screening result. The ChemBank web interface provides access to raw and processed data types appropriate for each of its visualization tools (f).
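
Steps (a) to (e) of the Figure 4 caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ChemBank's actual implementation. The caption does not say how the mock-treatment distribution is trimmed, so the sketch assumes a symmetric tail trim controlled by a hypothetical `trim_fraction` parameter. It also assumes the standard deviation is taken over the pooled, zero-centered, trimmed mock values, and it reads the cosine-correlation step as projecting the replicate pair onto the 'perfect reproducibility' diagonal. The function names and the input layout are illustrative only.

```python
import math
from statistics import mean, median, stdev


def trim_bounds(values, fraction):
    """Return (low, high) cutoffs that drop `fraction` of values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    return ordered[k], ordered[-1 - k]


def z_scores(plates, trim_fraction=0.1):
    """Compute per-well Z-scores for one replicate of a screening experiment.

    plates maps plate id -> {"mock": [raw values], "compound": {well: raw value}}.
    """
    # (a) Zero-center every plate on the median of its own mock-treatment wells.
    centered = {}
    for pid, plate in plates.items():
        m = median(plate["mock"])
        centered[pid] = {
            "mock": [v - m for v in plate["mock"]],
            "compound": {w: v - m for w, v in plate["compound"].items()},
        }

    # (b) Trim the experiment-wide mock distribution.
    # ASSUMPTION: symmetric tail trim; the caption does not give the rule.
    pooled = [v for p in centered.values() for v in p["mock"]]
    low, high = trim_bounds(pooled, trim_fraction)
    trimmed = {pid: [v for v in p["mock"] if low <= v <= high]
               for pid, p in centered.items()}

    # Denominator for (d): twice the standard deviation of all trimmed mock values.
    scale = 2.0 * stdev([v for vals in trimmed.values() for v in vals])

    result = {}
    for pid, p in centered.items():
        # (c) Subtract the mean of the trimmed mock wells on the same plate.
        # Falls back to 0.0 if trimming removed every mock well on the plate.
        background = mean(trimmed[pid]) if trimmed[pid] else 0.0
        # (d) Divide by the experiment-wide scale to get dimensionless Z-scores.
        result[pid] = {w: (v - background) / scale
                       for w, v in p["compound"].items()}
    return result


def composite_z(z_a, z_b):
    """(e) Return (composite Z, cosine) for a duplicate pair of Z-scores.

    ASSUMPTION: the composite is the length of (z_a, z_b) times the cosine of
    its angle to the diagonal (1, 1), i.e. (z_a + z_b) / sqrt(2).
    """
    norm = math.hypot(z_a, z_b)
    cosine = (z_a + z_b) / (math.sqrt(2.0) * norm) if norm else 0.0
    return norm * cosine, cosine
```

In this sketch, `z_scores` would be called once per replicate, and `composite_z` would then combine the two Z-scores obtained for the same compound. Replicates that agree give a cosine near 1, while replicates of opposite sign give a cosine near 0 and a composite near zero.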

**Figure 5.**
Illustration of ChemBank visualizations and linking activities with chemical information. Screening data, including raw measurements, in ChemBank are addressable by exact plate and well position in assay plates (a), and statistical data representing outcomes (b) can be reviewed at the level of raw or normalized data. A multi-assay analysis capability takes advantage of the standard analysis procedure (Figure 4) to display the performance of similar compounds in the multiple assays to which each has been exposed (c). Each of these capabilities can be combined with structure and annotation-based search capabilities to provide cheminformatic analysis of molecules scoring as ‘hits’ in biological assays (d).

Grants and funding

P20 HG003895/HG/NHGRI NIH HHS/United States
