J Comput Aided Mol Des. 2015 Sep;29(9):885-96.
doi: 10.1007/s10822-015-9860-5. Epub 2015 Jul 23.

Activity, assay and target data curation and quality in the ChEMBL database

George Papadatos et al. J Comput Aided Mol Des. 2015 Sep.

Abstract

The emergence of a number of publicly available bioactivity databases, such as ChEMBL, PubChem BioAssay and BindingDB, has raised awareness about the topics of data curation, quality and integrity. Here we provide an overview and discussion of the current and future approaches to activity, assay and target data curation of the ChEMBL database. This curation process involves several manual and automated steps and aims to: (1) maximise data accessibility and comparability; (2) improve data integrity and flag outliers, ambiguities and potential errors; and (3) add further curated annotations and mappings thus increasing the usefulness and accuracy of the ChEMBL data for all users and modellers in particular. Issues related to activity, assay and target data curation and integrity along with their potential impact for users of the data are discussed, alongside robust selection and filter strategies in order to avoid or minimise these, depending on the desired application.
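The selection and filter strategies mentioned above can be illustrated with a minimal sketch. It is not the authors' method: the record keys (`standard_relation`, `standard_units`, `data_validity_comment`, `potential_duplicate`, `confidence_score`) are assumed here to mirror columns of the ChEMBL ACTIVITIES and ASSAYS tables, and the function name, the sample records and the confidence threshold of 9 are hypothetical choices made for illustration only.

```python
# Assumption: each record is a dict whose keys are modelled on ChEMBL
# ACTIVITIES/ASSAYS column names; none of these names come from the abstract.

def filter_activities(records, min_confidence=9, units="nM"):
    """Keep records that pass a few conservative integrity checks."""
    kept = []
    for rec in records:
        if rec.get("data_validity_comment"):
            continue  # record carries a curation flag (possible outlier or error)
        if rec.get("potential_duplicate"):
            continue  # value may repeat an earlier publication
        if rec.get("standard_relation") != "=":
            continue  # drop censored measurements such as '>' or '<'
        if rec.get("standard_units") != units:
            continue  # keep only values in one comparable unit
        if (rec.get("confidence_score") or 0) < min_confidence:
            continue  # target mapping not confident enough
        kept.append(rec)
    return kept


# Hypothetical sample records, invented for this example.
sample = [
    {"id": 1, "standard_relation": "=", "standard_units": "nM",
     "data_validity_comment": None, "potential_duplicate": 0,
     "confidence_score": 9},
    {"id": 2, "standard_relation": "=", "standard_units": "nM",
     "data_validity_comment": "Outside typical range",
     "potential_duplicate": 0, "confidence_score": 9},
    {"id": 3, "standard_relation": ">", "standard_units": "nM",
     "data_validity_comment": None, "potential_duplicate": 0,
     "confidence_score": 9},
    {"id": 4, "standard_relation": "=", "standard_units": "nM",
     "data_validity_comment": None, "potential_duplicate": 0,
     "confidence_score": 4},
]

clean = filter_activities(sample)
```

How strict each check should be depends on the application: a modeller building a quantitative model may want all five filters, whereas a broad target survey might relax the relation or confidence checks.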

Keywords: Data curation; Data quality; Public bioactivity databases.

Figures

Fig. 1. The current in-house compound, activity, assay and target curation workflow in ChEMBL production. The steps involved in the activity, assay and target curation branches, along with suggestions on how the users/modellers can utilise these to improve data integrity and minimise or avoid ambiguity, are discussed in the following sections.
Fig. 2. The experimental data section of the ChEMBL 20 database schema, showing the columns of the ACTIVITIES and ASSAYS tables.
Fig. 3. A subset of the target information section of the ChEMBL 20 database schema.
