Review
Mol Inform. 2013 Dec;32(11-12):898-905.
doi: 10.1002/minf.201300051. Epub 2013 Sep 8.

Quality Issues with Public Domain Chemogenomics Data

Tuomo Kalliokoski et al. Mol Inform. 2013 Dec.

Abstract

The key concept in chemogenomics is the similarity principle, which states that similar ligands should bind similar targets. Chemogenomic analysis requires large amounts of data as well as powerful computational algorithms and computers. Data used for chemogenomic analysis can either be compiled from open sources or be produced in-house, as is often done in the pharmaceutical industry. The chemogenomic modeller often has to resort to mixing activity values from different laboratories, and even from different assay types, to make chemogenomic analysis feasible. The amount of chemogenomics data available in the public domain has increased dramatically in recent years, allowing fully traceable analysis on a continuously increasing scale. However, some warning flags about data quality have been raised, and because the primary data determine the accuracy of chemogenomic analysis, data quality is one of the key questions in chemogenomics. This mini-review discusses some of the most common issues with public domain biological data related to chemogenomic analysis. Errors in the data can originate from problems with the experiments themselves and their interpretation, or from more mundane issues such as data extraction and annotation. These issues are not unique to any particular database; they are shared by all public domain databases and can plague commercial and in-house bioactivity databases as well.

Keywords: Chemogenomics; Data accuracy; Databases; Experimental uncertainty.
