J Chem Inf Model. 2022 Dec 26;62(24):6309-6315.
doi: 10.1021/acs.jcim.2c01199. Epub 2022 Nov 28.

Q-raKtion: A Semiautomated KNIME Workflow for Bioactivity Data Points Curation

Deborah Palazzotti et al. J Chem Inf Model. 2022.

Abstract

The recent increase in bioactivity data freely available to the scientific community and stored as activity data points in chemogenomic repositories provides a huge amount of ready-to-use information to support the development of predictive models. However, the benefits of such a vast amount of accessible information are strongly counteracted by the lack of uniformity and consistency of data from multiple sources, which requires a process of integration and harmonization. While different automated pipelines for processing and assessing chemical data have emerged in recent years, the curation of bioactivity data points is a less investigated topic, with useful concepts provided but no tangible tools available. In this context, the present work represents a first step toward filling this gap, by providing a tool that meets the needs of end users in building proprietary high-quality data sets for further studies. Specifically, we herein describe Q-raKtion, a systematic, semiautomated, flexible, and, above all, customizable KNIME workflow that effectively aggregates information on the biological activities of compounds retrieved from two of the most comprehensive and widely used repositories, PubChem and ChEMBL.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
General scheme of the Q-raKtion workflow.
Figure 2
(A) Detailed overview of the bioassay ontology curation (yellow box) and activity data curation (green box) steps. (B) Schematic workflow of the procedure applied in the activity data curation step. The data points are filtered to retain all of the activities corresponding to the same compound (starting table). The data points with the same activity type are processed by the developed “Confidence Class Assigner” metanode, which returns a row with the best activity datum coupled with the corresponding confidence class. All of the processed information is finally merged into a single row. pValue is calculated as the negative log10 of the molar activity value; ΔpValue is calculated as the difference between the maximum and the minimum values of pValue. Examples of activity types include IC50, Ki, and %inh (percentage of inhibition).
Figure 3
Number of data points collected in each confidence class for the high-quality activity types (i.e., XC50 and KX) both in the curated ChEMBL and PubChem databases and in the final data set of AKT1 compounds.
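
The pValue and ΔpValue definitions given in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not the Q-raKtion implementation (which is a KNIME workflow); the function names `p_value` and `delta_p_value` are hypothetical, and the sketch assumes that activity values have already been converted to molar units.

```python
import math
from typing import Iterable


def p_value(activity_molar: float) -> float:
    """Return the negative log10 of an activity value expressed in mol/L.

    Hypothetical helper: assumes the value is already in molar units.
    """
    if activity_molar <= 0:
        raise ValueError("activity must be a positive molar concentration")
    return -math.log10(activity_molar)


def delta_p_value(activities_molar: Iterable[float]) -> float:
    """Return max(pValue) - min(pValue) over a set of activity values.

    Hypothetical helper: the inputs are assumed to be measurements of the
    same activity type for the same compound, all in molar units.
    """
    p_values = [p_value(a) for a in activities_molar]
    if not p_values:
        raise ValueError("at least one activity value is required")
    return max(p_values) - min(p_values)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper:
    # two IC50 measurements of 1 uM and 10 nM for one compound.
    measurements = [1e-6, 1e-8]
    print([p_value(m) for m in measurements])
    print(delta_p_value(measurements))
```

A small ΔpValue indicates that repeated measurements for a compound agree closely; how this spread maps onto the confidence classes is defined by the workflow itself and is not reproduced here.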
