J Comput Aided Mol Des. 2013 Jul;27(7):583-603.
doi: 10.1007/s10822-013-9664-4. Epub 2013 Jul 25.

From data point timelines to a well curated data set, data mining of experimental data and chemical structure data from scientific articles, problems and possible solutions

Villu Ruusmann et al. J Comput Aided Mol Des. 2013 Jul.

Abstract

The scientific literature is an important source of experimental and chemical structure data. Very often these data have been harvested into data collections of varying size, leaving data quality and curation issues on the shoulders of users. The current research presents a systematic and reproducible workflow for collecting series of data points from the scientific literature and assembling a database that is suitable for high-quality modelling and decision support. The quality assurance aspect of the workflow is concerned with the curation of both chemical structures and associated toxicity values at (1) the level of single data points and (2) the level of collections of data points. The assembly of the database employs a novel "timeline" approach. The workflow is implemented as a software solution, and its applicability is demonstrated on the example of the Tetrahymena pyriformis acute aquatic toxicity endpoint. A literature collection of 86 primary publications for T. pyriformis was found to contain 2,072 chemical compounds and 2,498 unique toxicity values, which divide into 2,440 numerical and 58 textual values. Every chemical compound was assigned a preferred toxicity value. Examples of the most common chemical and toxicological data curation scenarios are discussed.
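
As a rough illustration of the idea of arranging data points into per-compound timelines and then choosing a preferred value, a minimal Python sketch might look like the following. Everything in it is an assumption made for illustration: the `DataPoint` fields, the function names, and in particular the selection rule (most recent numerical value, falling back to the most recent textual value) are hypothetical and are not taken from the article, which does not state its selection criteria in the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class DataPoint:
    compound_id: str          # hypothetical compound identifier
    year: int                 # publication year of the source article
    value: Union[float, str]  # numerical toxicity value or a textual remark


def build_timelines(points: List[DataPoint]) -> Dict[str, List[DataPoint]]:
    """Group data points by compound and order each group chronologically."""
    timelines = defaultdict(list)
    for point in points:
        timelines[point.compound_id].append(point)
    for series in timelines.values():
        series.sort(key=lambda p: p.year)
    return dict(timelines)


def preferred_value(series: List[DataPoint]) -> Union[float, str]:
    """Assumed rule, for illustration only: take the most recent numerical
    value; if the timeline holds only textual values, take the most recent
    of those."""
    numeric = [p for p in series if isinstance(p.value, (int, float))]
    chosen = numeric[-1] if numeric else series[-1]
    return chosen.value
```

A call such as `preferred_value(build_timelines(points)["C1"])` would then return one value per compound. A real curation workflow would presumably also need to reconcile conflicting values and check chemical structures, which this sketch does not attempt.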
