Mol Inform. 2013 Apr;32(4):355-61.
doi: 10.1002/minf.201300006. Epub 2013 Apr 11.

Structural Key Bit Occurrence Frequencies and Dependencies in PubChem and Their Effect on Similarity Searches


Nelson G Chen et al. Mol Inform. 2013 Apr.

Abstract

Little published literature exists on the 881-bit structural keys that PubChem uses to categorize and compare the compounds in its database. We characterized these structural keys by examining how often each bit occurs within the PubChem compound database. In addition, we determined bit dependencies, where a bit B is defined as dependent on a bit A if B is set in every compound in which A is set. We show that the vast majority of bits are rarely set and that substantial numbers of dependencies exist. Using the Tanimoto coefficient, we compared similarity searches with five United States Food and Drug Administration (FDA) approved drugs as reference compounds, run either with the full structural keys or with a variant from which all dependent bits were removed. These bit dependencies affect not only similarity scores but also which compounds are returned in similarity searches. Judicious selection of bits is needed to maintain sufficient ability to differentiate related compounds.
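The quantities described in the abstract (per-bit occurrence frequency, bit dependency, and the Tanimoto coefficient computed with and without dependent bits) can be sketched as follows. This is a minimal illustration, not the authors' code. Fingerprints are modeled as Python integers whose bit i is set when structural key i is present, and all function names are hypothetical. Treating every bit implied by some other bit as "dependent" and masking it is an assumption: the paper's exact removal rule may differ, for example for pairs of bits that always co-occur.

```python
from typing import Dict, List, Set

N_BITS = 881  # length of the PubChem structural keys, per the abstract


def popcount(x: int) -> int:
    """Number of set bits (portable alternative to int.bit_count)."""
    return bin(x).count("1")


def bit_frequencies(fps: List[int], n_bits: int = N_BITS) -> List[float]:
    """Fraction of fingerprints in which each bit position is set."""
    if not fps:
        return [0.0] * n_bits
    counts = [0] * n_bits
    for fp in fps:
        for i in range(n_bits):
            if (fp >> i) & 1:
                counts[i] += 1
    return [c / len(fps) for c in counts]


def dependencies(fps: List[int], n_bits: int = N_BITS) -> Dict[int, Set[int]]:
    """Map bit a -> bits b (b != a) that are set in EVERY fingerprint where a is set.

    AND-ing all fingerprints that contain bit a leaves exactly the bits
    universally present given a. Bits that never occur are omitted.
    """
    deps: Dict[int, Set[int]] = {}
    for a in range(n_bits):
        common = None
        for fp in fps:
            if (fp >> a) & 1:
                common = fp if common is None else common & fp
        if common is not None:
            deps[a] = {b for b in range(n_bits) if b != a and (common >> b) & 1}
    return deps


def dependent_bits(deps: Dict[int, Set[int]]) -> Set[int]:
    """Assumed rule: a bit is 'dependent' if any other bit implies it."""
    return set().union(*deps.values())


def mask_out(fp: int, drop: Set[int]) -> int:
    """Clear the given bit positions from a fingerprint."""
    for b in drop:
        fp &= ~(1 << b)
    return fp


def tanimoto(a: int, b: int) -> float:
    """|A AND B| / |A OR B|; 0.0 when both are empty (a convention chosen here)."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


if __name__ == "__main__":
    # Toy 4-bit "fingerprints" (hypothetical data, not from PubChem).
    fps = [0b0011, 0b0111, 0b0001, 0b1000]
    print(bit_frequencies(fps, n_bits=4))  # [0.75, 0.5, 0.25, 0.25]
    deps = dependencies(fps, n_bits=4)
    drop = dependent_bits(deps)  # bits 0 and 1 are implied by other bits
    print(tanimoto(fps[0], fps[1]))  # 2/3 with the full keys
    print(tanimoto(mask_out(fps[0], drop), mask_out(fps[1], drop)))  # 0.0
```

In the toy data, masking the dependent bits drops the score for the first two fingerprints from 2/3 to 0, a small-scale instance of dependent bits changing similarity scores. Note that under this definition a bit set in only one compound trivially "implies" every other bit set in that compound, so rarely set bits can generate many dependencies.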

Keywords: Cheminformatics; Molecular modeling; Molecular similarity.
