PLoS One. 2019 Oct 14;14(10):e0223596.
doi: 10.1371/journal.pone.0223596. eCollection 2019.

CavBench: A benchmark for protein cavity detection methods

Sérgio Dias et al. PLoS One. 2019.

Abstract

Extensive research has gone into new techniques and methods for modeling protein-ligand interactions. In particular, considerable effort has focused on identifying candidate binding sites, which quite often are active sites that correspond to protein pockets or cavities. These cavities therefore play an important role in molecular docking. However, there is no established benchmark for assessing the accuracy of new cavity detection methods. In practice, each new technique is evaluated on a small set of proteins with known binding sites as ground truth. Studies supported by large datasets of known cavities and/or binding sites and by statistical classification (i.e., false positives, false negatives, true positives, and true negatives) would yield much stronger and more reliable assessments. To this end, we propose CavBench, a generic and extensible benchmark for comparing different cavity detection methods against diverse ground-truth datasets (e.g., PDBsum) using statistical classification methods.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Main types of cavities: A—Pockets, B—Cavities, C—Tunnels, D—Pores (courtesy of Sehnal et al. [49]).
Fig 2. A snippet of XML file 1a4u.xml partially describing two clefts of the protein 1A4U in the CavDataset.
Fig 3. Snippets of the Perl file 1gfs.pl (concerning the protein 1GFS) that describes the volume of the second cluster (or cleft) and three of its dummy atoms.
Fig 4. Workflow of the Fpocket-specific parser for the protein 1A4U, which transforms the multiple-file PDB-like output of Fpocket into a single .xml file called 1a4u-fpocket.xml.
Fig 5. CavBench’s workflow to compare Fpocket-specific cavities with ground-truth cavities.
Fig 6. The overlapping matrix produced by the cavity overlapping evaluator for the protein 180L, as a result of benchmarking the cavities output by the GaussianFinder against the ground-truth dataset.
Columns correspond to ground-truth cavities and rows to cavities detected by the GaussianFinder method. True positives (TP) correspond to rows (GaussianFinder cavities) containing at least one green cell; false positives (FP) are rows (GaussianFinder cavities) without any green cell, i.e., they do not overlap any ground-truth cavity; false negatives (FN) are columns (i.e., ground-truth cavities) shown in red. This example contains 11 TP (rows), 10 FP (rows) and 2 FN (columns).
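The counting rule in the Fig 6 caption can be illustrated with a minimal sketch. It is not CavBench's implementation: the function name `classify_overlaps` and the boolean-matrix input are hypothetical, a `True` entry stands in for a green cell, and a red column is assumed to be a ground-truth cavity that no detected cavity overlaps. How a cell is judged green is not stated here, so the matrix is taken as given.

```python
from typing import List, Tuple


def classify_overlaps(overlap: List[List[bool]]) -> Tuple[int, int, int]:
    """Count (TP, FP, FN) from a boolean overlap matrix.

    Rows are method-detected cavities, columns are ground-truth cavities.
    overlap[i][j] is True when detected cavity i overlaps ground-truth
    cavity j (assumed to correspond to a green cell in the caption).
    """
    n_cols = len(overlap[0]) if overlap else 0
    # TP: rows with at least one overlapping ground-truth cavity.
    tp = sum(1 for row in overlap if any(row))
    # FP: rows with no overlap at all.
    fp = len(overlap) - tp
    # FN: columns that no row overlaps (assumed meaning of a red column).
    fn = sum(1 for j in range(n_cols) if not any(row[j] for row in overlap))
    return tp, fp, fn


# Toy matrix: 3 detected cavities (rows) x 3 ground-truth cavities (columns).
toy = [
    [True, False, False],
    [False, False, False],
    [True, True, False],
]
print(classify_overlaps(toy))  # (2, 1, 1)
```

Note that TP and FP are counted over rows while FN is counted over columns, as in the caption, so the three numbers do not sum to either matrix dimension.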
Fig 7. Overlapping matrix (top) and voxelized, cross-eyed stereoscopic 3D visualization of the protein 1A4U.
The example represents the overlap between 5 method-detected cavities (rows) and 10 ground-truth cavities (columns). The overlapped portions of the ground-truth cavities are rendered with opaque colors (true positives), and the non-overlapped portions as semi-transparent (false negatives). This example shows that the method detected roughly 13.6% of the first cavity (red), 1.5% of the second (orange), 2.3% of the fourth (light green), 1.4% of the sixth (cyan), and 11.8% of the eighth (purple).
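One plausible reading of the percentages in the Fig 7 caption is the fraction of a ground-truth cavity's voxels that are also covered by detected cavities. The sketch below works under that assumption only; the caption does not state the formula, and the `Voxel` type and `overlap_fraction` function are hypothetical names introduced for illustration.

```python
from typing import Set, Tuple

# A voxel is identified by its integer grid coordinates (an assumption).
Voxel = Tuple[int, int, int]


def overlap_fraction(ground_truth: Set[Voxel], detected: Set[Voxel]) -> float:
    """Fraction of ground-truth voxels that are also detected voxels."""
    if not ground_truth:
        return 0.0  # avoid division by zero for an empty cavity
    return len(ground_truth & detected) / len(ground_truth)


# Toy data: a 4 x 5 slab of 20 ground-truth voxels.
gt = {(x, y, 0) for x in range(4) for y in range(5)}
# Detected voxels: 3 inside the slab and 1 far outside it.
det = {(x, 0, 0) for x in range(3)} | {(9, 9, 9)}
print(f"{overlap_fraction(gt, det):.1%}")  # 15.0%
```

Under this reading, the covered voxels would be the opaque portion of a cavity in the visualization and the remaining voxels the semi-transparent portion.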

References

    1. Laskowski R, Hutchinson E, Michie A, Wallace A, Jones M, Thornton J. PDBsum: a Web-based database of summaries and analyses of all PDB structures. Trends in Biochemical Sciences. 1997;22(12):488–490. doi: 10.1016/s0968-0004(97)01140-7
    2. de Beer T, Berka K, Thornton J, Laskowski R. PDBsum additions. Nucleic Acids Research. 2014;42(D):292–296. doi: 10.1093/nar/gkt940
    3. Kellenberger E, Muller P, Schalon C, Bret G, Foata N, Rognan D. sc-PDB: an annotated database of druggable binding sites from the Protein Data Bank. Journal of Chemical Information and Modeling. 2006;46(2):717–727. doi: 10.1021/ci050372x
    4. Kuntz ID, Blaney JM, Oatley SJ, Langridge R, Ferrin TE. A geometric approach to macromolecule-ligand interactions. Journal of Molecular Biology. 1982;161(2):269–288. doi: 10.1016/0022-2836(82)90153-x
    5. Shoichet B, Kuntz I, Bodian D. Molecular docking using shape descriptors. Journal of Computational Chemistry. 1992;13(3):380–397. doi: 10.1002/jcc.540130311