Comput Methods Programs Biomed. 2009 Jan;93(1):73-82.

doi: 10.1016/j.cmpb.2008.07.005. Epub 2008 Sep 3.

Design of a grid service-based platform for in silico protein-ligand screenings

Marshall J Levesque¹, Kohei Ichikawa, Susumu Date, Jason H Haga

Affiliation

¹ Department of Bioengineering, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093-0435, USA.

PMID: 18771812
PMCID: PMC2665129
DOI: 10.1016/j.cmpb.2008.07.005

Abstract

Grid computing offers a powerful alternative to a centralized supercomputer: resources are shared on a worldwide scale, across different institutions, to run computationally intensive scientific applications. Much effort has been put into developing software that deploys legacy applications on a grid-based infrastructure and uses the available resources efficiently. Drug discovery is one field that can benefit greatly from grid resources, since molecular docking simulations are an integral part of the discovery process. In this paper, we present a scalable, reusable platform to choreograph large virtual screening experiments over a computational grid using the molecular docking simulation software DOCK. Software components are applied on multiple levels to create automated workflows consisting of input data delivery, job scheduling, status query, and collection of output, which is displayed in a manageable fashion for further analysis. This was achieved using Opal OP to wrap the DOCK application as a grid service and Perl for data manipulation, alleviating the requirement for extensive knowledge of grid infrastructure. With the platform in place, a screening of the 2,066,906-compound "drug-like" subset of the ZINC database against an enzyme's catalytic site was successfully performed using the MPI version of DOCK 5.4 on the PRAGMA grid testbed. The screening required 11.56 days of laboratory time and utilized 200 processors over 7 clusters.
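As a rough reading of the reported figures, the sketch below derives the aggregate throughput they imply. It assumes, which the abstract does not state, that all 200 processors were busy for the entire 11.56 days, so the per-compound figure is best read as an upper bound on average processor time. The variable names are illustrative.

```python
# Figures reported in the abstract.
COMPOUNDS = 2_066_906
DAYS = 11.56
PROCESSORS = 200

SECONDS_PER_DAY = 86_400

# Aggregate screening rate across the whole grid.
compounds_per_day = COMPOUNDS / DAYS

# Assumption: every processor was busy for the full wall-clock period.
processor_seconds = DAYS * SECONDS_PER_DAY * PROCESSORS
seconds_per_compound = processor_seconds / COMPOUNDS

print(f"{compounds_per_day:,.0f} compounds per day")
print(f"{seconds_per_compound:.1f} processor-seconds per compound")
```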

Conflict of interest statement

The authors have no conflicts of interest to disclose.

Figures

**Figure 1. The Platform Workflow**
Starting with a large database of compounds, such as the ZINC library, a rapid docking method is used for the initial screening and is distributed via the dock service. Results data, consisting of compound lists ranked from best to worst energy score, are scattered across the grid resources. The ranking service searches for and gathers this data to construct a list of results encompassing the entire database of compounds. From this list, a top percentage of compounds can be selected to make up a new, smaller database of around 25,000 compounds. Conformational data output by the screening for each compound is retrieved by the database service and is used to build this new database. A number of different, more stringent parameters are then used to rescreen these compounds, with jobs distributed again by the dock service and results gathered by the ranking service. Once a consensus has been taken across all scoring methods, a short list of the best potential binding compounds is generated for in vitro testing in the lab.

**Figure 2. Platform Scripts**
Pseudocode outline of the local and remote scripts that make up the virtual screening platform. The automated platform consists of three stages: the distribution of DOCK jobs, the ranking of screening results, and the building of new databases from the results. Each stage in the local script makes requests to the grid services installed and hosted on the master node of each remote cluster for job submission and status querying.

**Figure 3. Distributed Screening through the Dock Service**
Job distribution is handled by the local script on the client machine, which delivers the required input through the Opal OP toolkit and makes calls to the dock service hosted on each remote cluster. Resources can be added or removed by editing the resource list on the client machine. The database can be stored and accessed on the client machine or any other location that can be reached via HTTP for file transfer. Requests to launch DOCK MPI on each remote cluster are sent to the dock service with the *Opalop-jobrun* command. After preparing the received input files according to the given arguments, the dock service submits the DOCK MPI job to the scheduling software to be executed. The status of each job is checked using the *Opalop-jobquery* command. Each remote cluster then has its own screening results, which must be collected and sorted once the screening reaches completion.
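The distribution step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' script: the `Cluster` type, the proportional split by processor count, the `"done"` status string, and the `submit_job` and `query_status` callables are all assumptions. The two callables stand in for whatever wraps the *Opalop-jobrun* and *Opalop-jobquery* commands, whose arguments are not given here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Cluster:
    name: str        # hypothetical label for a remote cluster
    processors: int  # processors available to DOCK MPI on that cluster


def partition(n_compounds: int, clusters: List[Cluster]) -> Dict[str, range]:
    """Give each cluster a contiguous index range sized by processor count."""
    total = sum(c.processors for c in clusters)
    shares: Dict[str, range] = {}
    start = 0
    for i, c in enumerate(clusters):
        if i == len(clusters) - 1:
            end = n_compounds  # last cluster takes the remainder
        else:
            end = start + n_compounds * c.processors // total
        shares[c.name] = range(start, end)
        start = end
    return shares


def run_screening(
    shares: Dict[str, range],
    submit_job: Callable[[str, range], str],
    query_status: Callable[[str], str],
    wait: Callable[[], None] = lambda: None,
) -> Dict[str, str]:
    """Submit one job per cluster, then poll until every job reports done."""
    jobs = {name: submit_job(name, rng) for name, rng in shares.items()}
    pending = set(jobs)
    while pending:
        wait()  # in real use this would sleep between polls
        pending = {n for n in pending if query_status(jobs[n]) != "done"}
    return jobs


# Demonstration with stand-in functions instead of real grid calls.
resource_list = [Cluster("cluster-a", 40), Cluster("cluster-b", 60),
                 Cluster("cluster-c", 100)]
shares = partition(1000, resource_list)

polls: Dict[str, int] = {}


def fake_submit(name: str, rng: range) -> str:
    return f"{name}-job"


def fake_query(job_id: str) -> str:
    # Pretend every job finishes on its second status query.
    polls[job_id] = polls.get(job_id, 0) + 1
    return "done" if polls[job_id] >= 2 else "running"


jobs = run_screening(shares, fake_submit, fake_query)
print({name: (r.start, r.stop) for name, r in shares.items()})
```

Adding or removing an entry in `resource_list` changes the split without touching the rest of the code, which mirrors the editable resource list on the client machine.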

**Figure 4. Retrieving Ranked Results**
The local script on the client machine calls the ranking service running on each remote cluster. The ranking service searches for screening results and puts together a list of compounds and their calculated energy scores. When every remote cluster finishes its search, the compound lists are retrieved via HTTP by the local script. Compounds from all the lists are sorted according to energy score to produce a ranked list of the best binding compounds. Additional remote clusters are not included in the diagram for clarity.
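The final sorting step in this caption amounts to merging several per-cluster lists into one global ranking. The sketch below shows one way to do that. It assumes that a lower (more negative) energy score is better and that each list holds `(score, compound ID)` pairs; the compound IDs and scores are invented for illustration.

```python
import heapq
from typing import Iterable, List, Tuple

Scored = Tuple[float, str]  # (energy score, compound ID)


def merge_ranked(cluster_lists: Iterable[List[Scored]]) -> List[Scored]:
    """Merge per-cluster lists into one global ranking, best score first."""
    # heapq.merge expects sorted inputs, so each list is sorted defensively.
    return list(heapq.merge(*(sorted(lst) for lst in cluster_lists)))


# Hypothetical results retrieved from two remote clusters.
cluster_a = [(-52.1, "cmpd-0003"), (-40.7, "cmpd-0001")]
cluster_b = [(-48.9, "cmpd-0102"), (-31.5, "cmpd-0100")]

ranking = merge_ranked([cluster_a, cluster_b])
for score, compound in ranking:
    print(f"{compound}\t{score}")
```

Because each input list is already sorted, the merge step only ever compares the current head of each list, so it does not need to re-sort the whole database.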

**Figure 5. Building a Filtered Database**
The local script on the client machine first queries the user for a desired percentage of the best binding compounds from the screening results. This generates lists of compounds whose conformational data needs to be retrieved to build a new database. These lists are sent to the database service on each remote cluster, where the conformational data is pulled from the previous screening's results and used to build the “slices” of the new database. All the “slices” are retrieved by the local script via HTTP to create a filtered database consisting of a user-defined top percentage of compounds from the original database. This smaller database can then be used with more stringent docking and/or scoring methods that require more time per compound.
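The selection step in this caption can be sketched as a cut on the ranked list followed by a grouping of the surviving compounds by the cluster that holds their conformational data. The record layout, the rounding rule (round up), and the names below are assumptions made for illustration, and a lower energy score is again taken to be better.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Ranked = Tuple[float, str, str]  # (energy score, compound ID, cluster name)


def select_top(ranked: List[Ranked], percent: float) -> Dict[str, List[str]]:
    """Return, per cluster, the compound IDs in the top `percent` overall."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    keep = math.ceil(len(ranked) * percent / 100)
    per_cluster: Dict[str, List[str]] = defaultdict(list)
    for _score, compound, cluster in sorted(ranked)[:keep]:
        per_cluster[cluster].append(compound)
    return dict(per_cluster)


# Hypothetical screening results spread over two clusters.
results = [
    (-52.1, "cmpd-0003", "cluster-a"),
    (-48.9, "cmpd-0102", "cluster-b"),
    (-40.7, "cmpd-0001", "cluster-a"),
    (-31.5, "cmpd-0100", "cluster-b"),
]

slices = select_top(results, percent=50)
print(slices)
```

Each value in `slices` corresponds to the list that would be sent to one cluster's database service, which then returns its "slice" of the new database.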

Grants and funding

R01 HL085159/HL/NHLBI NIH HHS/United States
