Comput Methods Programs Biomed. 2009 Jan;93(1):73-82.
doi: 10.1016/j.cmpb.2008.07.005. Epub 2008 Sep 3.

Design of a grid service-based platform for in silico protein-ligand screenings

Marshall J Levesque et al. Comput Methods Programs Biomed. 2009 Jan.

Abstract

Grid computing offers a powerful alternative: sharing resources on a worldwide scale, across different institutions, to run computationally intensive scientific applications without the need for a centralized supercomputer. Much effort has been put into developing software that deploys legacy applications on a grid-based infrastructure and uses the available resources efficiently. One field that can benefit greatly from grid resources is drug discovery, since molecular docking simulations are an integral part of the discovery process. In this paper, we present a scalable, reusable platform to choreograph large virtual screening experiments over a computational grid using the molecular docking simulation software DOCK. Software components are applied on multiple levels to create automated workflows consisting of input data delivery, job scheduling, status query, and collection of output to be displayed in a manageable fashion for further analysis. This was achieved using Opal OP to wrap the DOCK application as a grid service and Perl for data manipulation, alleviating the requirement for extensive knowledge of grid infrastructure. With the platform in place, a screening of the 2,066,906-compound "drug-like" subset of the ZINC database against an enzyme's catalytic site was successfully performed using the MPI version of DOCK 5.4 on the PRAGMA grid testbed. The screening required 11.56 days of laboratory time and utilized 200 processors over 7 clusters.
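As a rough reading of the figures quoted in the abstract, the short calculation below estimates the average throughput of the screening. It is only a back-of-the-envelope sketch: it assumes, which the abstract does not state, that all 200 processors were busy for the full 11.56 days. Under that assumption the run comes to roughly 37 compounds per processor-hour.

```python
# Figures quoted in the abstract.
COMPOUNDS = 2_066_906
WALL_CLOCK_DAYS = 11.56
PROCESSORS = 200

# Assumption (not stated in the source): every processor was busy for the
# entire run, so processor-hours = wall-clock hours * processor count.
processor_hours = WALL_CLOCK_DAYS * 24 * PROCESSORS
compounds_per_processor_hour = COMPOUNDS / processor_hours
seconds_per_compound = 3600 / compounds_per_processor_hour

print(f"{processor_hours:,.0f} processor-hours in total")
print(f"{compounds_per_processor_hour:.1f} compounds per processor-hour")
print(f"{seconds_per_compound:.0f} processor-seconds per compound")
```

Idle time, queue waits, and uneven cluster sizes would all make the real per-compound cost lower than this upper-bound estimate suggests.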

Conflict of interest statement

The authors have no conflicts of interest to disclose.

Figures

Figure 1. The Platform Workflow
Starting with a large database of compounds, such as the ZINC library, a rapid docking method is used for the initial screening and is distributed via the dock service. Results data, consisting of compound lists ranked from best to worst energy score, are scattered across the grid resources. The ranking service searches for and gathers this data to construct a list of results encompassing the entire database of compounds. From this list, a top percentage of compounds can be selected to make up a new, smaller database of around 25,000 compounds. Conformational data output by the screening for each compound is retrieved by the database service and is used to build this new database. These compounds are then rescreened with a number of different, more stringent parameters; the jobs are again distributed by the dock service and the results gathered by the ranking service. After a consensus is taken across all scoring methods, a short list of the best potential binding compounds is generated for in vitro testing in the lab.
Figure 2. Platform Scripts
Pseudocode outline of the local and remote scripts that make up the virtual screening platform. The automated platform consists of three stages that perform the distribution of DOCK jobs, the ranking of screening results, and the building of new databases from the results. Each stage in the local script makes requests to the grid services installed and hosted on the master node of each remote cluster for job submission and status querying.
Figure 3. Distributed Screening through the Dock Service
Job distribution is handled by the local script on the client machine, which delivers the required input through the Opal OP toolkit and makes calls to the dock service hosted on each remote cluster. Resources can be added or removed by editing the resource list on the client machine. The database can be stored and accessed on the client machine or at any other location that can be reached via HTTP for file transfer. Requests to launch DOCK MPI on each remote cluster are sent to the dock service with the Opalop-jobrun command. After preparing the received input files according to the given arguments, the dock service submits the DOCK MPI job to the scheduling software for execution. The status of each job is checked using the Opalop-jobquery command. Each remote cluster then holds its own screening results, which must be collected and sorted once the screening reaches completion.
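The submit-then-poll pattern described in this caption can be sketched as follows. This is a minimal illustration and not the authors' script: the `submit` and `query` callables are hypothetical stand-ins for the job-run and job-query commands, the equal-sized split of the database across clusters is an assumption the caption does not state, and `FakeGrid` is an in-memory placeholder so the sketch runs without a grid.

```python
import time
from typing import Callable, Dict, List, Set, Tuple


def partition(n_compounds: int, resources: List[str]) -> Dict[str, range]:
    """Split compound indices into contiguous, near-equal ranges (assumed policy)."""
    if not resources:
        raise ValueError("the resource list is empty")
    base, extra = divmod(n_compounds, len(resources))
    chunks, start = {}, 0
    for i, name in enumerate(resources):
        size = base + (1 if i < extra else 0)
        chunks[name] = range(start, start + size)
        start += size
    return chunks


def run_screening(
    n_compounds: int,
    resources: List[str],
    submit: Callable[[str, range], str],   # hypothetical job-run wrapper
    query: Callable[[str, str], str],      # hypothetical job-query wrapper
    max_polls: int = 100,
    poll_interval: float = 0.0,
) -> Tuple[Dict[str, str], Set[str]]:
    """Submit one job per cluster, then poll until all report DONE."""
    job_ids = {name: submit(name, chunk)
               for name, chunk in partition(n_compounds, resources).items()}
    pending = set(job_ids)
    for _ in range(max_polls):
        pending = {n for n in pending if query(n, job_ids[n]) != "DONE"}
        if not pending:
            break
        time.sleep(poll_interval)
    return job_ids, pending  # pending is empty when every job finished


class FakeGrid:
    """In-memory stand-in for remote clusters; each job finishes on its second poll."""

    def __init__(self) -> None:
        self.polls: Dict[str, int] = {}

    def submit(self, cluster: str, chunk: range) -> str:
        job_id = f"{cluster}-job"
        self.polls[job_id] = 0
        return job_id

    def query(self, cluster: str, job_id: str) -> str:
        self.polls[job_id] += 1
        return "DONE" if self.polls[job_id] >= 2 else "RUNNING"


if __name__ == "__main__":
    grid = FakeGrid()
    ids, unfinished = run_screening(10, ["a", "b", "c"], grid.submit, grid.query)
    print(ids, unfinished)
```

Because the resource list is plain data, adding or removing a cluster only changes the `resources` argument, which mirrors the caption's point about editing the resource list on the client machine.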
Figure 4. Retrieving Ranked Results
The local script on the client machine calls the ranking service running on each remote cluster. The ranking service searches for screening results and puts together a list of compounds and their calculated energy scores. When every remote cluster finishes its search, the compound lists are retrieved via HTTP by the local script. Compounds from all the lists are sorted according to energy score to produce a ranked list of the best binding compounds. For clarity, additional remote clusters are omitted from the diagram.
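The final sorting step in this caption amounts to pooling the per-cluster lists and ordering them by score. The sketch below illustrates that idea only. It assumes each list holds `(compound_id, energy_score)` pairs and that a lower (more negative) score is better; the cluster names, compound IDs, and scores are invented for the example.

```python
from itertools import chain
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (compound_id, energy_score)


def merge_rankings(per_cluster: Dict[str, List[Hit]]) -> List[Hit]:
    """Pool every cluster's hits and sort them best-first.

    Assumption: lower energy score means better binding. No assumption is
    made about whether the per-cluster lists arrive already sorted.
    """
    return sorted(chain.from_iterable(per_cluster.values()),
                  key=lambda hit: hit[1])


if __name__ == "__main__":
    # Invented example data, not results from the paper.
    results = {
        "cluster_a": [("cmpd_1", -42.1), ("cmpd_2", -30.5)],
        "cluster_b": [("cmpd_3", -55.0), ("cmpd_4", -12.3)],
    }
    for compound_id, score in merge_rankings(results):
        print(compound_id, score)
```

If each cluster's list were known to be sorted already, `heapq.merge` could combine them lazily without holding every list in memory, which may matter at the scale of millions of compounds.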
Figure 5. Building a Filtered Database
The local script on the client machine first queries the user for a desired percentage of the best binding compounds from the screening results. This generates lists of compounds whose conformational data needs to be retrieved to build a new database. These lists are sent to the database service on each remote cluster, where the conformational data is pulled from the previous screening's results and used to build the “slices” of the new database. All the “slices” are retrieved by the local script via HTTP to create a filtered database consisting of a user-defined top percentage of compounds from the original database. This smaller database can then be used in more stringent docking and/or scoring methods that require more time per compound.
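The selection step described here can be sketched as a cut of the globally ranked list followed by grouping the chosen compounds by the cluster that holds their conformational data. This is an illustration under stated assumptions: the ranked list is taken to carry `(compound_id, energy_score, cluster)` triples sorted best-first, rounding up at the cutoff is an arbitrary choice, and all names and values are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

RankedHit = Tuple[str, float, str]  # (compound_id, energy_score, cluster)


def select_top(ranked: List[RankedHit], percent: float) -> List[RankedHit]:
    """Keep the best `percent` of a best-first list, rounding the count up."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the interval (0, 100]")
    keep = math.ceil(len(ranked) * percent / 100)
    return ranked[:keep]


def requests_by_cluster(selected: List[RankedHit]) -> Dict[str, List[str]]:
    """Group selected compound IDs by the cluster holding their results."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for compound_id, _score, cluster in selected:
        grouped[cluster].append(compound_id)
    return dict(grouped)


if __name__ == "__main__":
    # Invented example data, already sorted best-first.
    ranked = [
        ("cmpd_3", -55.0, "cluster_b"),
        ("cmpd_1", -42.1, "cluster_a"),
        ("cmpd_2", -30.5, "cluster_a"),
        ("cmpd_4", -12.3, "cluster_b"),
    ]
    print(requests_by_cluster(select_top(ranked, 50)))
```

Each value in the resulting dictionary corresponds to one list that would be sent to a cluster's database service, and each returned "slice" would then be concatenated into the filtered database.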
