SLAS Discov. 2017 Mar;22(3):238-249.
doi: 10.1177/1087057116679993. Epub 2016 Dec 13.

Jenkins-CI, an Open-Source Continuous Integration System, as a Scientific Data and Image-Processing Platform


Ioannis K Moutsatsos et al. SLAS Discov. 2017 Mar.

Abstract

High-throughput screening generates large volumes of heterogeneous data that require a diverse set of computational tools for management, processing, and analysis. Building integrated, scalable, and robust computational workflows for such applications is challenging but highly valuable. Scientific data integration and pipelining facilitate standardized data processing, collaboration, and reuse of best practices. We describe how Jenkins-CI, an "off-the-shelf," open-source, continuous integration system, is used to build pipelines for processing images and associated data from high-content screening (HCS). Jenkins-CI provides numerous plugins for standard compute tasks, and its design allows the quick integration of external scientific applications. Using Jenkins-CI, we integrated CellProfiler, an open-source image-processing platform, with various HCS utilities and a high-performance Linux cluster. The platform is web-accessible, facilitates access and sharing of high-performance compute resources, and automates previously cumbersome data and image-processing tasks. Imaging pipelines developed using the desktop CellProfiler client can be managed and shared through a centralized Jenkins-CI repository. Pipelines and managed data are annotated to facilitate collaboration and reuse. Limitations with Jenkins-CI (primarily around the user interface) were addressed through the selection of helper plugins from the Jenkins-CI community.

Keywords: CellProfiler; continuous integration; high-content screening; high-performance computing.


Conflict of interest statement

Declaration of Conflicting Interests: The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: All the authors are employees of the Novartis Institutes of Biomedical Research and conducted this research as part of their drug discovery efforts.

Figures

Figure 1.
Architecture of Jenkins-CI configured as a scientific data-processing platform. A typical Jenkins-CI installation (shown in the center) integrates computational resources (blue rectangles) and local and remote data (green file folders) and makes them accessible to end users via a standard web portal. A Jenkins-CI project configuration template defines the parameters, environment, and actions executed by a project build and drives the generation of the user interface. Installed Jenkins-CI plugins and local scripts and applications execute on the Jenkins-CI server and provide an extensible set of data management and processing functions. High-performance parallel computing tasks (such as image processing) can be easily integrated into Jenkins-CI projects using standard SSH access provided by the SSH plugin. The project build history stores build metadata, transient analysis data, and reusable components such as pipelines and image lists. The instrument data shares store large data and image sets that are shared between multiple operating systems (Windows/Linux). Instrument data shares also act as the final secure repository for important analysis data.
Figure 2.
Jenkins-CI web portal for access to high-performance computing (HPC) tasks and workflows. A default installation of Jenkins-CI provides customizable tabbed views that group the available Jenkins-CI projects. Displayed tabs include (1) the “Help” tab with shortcuts and guides to using the various Jenkins-CI projects; (2) the “Image Lists” tab for generating and managing CellProfiler-formatted image lists for various high-content screening instruments; (3) the “CellProfiler Pipelines” tab for the management and sharing of CellProfiler image-processing pipelines; (4) the “CellProfiler Windows” and (5) “CellProfiler Linux Cluster” tabs for launching CellProfiler on the Windows and HPC platforms, respectively; and (6) the “CellProfiler Helpers” tab containing a variety of custom utilities for formatting and processing measurement files generated from CellProfiler.
Figure 3.
Jenkins-CI workflow for parallel image processing using CellProfiler on the Linux cluster. (1) The user uploads (contributes) a working CellProfiler image-processing pipeline. (2) The user generates a CellProfiler-formatted image list from one or more primary image acquisition folders. Image lists contain metadata required for data grouping operations as well as for downstream import into our corporate results database. Contributed pipelines and image lists are annotated and stored in the Jenkins-CI project build history, from where they can be reused for image analysis by multiple users. (3) The user completes the cluster image-processing submission form by selecting the image-processing pipeline and an appropriate image list. The form contains additional annotation fields and options for restricting the processing to a subset of the images. (4) The user starts the build, which launches a multistage Jenkins-CI workflow (shown as a gray rectangle). The workflow includes stages for launching CellProfiler in parallel mode, monitoring the progress of the parallel cluster run, merging well-level data (optional), and deleting temporary CellProfiler files and grid engine logs.
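The monitoring and merging stages described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `*.csv` file pattern, and the assumption that all well-level files share one header row are hypothetical choices made for illustration.

```python
import csv
from pathlib import Path


def count_outputs(output_dir, pattern="*.csv"):
    """Count measurement files written so far (the pattern is an assumed naming scheme)."""
    return sum(1 for _ in Path(output_dir).rglob(pattern))


def progress_bar(done, expected, width=20):
    """Render a text progress bar comparing finished outputs with the expected count."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    done = min(done, expected)
    filled = width * done // expected
    return "[" + "#" * filled + "-" * (width - filled) + f"] {done}/{expected}"


def merge_well_files(csv_paths, merged_path):
    """Concatenate well-level CSV files that share a header; return the data rows written."""
    header = None
    rows_written = 0
    with open(merged_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(csv_paths):
            with open(path, newline="") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

A monitoring job could call `count_outputs` periodically, print `progress_bar(count, expected)` to the console, and call `merge_well_files` once the count reaches the expected number.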
Figure 4.
(A) Report from a parallel CellProfiler image-processing run. The default section of the build report is outlined in blue. An additional section has been appended by the “Associated Files” plugin. The default report displays a variety of build file artifacts and metadata. The associated files section displays the location of the intermediate and final results. In addition, the “Associated Files” plugin ensures that files from these locations are deleted when their associated build is deleted. (B) Custom summary report of a parallel CellProfiler image-processing run. To facilitate the retrieval of the resulting measurements, each CellProfiler image-processing run launched through Jenkins generates a custom HTML report. The report displays important analysis metadata and is hyperlinked to the pipeline and image list used in the run, as well as to various data locations. The report is displayed with the aid of the “HTML Publisher” plugin, which can display one or more HTML files in a tabbed format.
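As a rough illustration of the custom summary report in panel B, the sketch below builds a small HTML page from run metadata and a set of hyperlinks. The function name, field names, and page layout are hypothetical and are not taken from the paper; the resulting file would then be handed to a publishing step such as the “HTML Publisher” plugin.

```python
from html import escape


def render_run_report(metadata, links):
    """Build a minimal HTML summary: a metadata table followed by a list of hyperlinks.

    metadata: mapping of label -> value (for example, pipeline name or image count)
    links:    mapping of link text -> URL or path (for example, pipeline, image list)
    """
    rows = "\n".join(
        f"<tr><th>{escape(str(key))}</th><td>{escape(str(value))}</td></tr>"
        for key, value in metadata.items()
    )
    items = "\n".join(
        f'<li><a href="{escape(str(url), quote=True)}">{escape(str(label))}</a></li>'
        for label, url in links.items()
    )
    return (
        "<html><body>\n"
        "<h1>CellProfiler run summary</h1>\n"
        f"<table>\n{rows}\n</table>\n"
        f"<ul>\n{items}\n</ul>\n"
        "</body></html>\n"
    )
```

Escaping every value keeps annotations that contain characters such as `<` or `&` from breaking the page.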
Figure 5.
View of multistage pipelines used for parallel image processing. This multistage pipeline is used for submitting large data sets to the Linux cluster for CellProfiler processing. This custom view is generated with the aid of the Jenkins Build Pipeline Plugin. The number of pipeline runs displayed is configurable (in this case, we are displaying the last three). The Jenkins jobs participating in the pipeline are shown as blocks with arrows connecting one stage to the next in the sequence. Successfully executed blocks are green, and they include various statistics and shortcuts for more detailed inspection of the run logs. The first job (CellProfiler_JClustSelect) prepares a standard grid engine “job array” script by examining the submitted image list and creating a separate grid engine job for each group of 12 images. The next step is performed with the aid of the Jenkins SSH plugin (see Table 1). This plugin allows us to connect to the cluster and execute a short (bash) script. The script creates the data folders for writing the image analysis results, downloads the job array script from the Jenkins server, and finally launches the grid engine job array to process images using the CellProfiler command line mode. The second job (Monitor_JCPCluster) monitors the generation of output files from CellProfiler. As measurement files are generated, they are counted, and the count is compared with the expected number computed from the submitted image list. This allows us to construct a simple progress bar that is displayed in the Jenkins console. When all of the expected output is accounted for, the well-level data are merged into a single file to facilitate downstream data analysis. At this point, the third stage of the workflow is triggered. This third job (CU_CleanThumbnail_Folder) simply deletes any intermediate CellProfiler hdf5 databases, well-level data files, and grid engine job log files that were created during the run.
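The job-array preparation described for the first job can be sketched as follows. This is a hedged illustration, not the script used by the authors: the function names are hypothetical, the grouping unit is assumed to be one image-list row per image set, and the CellProfiler command line shown (`-c -r -p`, `--data-file`, `-f`, `-l`, `-o`) is an assumption that should be checked against the installed CellProfiler version. The `#$ -t` directive and the `SGE_TASK_ID` variable are standard grid engine job-array features.

```python
def plan_job_array(num_images, group_size=12):
    """Return 1-based inclusive (first, last) ranges, one per grid engine array task."""
    if num_images < 1 or group_size < 1:
        raise ValueError("num_images and group_size must be positive")
    return [
        (first, min(first + group_size - 1, num_images))
        for first in range(1, num_images + 1, group_size)
    ]


def render_job_array_script(num_images, pipeline, image_list, output_dir, group_size=12):
    """Render a bash job-array script; each task derives its own range from SGE_TASK_ID."""
    n_tasks = len(plan_job_array(num_images, group_size))
    lines = [
        "#!/bin/bash",
        f"#$ -t 1-{n_tasks}",  # one array task per group of images
        f"FIRST=$(( (SGE_TASK_ID - 1) * {group_size} + 1 ))",
        f"LAST=$(( SGE_TASK_ID * {group_size} ))",
        # The last group may be shorter than group_size, so clamp it.
        f'if [ "$LAST" -gt {num_images} ]; then LAST={num_images}; fi',
        # Assumed CellProfiler headless invocation; verify flags for your version.
        f'cellprofiler -c -r -p "{pipeline}" --data-file "{image_list}" '
        f'-f "$FIRST" -l "$LAST" -o "{output_dir}"',
    ]
    return "\n".join(lines) + "\n"
```

For an image list with 30 entries, `plan_job_array(30)` yields three tasks covering entries 1 to 12, 13 to 24, and 25 to 30, which is also the number of task outputs a monitoring job would wait for.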

