Genome Med. 2015 Jul 25;7(1):73.
doi: 10.1186/s13073-015-0202-y.

Use of semantic workflows to enhance transparency and reproducibility in clinical omics

Christina L Zheng et al. Genome Med.

Abstract

Background: Recent highly publicized cases of premature patient assignment into clinical trials, resulting from non-reproducible omics analyses, have prompted many to call for a more thorough examination of translational omics and highlighted the critical need for transparency and reproducibility to ensure patient safety. Workflow platforms such as Galaxy and Taverna have greatly enhanced the use, transparency and reproducibility of omics analysis pipelines in the research domain and would be invaluable tools in a clinical setting. However, the use of these workflow platforms requires deep domain expertise that, particularly within the multi-disciplinary fields of translational and clinical omics, may not always be present in a clinical setting. This lack of domain expertise may put patient safety at risk and make these workflow platforms difficult to operationalize in a clinical setting. In contrast, semantic workflows are a different class of workflow platform where resultant workflow runs are transparent, reproducible, and semantically validated. Through semantic enforcement of all datasets, analyses and user-defined rules/constraints, users are guided through each workflow run, enhancing analytical validity and patient safety.

Methods: To evaluate the effectiveness of semantic workflows within translational and clinical omics, we have implemented a clinical omics pipeline for annotating DNA sequence variants identified through next generation sequencing using the Workflow Instance Generation and Specialization (WINGS) semantic workflow platform.

Results: We found that the implementation and execution of our clinical omics pipeline in a semantic workflow helped us to meet the requirements for enhanced transparency, reproducibility and analytical validity recommended for clinical omics. We further found that many features of the WINGS platform were particularly well suited to supporting the critical needs of clinical omics analyses.

Conclusions: This is the first implementation and execution of a clinical omics pipeline using semantic workflows. Evaluation of this implementation provides guidance for their use in both translational and clinical settings.

Figures

Fig. 1
WINGS datasets ontology for our clinical omics use-case. WINGS datasets (any input, output, or intermediate data files) within a workflow template are classified within an ontology. (a) The ontology classifying the datasets within our WINGS omics workflow is shown. Each dataset can be defined as an individual class or defined as a subclass of an existing dataset. Patient_Called_DNA_Variant_File is an example of an individually defined dataset class while COSMICSubset and Queried_COSMIC_Result are examples of subclasses under the COSMICData dataset. Each dataset can be further defined with metadata. (b) The defined metadata and their values for a Patient_Called_DNA_Variant_File are shown.
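To make the dataset ontology in Fig. 1 concrete, the following sketch models dataset classes, subclasses, and per-file metadata as plain Python objects. It is a hypothetical illustration, not how WINGS itself stores its ontology; the class names come from the caption, while the `genomeAssembly` metadata key, its `hg19` value, and the file name `patient1.vcf` are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DataClass:
    """A node in a hypothetical dataset ontology (not the WINGS data model)."""
    name: str
    parent: Optional["DataClass"] = None

    def is_a(self, other: "DataClass") -> bool:
        """True if this class is `other` or a (transitive) subclass of it."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


@dataclass
class Dataset:
    """A concrete file classified under a DataClass, with free-form metadata."""
    filename: str
    dtype: DataClass
    metadata: Dict[str, str] = field(default_factory=dict)


# Class names taken from Fig. 1.
COSMICData = DataClass("COSMICData")
COSMICSubset = DataClass("COSMICSubset", parent=COSMICData)
Queried_COSMIC_Result = DataClass("Queried_COSMIC_Result", parent=COSMICData)
Patient_Called_DNA_Variant_File = DataClass("Patient_Called_DNA_Variant_File")

# Hypothetical file name and metadata key/value, for illustration only.
vcf = Dataset("patient1.vcf", Patient_Called_DNA_Variant_File,
              {"genomeAssembly": "hg19"})
```

The `is_a` check is what lets a rule written against a parent class such as COSMICData also accept any of its subclasses.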
Fig. 2
WINGS workflow components ontology for our clinical omics use-case. WINGS components are used to encapsulate individual steps of an analysis pipeline and are classified within an ontology in a workflow template. Individual components can be classified as their own component-class or as a subclass of a component-type. Component-types are used to group components sharing a common base set of input and output datasets such as those encapsulating code for different versions of the same tool or different tools performing similar functions. Component-types can also be used to effectively organize and enhance the flexibility of individual components within a workflow template. Each step of our clinical omics analysis pipeline was encapsulated within a component-type, even if only one component is currently defined (a). Individual component-types are shown in grey while individual components are depicted in yellow. Each component is defined with the following: 1) input datasets, 2) computational code, and 3) output datasets. For example, each PredictProteinConsequence component was defined with the following two input datasets: 1) Patient_Called_DNA_Variant_File and 2) Transcript_File and the following output dataset: 1) Predicted_Protein_Consequence (b). The R code needed for the analysis of this step was included to complete the creation of the component.
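The component-type idea (a shared input/output signature with interchangeable implementations) can be sketched as follows. The dataset and component-type names come from Fig. 2; the `Component.execute` method, its signature checks, the component name `PredictProteinConsequence_Version1`, and `_fake_predict` (a stand-in for the R code the caption mentions) are hypothetical and do not reflect the WINGS API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ComponentType:
    """Groups components that share the same input and output dataset classes."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


@dataclass(frozen=True)
class Component:
    """A concrete implementation (e.g. one tool version) of a component-type."""
    name: str
    ctype: ComponentType
    run: Callable[[Dict[str, str]], Dict[str, str]]

    def execute(self, inputs: Dict[str, str]) -> Dict[str, str]:
        # Enforce the component-type's declared inputs before running.
        missing = set(self.ctype.inputs) - set(inputs)
        if missing:
            raise ValueError(f"{self.name}: missing inputs {sorted(missing)}")
        outputs = self.run(inputs)
        # Enforce the declared outputs after running.
        if set(outputs) != set(self.ctype.outputs):
            raise ValueError(f"{self.name}: unexpected outputs {sorted(outputs)}")
        return outputs


# Signature taken from the Fig. 2 caption.
PredictProteinConsequence = ComponentType(
    "PredictProteinConsequence",
    inputs=("Patient_Called_DNA_Variant_File", "Transcript_File"),
    outputs=("Predicted_Protein_Consequence",),
)


def _fake_predict(inputs: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical placeholder for the wrapped R code: it only names an output file.
    vcf = inputs["Patient_Called_DNA_Variant_File"]
    return {"Predicted_Protein_Consequence": vcf + ".consequence"}


Version1 = Component("PredictProteinConsequence_Version1",
                     PredictProteinConsequence, _fake_predict)
```

Because every implementation is checked against the same component-type signature, a second version of the tool could be added without changing the template that refers to the component-type.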
Fig. 3
WINGS workflow template for our clinical omics use-case. WINGS templates are fully connected representations of all components, datasets, and rules and constraints of an analysis pipeline needed to execute a semantically validated workflow run. A workflow template representing our clinical omics analysis pipeline. Within our workflow template, each step is represented by its component-type (grey rectangles); however, please note that individual components can also be sequentially connected to one another to build a workflow template that has all input and output datasets (blue rounded rectangles) represented. Once a workflow template is created, WINGS generates an accompanying GUI for the workflow template, thus allowing workflow users to execute workflow runs. Due to the enforcement of all user-defined rules and constraints, each workflow run is semantically validated. Pre-defined rules and constraints also enable WINGS to help guide users through a workflow run by suggesting semantically validated inputs and parameters (Suggest Data and Suggest Parameters buttons). For example, due to our predefined rules and constraints, only datasets with the same genomic assembly would be suggested for this workflow template.
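The same-assembly constraint can be illustrated with a brute-force filter over candidate inputs. This is a simplified sketch and not how WINGS implements Suggest Data; the `suggest_data` function, the dictionary layout with `name` and `genomeAssembly` keys, and the file names and assembly labels are all hypothetical.

```python
from itertools import product
from typing import Dict, List


def suggest_data(candidates: Dict[str, List[dict]]) -> List[Dict[str, str]]:
    """Return every input binding whose datasets all share one genome assembly.

    `candidates` maps each template input role to the datasets that could fill
    it; each dataset is a dict with hypothetical keys "name" and "genomeAssembly".
    """
    roles = sorted(candidates)
    bindings = []
    for combo in product(*(candidates[r] for r in roles)):
        assemblies = {d["genomeAssembly"] for d in combo}
        if len(assemblies) == 1:  # the same-assembly constraint from Fig. 3
            bindings.append({r: d["name"] for r, d in zip(roles, combo)})
    return bindings


# Hypothetical candidate datasets for the two inputs named in Fig. 2.
candidates = {
    "Patient_Called_DNA_Variant_File": [
        {"name": "patient1_hg19.vcf", "genomeAssembly": "hg19"},
        {"name": "patient1_hg38.vcf", "genomeAssembly": "hg38"},
    ],
    "Transcript_File": [
        {"name": "transcripts_hg19.txt", "genomeAssembly": "hg19"},
    ],
}
print(suggest_data(candidates))
# → [{'Patient_Called_DNA_Variant_File': 'patient1_hg19.vcf', 'Transcript_File': 'transcripts_hg19.txt'}]
```

The hg38 variant file is never offered alongside the hg19 transcript file, which mirrors the behaviour the caption describes; enumeration grows combinatorially, so it is only suitable as an illustration.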
Fig. 4
Execution of our clinical omics use-case WINGS workflow. Once a workflow run is executed, the details of the run are shown. Displayed is the successful execution of our clinical omics use-case WINGS workflow. All input parameters (green), input and output data objects (blue), and individual components (yellow) of the workflow run are shown. Particularly when component-types are used to define a workflow template, the details of an executed workflow run can be used to identify the exact components used for each workflow run. Based on the chosen input datasets and the user-defined rules and constraints, Version1 of each component-type was used in our executed workflow run.
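Specialization (binding each component-type to a concrete component and recording which one ran) might look like this in outline. The registry, the assembly-based selection rules, and `Version2` are hypothetical; the caption states only that Version1 of each component-type was selected from the chosen inputs and the user-defined rules.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: component-type -> list of (component name, rule).
# These rules are illustrative only; they are not the paper's actual constraints.
Rule = Callable[[Dict[str, str]], bool]
REGISTRY: Dict[str, List[Tuple[str, Rule]]] = {
    "PredictProteinConsequence": [
        ("Version1", lambda meta: meta.get("genomeAssembly") == "hg19"),
        ("Version2", lambda meta: meta.get("genomeAssembly") == "hg38"),
    ],
}


def specialize(template: List[str], input_meta: Dict[str, str]) -> Dict[str, str]:
    """Bind each component-type to the first component whose rule accepts the inputs.

    Returns a provenance record mapping component-type -> chosen component,
    analogous to the run details shown in Fig. 4.
    """
    chosen = {}
    for ctype in template:
        for name, rule in REGISTRY[ctype]:
            if rule(input_meta):
                chosen[ctype] = name
                break
        else:
            raise LookupError(f"no component of {ctype} satisfies the constraints")
    return chosen


print(specialize(["PredictProteinConsequence"], {"genomeAssembly": "hg19"}))
# → {'PredictProteinConsequence': 'Version1'}
```

Returning the bindings as a record, rather than only running them, is what makes it possible to report afterwards exactly which component served each component-type in a given run.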
