Sci Data. 2019 Sep 3;6(1):166.
doi: 10.1038/s41597-019-0174-7.

Creating reproducible pharmacogenomic analysis pipelines


Anthony Mammoliti et al. Sci Data. 2019.

Abstract

The field of pharmacogenomics presents great challenges for researchers who wish to make their studies reproducible and shareable. These challenges stem from the generation of large volumes of high-throughput multimodal data and from the lack of standardized workflows that are robust, scalable, and flexible enough to perform large-scale analyses. To address this issue, we developed pharmacogenomic workflows in the Common Workflow Language (CWL) to process two breast cancer datasets in a reproducible and transparent manner. Our pipelines combine pharmacological and molecular profiles into a portable data object that can be used for future analyses in cancer research. Our data objects and workflows are shared on Harvard Dataverse and Code Ocean, where they have been assigned unique Digital Object Identifiers (DOIs), providing a level of data provenance and a persistent location to access and share our data with the community.
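To make "combining pharmacological and molecular profiles into a portable data object" more concrete, the sketch below pairs a per-cell-line expression table with per-(cell line, drug) response values and keeps only cell lines that appear in both. It is a minimal sketch, assuming plain Python dictionaries; `MiniPSet` and `build_pset` are hypothetical names and are not the PharmacoGx `PharmacoSet` class or its API. The values in the usage example are made up for illustration and are not data from either study.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical types: gene -> expression value, and (cell line, drug) -> response value.
Expression = Dict[str, float]
Response = Dict[Tuple[str, str], float]


@dataclass(frozen=True)
class MiniPSet:
    """Toy container pairing molecular and pharmacological profiles (not PharmacoGx)."""
    name: str
    molecular: Dict[str, Expression]  # cell line -> gene expression profile
    sensitivity: Response             # (cell line, drug) -> drug response summary

    @property
    def drugs(self) -> set:
        # Every drug that has at least one response measurement.
        return {drug for (_, drug) in self.sensitivity}


def build_pset(name: str, molecular: Dict[str, Expression], sensitivity: Response) -> MiniPSet:
    """Keep only cell lines that have both a molecular and a drug-response profile."""
    screened = {cell for (cell, _) in sensitivity}
    shared = screened & set(molecular)
    return MiniPSet(
        name=name,
        molecular={cell: molecular[cell] for cell in sorted(shared)},
        sensitivity={key: value for key, value in sensitivity.items() if key[0] in shared},
    )


if __name__ == "__main__":
    # Made-up toy values: HCC1954 has no expression profile and T47D was not screened.
    pset = build_pset(
        "toy",
        molecular={"MCF7": {"ERBB2": 5.1}, "BT474": {"ERBB2": 11.8}, "T47D": {"ERBB2": 6.0}},
        sensitivity={("MCF7", "lapatinib"): 0.10, ("BT474", "lapatinib"): 0.62,
                     ("HCC1954", "lapatinib"): 0.55},
    )
    print(sorted(pset.molecular), pset.drugs)
```

Restricting both tables to the same cell lines up front means any downstream biomarker test sees matched molecular and pharmacological measurements; a real PSet additionally carries curated annotations, which this toy omits.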


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Convergence of drugs and cell lines between GRAY (2017) and UHN Breast (2019) after curation through our CWL pipelines.
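The overlap shown in Fig. 1 depends on mapping each dataset's raw drug and cell line names to shared identifiers before intersecting them. The sketch below illustrates that idea under simple assumptions: a crude normalization (uppercase, strip punctuation) plus a hypothetical synonym table. The function names `normalize`, `curate`, and `overlap` are illustrative only; the actual pipelines rely on their own curated annotation files rather than this logic.

```python
import re
from typing import Dict, Iterable, Set


def normalize(name: str) -> str:
    """Crude normalization: uppercase and drop every non-alphanumeric character."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())


def curate(names: Iterable[str], synonyms: Dict[str, str]) -> Set[str]:
    """Map raw names to curated identifiers via a hypothetical synonym table."""
    curated = set()
    for raw in names:
        key = normalize(raw)
        # Fall back to the normalized name when no synonym is recorded.
        curated.add(synonyms.get(key, key))
    return curated


def overlap(a: Iterable[str], b: Iterable[str], synonyms: Dict[str, str]) -> Set[str]:
    """Identifiers shared by two datasets after curation."""
    return curate(a, synonyms) & curate(b, synonyms)


if __name__ == "__main__":
    # "MCF-7" and "mcf7" collapse to the same identifier after normalization.
    print(overlap(["MCF-7", "BT-474", "HCC1954"], ["mcf7", "BT474", "SKBR3"], {}))
    # A synonym table links a code name (GW572016) to the drug name lapatinib.
    print(overlap(["Lapatinib"], ["GW-572016"], {"GW572016": "LAPATINIB"}))
```

Without the curation step, naive string matching would miss pairs such as "MCF-7"/"mcf7", which is why overlap between datasets is only meaningful after identifiers are harmonized.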
Fig. 2
Breast cancer PharmacoSet (PSet) generation and DOI assignment through execution of a reproducible PharmacoGx CWL workflow.
Fig. 3
ERBB2 expression as a biomarker of lapatinib response in GRAY 2017 and UHN Breast 2019. N: number of samples; C-index: concordance index calculated for each PSet; P-value: p-value calculated for each PSet. Meta-analysis: combined concordance index and p-value across PSets.
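The concordance index in Fig. 3 measures how often the ranking of a biomarker (here, ERBB2 expression) agrees with the ranking of drug response across pairs of samples: 1.0 is perfect agreement, 0.5 is no better than chance. The sketch below is a plain pairwise version, assuming continuous response values with no censoring; pairs with tied responses are skipped and tied biomarker values count as half-concordant. It is not the exact implementation behind the figure, which may handle ties, response thresholds, and p-values differently, and it does not cover the meta-analysis step.

```python
from itertools import combinations
from typing import Sequence


def concordance_index(predictor: Sequence[float], response: Sequence[float]) -> float:
    """Fraction of comparable sample pairs whose predictor ordering matches the response ordering.

    Pairs with tied responses are not comparable; pairs with tied predictor
    values (but different responses) contribute 0.5.
    """
    if len(predictor) != len(response):
        raise ValueError("predictor and response must have the same length")
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(response)), 2):
        if response[i] == response[j]:
            continue  # tied responses carry no ordering information
        comparable += 1
        agreement = (predictor[i] - predictor[j]) * (response[i] - response[j])
        if agreement > 0:
            concordant += 1.0  # both orderings point the same way
        elif agreement == 0:
            concordant += 0.5  # predictor tie
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy values: of 6 pairs, only (sample 2, sample 3) is discordant, so C = 5/6.
    print(concordance_index([1.0, 2.0, 3.0, 4.0], [0.1, 0.3, 0.2, 0.5]))
```

The O(n²) pairwise loop is fine for cell line panels of this size; for much larger cohorts a sort-based algorithm would be preferable.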
