A toolkit for enhanced reproducibility of RNASeq analysis for synthetic biologists
- PMID: 36035514
- PMCID: PMC9408027
- DOI: 10.1093/synbio/ysac012
Abstract
Sequencing technologies, in particular RNASeq, have become critical tools in the design, build, test and learn cycle of synthetic biology. They provide a better understanding of synthetic designs, and they help identify ways to improve and select designs. While these data are beneficial to design, their collection and analysis form a complex, multistep process that has implications for both the discovery and the reproducibility of experiments. Additionally, tool parameters, experimental metadata, normalization of data and standardization of file formats present computationally intensive challenges. This calls for high-throughput pipelines expressly designed to handle the combinatorial and longitudinal nature of synthetic biology. In this paper, we present a pipeline to maximize the analytical reproducibility of RNASeq for synthetic biologists. We also explore the impact of reproducibility on the validation of machine learning models. The pipeline combines traditional RNASeq data processing tools with structured metadata tracking to allow for the exploration of combinatorial designs in a high-throughput and reproducible manner. We then demonstrate its utility via two different experiments: a control comparison experiment and a machine learning model experiment. The first experiment compares datasets collected from identical biological controls across multiple days for two different organisms. It shows that a reproducible experimental protocol for one organism does not guarantee reproducibility in another. The second experiment quantifies the differences in experimental runs from multiple perspectives. It shows that the lack of reproducibility from these different perspectives can place an upper bound on the validation of machine learning models trained on RNASeq data.
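As a rough illustration of the control comparison idea described in the abstract, the sketch below scores how similar identical controls look across runs. It is not the authors' pipeline: the normalization (log2 counts per million), the similarity measure (Pearson correlation), the function names and the toy counts are all assumptions chosen for illustration only.

```python
import math
from itertools import combinations


def log_cpm(counts, pseudocount=1.0):
    """Convert raw per-gene counts to log2 counts-per-million (assumed normalization)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def pairwise_control_correlation(runs):
    """Correlate every pair of control runs.

    `runs` maps a run label to raw counts listed in the same gene order.
    """
    normalized = {label: log_cpm(counts) for label, counts in runs.items()}
    return {
        (a, b): pearson(normalized[a], normalized[b])
        for a, b in combinations(sorted(normalized), 2)
    }


# Hypothetical toy counts for one control strain sequenced on three days.
control_runs = {
    "day1": [120, 30, 560, 0, 45],
    "day2": [110, 35, 600, 2, 40],
    "day3": [300, 5, 200, 50, 90],
}

correlations = pairwise_control_correlation(control_runs)
for pair, r in sorted(correlations.items()):
    print(pair, round(r, 3))
```

Low correlation between runs of the same control would flag a reproducibility problem before any downstream model is trained on the data. A real analysis would use thousands of genes and may prefer other metrics.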
Keywords: automation; machine learning; standardization; transcriptomics.
© The Author(s) 2022. Published by Oxford University Press.