iScience. 2021 Mar 26;24(4):102361.
doi: 10.1016/j.isci.2021.102361. eCollection 2021 Apr 23.

NASA GeneLab RNA-seq consensus pipeline: standardized processing of short-read RNA-seq data

Eliah G Overbey et al. iScience.

Abstract

With the development of transcriptomic technologies, we are able to quantify precise changes in gene expression profiles from astronauts and other organisms exposed to spaceflight. Members of NASA GeneLab and GeneLab-associated analysis working groups (AWGs) have developed a consensus pipeline for analyzing short-read RNA-sequencing data from spaceflight-associated experiments. The pipeline includes quality control, read trimming, mapping, and gene quantification steps, culminating in the detection of differentially expressed genes. This data analysis pipeline and the results of its execution using data submitted to GeneLab are now all publicly available through the GeneLab database. We present here the full details and rationale for the construction of this pipeline in order to promote transparency, reproducibility, and reusability of pipeline data; to provide a template for data processing of future spaceflight-relevant datasets; and to encourage cross-analysis of data from other databases with the data available in GeneLab.

Keywords: Omics; Space Sciences.

Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
GeneLab RNA-seq Consensus Pipeline (RCP) (A) The three broad steps of the RCP. The RCP handles (1) data preprocessing to trim sequencing adapters and to provide quality control metrics; (2) data processing to map reads to the reference genome and quantify the number of read counts per gene; and (3) differential gene expression calculation, which will provide a list of differentially expressed genes that can be sorted by adjusted p value and log fold-change. (B) The full RCP annotated with tools, input files, and output files.
Figure 2
Data preprocessing (pipeline step 1): quality control and trimming (A) Data preprocessing pipeline. FastQ files from Illumina base-calling software are quality checked using FastQC and MultiQC. Data are then trimmed using TrimGalore and re-checked for quality; (B) flags used for the FastQC program; (C) flags used for the MultiQC program; (D) flags used for the TrimGalore program; trimmed reads (*fastq.gz) are then used as input data for FastQC (B) followed by MultiQC (C) to generate trimmed read quality metrics. Tool versions used to process each dataset are included in the RNA-seq processing protocol in the GLDS Repository.
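The order of operations described in the Figure 2 caption (raw QC, trimming, then QC of the trimmed reads) can be sketched as a list of command lines. This is only an illustrative sketch: the function name, directory layout, and file names are hypothetical, and the only options used are the output-directory options of each tool. It does not reproduce the flags shown in panels B to D, which are not listed on this page.

```python
from typing import List


def preprocessing_commands(
    raw_fastqs: List[str],
    trimmed_fastqs: List[str],
    out_dir: str,
) -> List[List[str]]:
    """Build (but do not run) the preprocessing commands in pipeline order.

    Assumptions for illustration only: the three sub-directories below
    already exist, and the caller supplies the trimmed file paths.
    """
    raw_qc = f"{out_dir}/raw_qc"          # hypothetical layout
    trimmed = f"{out_dir}/trimmed"        # hypothetical layout
    trimmed_qc = f"{out_dir}/trimmed_qc"  # hypothetical layout

    return [
        # 1. Per-file quality reports for the raw reads.
        ["fastqc", "-o", raw_qc, *raw_fastqs],
        # 2. Aggregate the raw FastQC reports into one summary.
        ["multiqc", "-o", raw_qc, raw_qc],
        # 3. Adapter and quality trimming.
        ["trim_galore", "-o", trimmed, *raw_fastqs],
        # 4. Re-check quality on the trimmed reads.
        ["fastqc", "-o", trimmed_qc, *trimmed_fastqs],
        # 5. Aggregate the trimmed FastQC reports.
        ["multiqc", "-o", trimmed_qc, trimmed_qc],
    ]


commands = preprocessing_commands(
    raw_fastqs=["sample1.fastq.gz"],           # hypothetical input
    trimmed_fastqs=["sample1_trimmed.fq.gz"],  # hypothetical output name
    out_dir="results",
)
for command in commands:
    print(" ".join(command))
```

Each inner list could be passed to `subprocess.run` once the tools are installed; building the commands separately from running them makes the step order easy to inspect and record.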
Figure 3
Data processing (pipeline step 2A): read mapping (A) Data processing pipeline. Trimmed reads are mapped to their reference genome and transcriptome with STAR. Gene counts are then quantified with RSEM; (B) flags used for generating the indexed STAR reference files; (C) flags used for mapping reads with STAR. Tool versions used to process each dataset are included in the RNA-seq processing protocol in the GLDS Repository.
Figure 4
Data processing (pipeline step 2B): gene quantification (A) Data processing pipeline. Mapping results from STAR are quantified by RSEM; (B) parameters for RSEM indexed reference files generation; (C) parameters for quantifying gene and isoform counts with RSEM. Tool versions used to process each dataset are included in the RNA-seq processing protocol in the GLDS repository.
Figure 5
Differential gene expression calculation (pipeline step 3) (A) Data processing pipeline. The R package DESeq2 is run to determine which genes are differentially expressed between experimental conditions, using gene count files from RSEM. (B) Output files generated. The table columns distinguish which script produces each output. The columns distinguish how those output files are used.
Figure 6
Global and differential gene expression in spaceflight versus ground control liver samples from GeneLab datasets (A and B) Principal component analysis of global gene expression in spaceflight (FLT) and respective ground control (GC) liver samples from the (A) Rodent Research 1 (RR-1) NASA Validation mission (GLDS-168) and (B) RR-6 ISS-terminal mission (GLDS-245). Plots were generated using data in the normalized counts tables for each respective dataset on the NASA GeneLab Data Repository. (C and D) Heatmaps showing the top 30 differentially expressed genes in spaceflight (FLT) versus ground control (GC) liver samples from the (C) Rodent Research 1 (RR-1) NASA Validation mission (GLDS-168) and (D) RR-6 ISS-terminal mission (GLDS-245). Heatmaps were generated using data in the differential expression tables for each respective dataset on the NASA GeneLab Data Repository and are colored by relative expression. Adj. p value < 0.05 and |log2FC| > 1. All samples included were derived from frozen carcasses post-mission and utilized the ribo-depletion library preparation method.
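The thresholds quoted in the Figure 6 caption (adj. p value < 0.05 and |log2FC| > 1) can be applied to a differential expression table as in the sketch below. The column names (`gene`, `adj_p`, `log2fc`), the gene names, and the numeric values are made up for illustration and are not taken from any GeneLab dataset. The caption does not say how the "top 30" genes were ranked; ranking by adjusted p value is an assumption here.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]

ADJ_P_CUTOFF = 0.05      # from the Figure 6 caption
ABS_LOG2FC_CUTOFF = 1.0  # from the Figure 6 caption


def select_top_genes(rows: List[Row], top_n: int = 30) -> List[Row]:
    """Keep genes passing both cutoffs, then return the top_n of them.

    Ranking is by smallest adjusted p value, with ties broken by larger
    |log2FC| (an assumed ordering, not one stated in the source).
    """
    significant = [
        row for row in rows
        if row["adj_p"] < ADJ_P_CUTOFF and abs(row["log2fc"]) > ABS_LOG2FC_CUTOFF
    ]
    significant.sort(key=lambda row: (row["adj_p"], -abs(row["log2fc"])))
    return significant[:top_n]


# Toy table with hypothetical values, for illustration only.
example_rows: List[Row] = [
    {"gene": "GeneA", "adj_p": 0.001, "log2fc": 2.4},
    {"gene": "GeneB", "adj_p": 0.20, "log2fc": 3.1},   # fails the p cutoff
    {"gene": "GeneC", "adj_p": 0.01, "log2fc": -1.6},  # down-regulated, passes
    {"gene": "GeneD", "adj_p": 0.03, "log2fc": 0.4},   # fails the fold-change cutoff
]

top_genes = select_top_genes(example_rows)
print([row["gene"] for row in top_genes])
```

Taking the absolute value of the log2 fold-change keeps both up-regulated and down-regulated genes, which matches the |log2FC| notation in the caption.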
