Nucleic Acids Res. 2023 Jul 5;51(W1):W372-W378.
doi: 10.1093/nar/gkad429.

NORMSEQ: a tool for evaluation, selection and visualization of RNA-Seq normalization methods

Chantal Scheepbouwer et al. Nucleic Acids Res. 2023.

Abstract

RNA-sequencing has become one of the most used high-throughput approaches to gain knowledge about the expression of all different RNA subpopulations. However, technical artifacts introduced during library preparation and/or data analysis can influence the detected RNA expression levels. A critical step, especially in large and low-input datasets or studies, is data normalization, which aims to eliminate variability in the data that is not related to biology. Many normalization methods have been developed, each relying on different assumptions, which makes selecting the appropriate normalization strategy key to preserving biological information. To address this, we developed NormSeq, a free web-server tool to systematically assess the performance of normalization methods in a given dataset. A key feature of NormSeq is the implementation of information gain to guide the selection of the best normalization method, which is crucial to eliminate or at least reduce non-biological variability. Altogether, NormSeq provides an easy-to-use platform to explore different aspects of gene expression data, with a special focus on data normalization, to help researchers, even those without bioinformatics expertise, obtain reliable biological inference from their data. NormSeq is freely available at: https://arn.ugr.es/normSeq.
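As a concrete illustration of what data normalization does, the following is a minimal sketch of counts per million (CPM), one of the methods named in the figure captions. It is not NormSeq's implementation; the function name `cpm`, the nested-dictionary input layout, and the toy counts are assumptions made for this example only.

```python
def cpm(counts):
    """Scale raw read counts to counts per million (CPM).

    counts: dict mapping sample name -> dict of gene name -> raw read count.
    (This input layout is an assumption for the sketch, not NormSeq's format.)
    """
    normalized = {}
    for sample, gene_counts in counts.items():
        # Library size = total number of reads assigned to genes in this sample.
        library_size = sum(gene_counts.values())
        if library_size == 0:
            raise ValueError(f"sample {sample!r} has no reads")
        normalized[sample] = {
            gene: count * 1_000_000 / library_size
            for gene, count in gene_counts.items()
        }
    return normalized


# Toy data (invented): sample_b was sequenced ten times deeper than sample_a,
# but the relative composition of the two samples is identical.
toy_counts = {
    "sample_a": {"gene1": 10, "gene2": 30, "gene3": 60},
    "sample_b": {"gene1": 100, "gene2": 300, "gene3": 600},
}

result = cpm(toy_counts)
print(result["sample_a"])
print(result["sample_b"])
```

After scaling, the two toy samples have identical values, because the only difference between them was sequencing depth. CPM corrects only for library size; the other methods listed in the captions rest on different assumptions, which is why a per-dataset comparison can be useful.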

Figures

Figure 1.
NormSeq's workflow and implementation. (A) Workflow of NormSeq. User-provided RNA-seq counts are used for data normalization. NormSeq provides eight different options for data normalization, four differential expression analysis protocols, and optional batch effect correction. Assessment based on the information gain distribution guides selection of the best normalization method that helps obtain the most reliable biological inference from the data. (B) Information gain distribution of seven out of eight of the normalization methods available in NormSeq applied to the miRNA sequencing dataset SRP326090 (32). The comparison of healthy individuals and cancer patients with active Hodgkin Lymphoma is shown, where 4 methods (CPM, TMM, QN and RLE) outperformed the others in terms of information gain. (C) Hierarchical clustering analysis of the miRNA seq data in healthy individuals and cancer patients with active disease. Data is normalized by upper quartile (left) and quantile (right), showing that quantile normalization clusters better represent the two biological conditions. (D) Upset plot showing the intersection of differentially expressed miRNAs detected with edgeR, DESeq2, NOISeq and a Student's t-test. (E) Boxplot visualization of the top 10 highest fold change miRNAs between healthy individuals and cancer patients.
Figure 2.
NormSeq can guide users in selecting the most appropriate normalization method for every dataset. (A) (top panel) Notched boxplot of information gain results for NormSeq's normalization methods (no normalization (NN), counts per million (CPM), upper quartile (UQ), median (Med), trimmed mean of M values (TMM), quantile (QN) and relative log expression (RLE)) applied to count tables from QuantM-tRNA seq data in HEK293T cells. (bottom panel) Pearson correlation of CPM- and Med-normalized read counts from QuantM-tRNA seq data versus tRNA array quantification. (B) (top panel) Notched boxplot of information gain results for NormSeq's normalization methods (NN, CPM, UQ, Med, TMM, QN and RLE) applied to count tables from Hydro-tRNAseq data in HEK293T cells. (bottom panel) Pearson correlation of CPM- and Med-normalized Hydro-tRNAseq data versus tRNA array quantification. (C) RNA expression distribution for CPM (top panel) and Med (bottom panel) normalization. Data are represented as log10 values on the x-axis. (D) Notched boxplot of information gain results for NormSeq's normalization methods (NN, CPM, UQ, Med, TMM, QN, RUVs and RLE) applied to count tables from QuantM-tRNA seq data in CNS tissues. (E) Bar graph showing the information gain for brain-enriched tRNA genes tRNA-Ile-TAT-2–1;2–2;2–3 (left panel) and tRNA-Ala-AGC-3–1 (right panel). (F) Box plot showing the comparison of CPM, TMM and QN normalization for tRNA-Ile-TAT-2–1;2–2;2–3 expression in CNS, tibia, heart and liver tissues from the QuantM-tRNA seq dataset. (G) Box plot showing the comparison of CPM, TMM and QN normalization for tRNA-Ala-AGC-3–1 expression in CNS, tibia, heart and liver tissues from the QuantM-tRNA seq dataset.
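The bottom panels of (A) and (B) compare normalized read counts with an independent tRNA array quantification using the Pearson correlation. A minimal sketch of that statistic is given below; the function name `pearson` and the numbers are invented for illustration and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: normalized sequencing counts and array signal
# for the same four tRNA genes.
normalized_counts = [1.0, 2.0, 3.0, 4.0]
array_signal = [2.0, 4.0, 6.0, 8.0]

r = pearson(normalized_counts, array_signal)
print(r)
```

A coefficient near 1 indicates that the normalized counts rank and scale the genes much as the orthogonal measurement does; a lower value for one normalization method than another on the same data suggests that method distorts the relative abundances more.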

References

    1. Eijndhoven M.A.J.v., Aparicio-Puerta E., Gómez-Martín C., Medina J.M., Drees E.E.E., Bradley E.J., Bosch L., Scheepbouwer C., Hackenberg M., Pegtel D.M. Unbiased and UMI-informed sequencing of cell-free miRNAs at single-nucleotide resolution. bioRxiv. 2021; doi:10.1101/2021.05.04.442244 (04 May 2021, preprint: not peer reviewed).
    1. Kim H., Kim J., Kim K., Chang H., You K., Kim V.N. Bias-minimized quantification of microRNA reveals widespread alternative processing and 3′ end modification. Nucleic Acids Res. 2019; 47:2630–2640.
    1. Scheepbouwer C., Aparicio-Puerta E., Gomez-Martin C., Verschueren H., van Eijndhoven M., Wedekind L.E., Giannoukakos S., Hijmering N., Gasparotto L., van der Galien H.T. et al. ALL-tRNAseq enables robust tRNA profiling in tissue samples. Genes Dev. 2023; 37:243–257.
    1. Stark R., Grzelak M., Hadfield J. RNA sequencing: the teenage years. Nat. Rev. Genet. 2019; 20:631–656.
    1. Risso D., Ngai J., Speed T.P., Dudoit S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 2014; 32:896–902.