BMC Bioinformatics. 2021 Oct 20;22(1):511.
doi: 10.1186/s12859-021-04438-4.

Differential expression analysis using a model-based gene clustering algorithm for RNA-seq data

Takayuki Osabe et al. BMC Bioinformatics. 2021.

Abstract

Background: RNA-seq is a tool for measuring gene expression and is commonly used to identify differentially expressed genes (DEGs). Gene clustering is used to classify DEGs with similar expression patterns for the subsequent analyses of data from experiments such as time-courses or multi-group comparisons. However, gene clustering has rarely been used for analyzing simple two-group data or differential expression (DE). In this study, we report that a model-based clustering algorithm implemented in an R package, MBCluster.Seq, can also be used for DE analysis.

Results: MBCluster.Seq was originally intended to take DEGs as input, whereas the proposed method (called MBCdeg) uses all genes for the analysis. The method ranks genes overall by their posterior probability of being assigned to the cluster that displays a non-DEG pattern. Using simulated and real data, we compared the performance of MBCdeg with that of conventional R packages specialized for DE analysis, such as edgeR, DESeq2, and TCC. Our results showed that MBCdeg outperformed the other methods when the proportion of DEGs (PDEG) was less than 50%. However, DEG identification using MBCdeg was less consistent than that using conventional methods. We compared the effects of different normalization algorithms on MBCdeg, and performed an analysis using MBCdeg in combination with a robust normalization algorithm (called DEGES) that was not implemented in MBCluster.Seq. The new analysis method showed greater stability than the original MBCdeg with the default normalization algorithm.

Conclusions: MBCdeg with DEGES normalization can be used in the identification of DEGs when the PDEG is relatively low. As the method is based on gene clustering, the DE result includes information on which expression pattern the gene belongs to. The new method may be useful for the analysis of time-course and multi-group data, where the classification of expression patterns is often required.
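The ranking step described in the Results can be illustrated with a minimal Python sketch. It is not the MBCluster.Seq or MBCdeg code: the function names, the toy posterior values, and the assumption that the index of the non-DEG cluster is already known are all hypothetical and chosen only for illustration.

```python
def rank_genes_by_non_deg_posterior(posteriors, non_deg_cluster):
    """Return gene indices ordered from most to least DEG-like.

    posteriors[g][k] is the posterior probability that gene g belongs to
    cluster k. non_deg_cluster is the index of the cluster showing the
    non-DEG pattern (assumed to be known in this sketch).
    """
    non_deg_prob = [row[non_deg_cluster] for row in posteriors]
    # A low probability of belonging to the non-DEG cluster is read as
    # stronger evidence of differential expression, so sort ascending.
    return sorted(range(len(posteriors)), key=lambda g: non_deg_prob[g])


def most_likely_cluster(posteriors):
    """Return, for each gene, the index of its highest-posterior cluster."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in posteriors]


# Toy posteriors for four genes and K = 3 clusters (made-up numbers).
toy_posteriors = [
    [0.90, 0.05, 0.05],  # gene 0: almost certainly non-DEG
    [0.10, 0.85, 0.05],  # gene 1: likely in the first DEG cluster
    [0.40, 0.30, 0.30],  # gene 2: ambiguous
    [0.02, 0.03, 0.95],  # gene 3: likely in the second DEG cluster
]

ranking = rank_genes_by_non_deg_posterior(toy_posteriors, non_deg_cluster=0)
patterns = most_likely_cluster(toy_posteriors)
print(ranking)   # gene indices, most DEG-like first
print(patterns)  # cluster (expression pattern) assigned to each gene
```

The second function reflects the point made in the Conclusions: because the result comes from clustering, each gene carries a pattern label in addition to its rank.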

Keywords: Differential expression; Gene clustering; Posterior probability; RNA-seq.

Conflict of interest statement

The authors declare that there are no competing interests.

Figures

Fig. 1
Results for two-group simulated data (PDEG ≤ 0.25). Boxplots of AUC values (100 trials) for five methods for a total of eight conditions, P1 = 0.5 (left) to 1.0 (right) with PDEG = 0.05 (upper) and 0.25 (lower). The performance of MBCdeg (with K = 3) was high in most trials. The explanation for a representative trial (AUC = 0.9295) and the worst trial (AUC = 0.633) using MBCdeg2 with PDEG = 0.25 and P1 = 0.5 is given in Table 1
Fig. 2
Effect of different cluster numbers for MBCdeg (K = 2–4). Boxplots of AUC values (100 trials) for MBCdeg1 (K = 2–4; colored in black) and MBCdeg2 (K = 2–4; colored in red) are shown. The remaining legends are the same as in Fig. 1. The AUC values for MBCdeg with K = 3 were similar to those in Fig. 1 (different trials were used). Note that two trials resulted in AUC < 0.4 (not shown): MBCdeg1 with K = 2, PDEG = 0.05, and P1 = 0.9 (AUC = 0.3498), and MBCdeg1 with K = 4, PDEG = 0.25, and P1 = 1.0 (AUC = 0.3893)
Fig. 3
Effect of larger PDEG values (PDEG ≥ 0.45). Boxplots of AUC values (50 trials) for five methods for a total of 16 conditions, P1 = 0.5 (left) to 1.0 (right) with PDEG = 0.45 (top) to 0.75 (bottom). MBCdeg1 and MBCdeg2 were used for analysis with K = 3. The performance of MBCdeg2 partly depended on DEGES normalization (see Tables 2 and 3)
Fig. 4
Results for three-group simulated data. Boxplots of AUC values (50 trials) for eleven methods (DESeq2, edgeR, TCC, and MBCdeg1 and MBCdeg2 each with K = 2–5) are shown. The simulation conditions used were PDEG = 0.25, FC = 4, and n1 = n2 = n3 = 3
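
The AUC values reported in the figure legends above measure how well a gene ranking separates true DEGs from non-DEGs in simulated data. The sketch below uses the standard pairwise definition of AUC in plain Python. It is not the evaluation code used in the study: the function name and toy values are hypothetical, and the score is assumed to be one where lower means more DEG-like (such as a posterior probability of belonging to the non-DEG cluster).

```python
def auc_from_scores(scores, is_deg):
    """Area under the ROC curve for a DEG ranking.

    scores[g] is a ranking statistic where LOWER means more DEG-like.
    is_deg[g] is True for genes simulated as DEGs. The AUC equals the
    fraction of (DEG, non-DEG) pairs in which the DEG has the lower
    score, with ties counted as one half.
    """
    deg_scores = [s for s, d in zip(scores, is_deg) if d]
    non_deg_scores = [s for s, d in zip(scores, is_deg) if not d]
    if not deg_scores or not non_deg_scores:
        raise ValueError("need at least one DEG and one non-DEG")
    wins = 0.0
    for p in deg_scores:
        for n in non_deg_scores:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(deg_scores) * len(non_deg_scores))


# Toy example with four genes (made-up scores and labels).
toy_scores = [0.90, 0.10, 0.40, 0.02]
perfect = auc_from_scores(toy_scores, [False, True, False, True])
chance = auc_from_scores(toy_scores, [False, True, True, False])
print(perfect, chance)
```

An AUC of 1.0 means every true DEG is ranked ahead of every non-DEG, and 0.5 corresponds to a ranking no better than chance. The pairwise loop is quadratic in the number of genes, so a rank-based formulation would be preferable for full-size data.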
