Brief Bioinform. 2014 Jul;15(4):534-41.
doi: 10.1093/bib/bbt029.

A bi-Poisson model for clustering gene expression profiles by RNA-seq

Ningtao Wang et al. Brief Bioinform. 2014 Jul.

Abstract

With the availability of gene expression data from RNA-seq, powerful statistical approaches for grouping similar gene expression profiles across different environments have become increasingly important. We describe and assess a computational model for clustering genes into distinct groups based on their pattern of expression in response to a changing environment. The model capitalizes on the Poisson distribution to capture the count property of RNA-seq data. A two-stage hierarchical expectation–maximization (EM) algorithm is implemented to estimate the optimal number of groups and the mean expression level of each group across two environments. A procedure is formulated to test whether and how a given group shows a plastic response to environmental changes. The impact of gene–environment interactions on the phenotypic plasticity of the organism can also be visualized and characterized. The model was used to analyse an RNA-seq dataset measured from two breast cancer cell lines that respond differently to an anti-cancer drug, from which genes associated with the resistance and sensitivity of the cell lines were identified. We performed simulation studies to validate the statistical behaviour of the model. The model provides a useful tool for clustering RNA-seq gene expression data, facilitating our understanding of gene functions and networks.
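The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' two-stage hierarchical EM algorithm, which the abstract does not specify in detail. It is a plain EM fit of a finite mixture in which each gene contributes a pair of counts (one per environment) and, as an assumption made here, the two counts are independent Poisson variables given the group. The function names (`fit_bipoisson_mixture`, `bic`), the initialisation heuristic, and the BIC convention (lower is better, with 3K - 1 free parameters for K groups) are all hypothetical choices for illustration.

```python
import numpy as np
from scipy.special import gammaln, logsumexp


def poisson_logpmf(y, lam):
    """Elementwise log P(Y = y | lam) for a Poisson variable."""
    return y * np.log(lam) - lam - gammaln(y + 1.0)


def _e_step(counts, weights, means):
    """Posterior group probabilities per gene and the total log-likelihood."""
    # (n_genes, n_groups): log weight + log pmf summed over the two environments.
    log_joint = np.log(weights) + poisson_logpmf(
        counts[:, None, :], means[None, :, :]
    ).sum(axis=2)
    log_norm = logsumexp(log_joint, axis=1, keepdims=True)
    return np.exp(log_joint - log_norm), float(log_norm.sum())


def fit_bipoisson_mixture(counts, n_groups, n_iter=500, tol=1e-8):
    """Plain EM for a mixture of independent Poisson pairs (illustrative only).

    counts: (n_genes, 2) array of read counts, one column per environment.
    Assumes n_groups <= n_genes.
    Returns (weights, means, responsibilities, log_likelihood).
    """
    counts = np.asarray(counts, dtype=float)
    # Heuristic initialisation (an assumption of this sketch): order genes by
    # their between-environment difference and average contiguous chunks.
    order = np.argsort(counts[:, 1] - counts[:, 0])
    chunks = np.array_split(order, n_groups)
    means = np.array([counts[idx].mean(axis=0) for idx in chunks]) + 0.5
    weights = np.full(n_groups, 1.0 / n_groups)

    resp, ll = _e_step(counts, weights, means)
    for _ in range(n_iter):
        # M-step: weighted proportions and weighted mean counts per group.
        nk = resp.sum(axis=0) + 1e-12  # guard against empty groups
        weights = nk / nk.sum()
        means = (resp.T @ counts) / nk[:, None] + 1e-12
        # E-step with the updated parameters.
        resp, new_ll = _e_step(counts, weights, means)
        converged = new_ll - ll < tol
        ll = new_ll
        if converged:
            break
    return weights, means, resp, ll


def bic(log_likelihood, n_groups, n_genes):
    """BIC with (n_groups - 1) mixing weights and 2 * n_groups Poisson means."""
    n_params = 3 * n_groups - 1
    return -2.0 * log_likelihood + n_params * np.log(n_genes)
```

Under these assumptions, the number of groups could be chosen by fitting the mixture for a range of `n_groups` values and keeping the one with the lowest `bic` value. The per-group difference `means[:, 1] - means[:, 0]` gives a rough view of how each group responds to the change of environment.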

Figures

Figure 1:
Plot of BIC values against the number of clusters, calculated from a transcriptomic study involving two breast cancer cell lines that are sensitive and resistant to tamoxifen, respectively.
Figure 2:
Mean values of gene expression for 10 distinct groups in sensitive and resistant cell lines of breast cancer. (A) Absolute values of gene expression in the two cell lines. (B) Differences of gene expression from sensitive to resistant cell lines.
Figure 3:
Comparison of estimated gene expression (solid) with true values (broken) for 10 distinct groups from data simulated to mimic the transcriptomic study of breast cancer. (A) Absolute values of gene expression in the two cell lines. (B) Differences of gene expression from sensitive to resistant cell lines.
