BMC Bioinformatics. 2013 Feb 5:14:39.

doi: 10.1186/1471-2105-14-39.

puma 3.0: improved uncertainty propagation methods for gene and transcript expression analysis

Xuejun Liu¹, Zhenzhu Gao, Li Zhang, Magnus Rattray

PMID: 23379655
PMCID: PMC3626802
DOI: 10.1186/1471-2105-14-39

Xuejun Liu et al. BMC Bioinformatics. 2013.

Affiliation

¹ College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, 29 Yudao St., Nanjing 210016, China. xuejun.liu@nuaa.edu.cn

Abstract

Background: Microarrays have been a popular tool for genome-scale gene expression profiling for over a decade because of their low cost, short turn-around time, excellent quantitative accuracy and ease of data generation. The Bioconductor package puma incorporates a suite of analysis methods for determining uncertainties from Affymetrix GeneChip data and propagating these uncertainties to downstream analysis. As isoform-level expression profiling has received increasing interest in genomics in recent years, exon microarray technology offers an important tool for quantifying the expression levels of the majority of exons and makes it possible to measure isoform-level expression. However, puma does not include methods for the analysis of exon array data. Moreover, the current expression summarisation method for Affymetrix 3' GeneChip data is unstable for genes with low expression. For the downstream analysis, the method for differential expression detection is computationally intensive, and the original expression clustering method does not consider the variance across replicated technical and biological measurements. It is therefore necessary to develop improved uncertainty propagation methods for gene and transcript expression analysis.

Results: We extend the previously developed Bioconductor package puma with a new method especially designed for GeneChip Exon arrays and a set of improved downstream approaches. The improvements include: (i) a new gamma model for exon arrays which calculates isoform and gene expression measurements and a level of uncertainty associated with the estimates, using the multi-mappings between probes, isoforms and genes, (ii) a variant of the existing approach for the probe-level analysis of Affymetrix 3' GeneChip data to produce more stable gene expression estimates, (iii) an improved method for detecting differential expression which is computationally more efficient than the existing approach in the package and (iv) an improved method for robust model-based clustering of gene expression, which takes technical and biological replicate information into consideration.
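To make the idea of propagating measurement uncertainty into differential expression detection more concrete, here is a minimal sketch. It is not puma's implementation, and puma's PPLR and IPPLR methods are more elaborate than this shortcut. The sketch assumes that each replicate supplies a log-expression estimate together with a standard deviation, combines replicates within a condition by precision weighting, and then approximates the probability that the log-ratio between two conditions is positive with a Gaussian. The function names `combine_replicates` and `prob_positive_log_ratio` and all numbers are hypothetical.

```python
import math


def combine_replicates(means, sds):
    """Precision-weighted mean and its variance for one condition.

    Assumption for this sketch: replicates are independent Gaussian
    measurements of the same underlying log-expression level.
    """
    weights = [1.0 / (s * s) for s in sds]
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, means)) / total
    return mean, 1.0 / total


def prob_positive_log_ratio(mean_a, var_a, mean_b, var_b):
    """P(a - b > 0) when a and b are independent Gaussians."""
    z = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical log2 expression estimates and standard deviations.
treated = combine_replicates([8.1, 7.9], [0.2, 0.4])
control = combine_replicates([7.0, 7.2], [0.3, 0.3])
p = prob_positive_log_ratio(*treated, *control)
```

In this sketch a noisy replicate (large standard deviation) contributes less to the condition mean, and a gene measured with high uncertainty gets a probability closer to 0.5 than a confidently measured gene with the same difference in means.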

Conclusions: With these extensions and improvements, the puma package is now applicable to the analysis of both Affymetrix 3' GeneChips and Exon arrays for gene and isoform expression estimation. It propagates the uncertainty of expression measurements into more efficient and comprehensive downstream analysis at both the gene and isoform levels. The downstream methods are also applicable to other expression quantification platforms, such as RNA-Seq, when uncertainty information is available from the expression measurements. puma is available through Bioconductor and can be found at http://www.bioconductor.org.

Figures

**Figure 2**
**ROC curves from different methods for 2-replicate Exon array data.** The ROC curves are averaged over 5 runs, each of which randomly selects two replicates. The gene expression estimation methods RMA, PLIER and GME are combined with different methods for finding DE genes: t-test, PPLR and IPPLR. PLIER provides only a point estimate of gene expression and is therefore not applicable to PPLR and IPPLR. The number after PPLR indicates the number of samples used in the importance sampling step of the algorithm.

**Figure 3**
**ROC curves from different methods for 5-replicate Exon array data.** Gene expression estimation methods are combined with different methods for finding DE genes. PLIER provides only a point estimate of gene expression and is therefore not applicable to PPLR and IPPLR. The number after PPLR indicates the number of samples used in the importance sampling step of the algorithm.

**Figure 4**
**Distribution of isoform expression for gene ORAOV1.** The distributions of the estimated isoform expression for the two alternatively spliced transcripts of gene ORAOV1 in the 15 cell lines are calculated with GME. Blue lines show the 11q13+ group and red lines the 11q13- group. The bold lines are the distributions of the mean expression for each group, obtained from PPLR. Expression is on the log scale.

**Figure 5**
**Distribution of isoform expression for gene NEO1.** The distributions of the estimated isoform expression for the two alternatively spliced transcripts of gene NEO1 in the 15 cell lines are calculated with GME. Blue lines show the 11q13+ group and red lines the 11q13- group. The bold lines are the distributions of the mean expression for each group, obtained from PPLR. Expression is on the log scale.

**Figure 6**
**The partition of qRT-PCR validated probe-sets in the U133 GeneChip dataset.** Gene expression estimates are calculated with multi-mgMOS. The scatter plot shows expression in the HBRR sample against expression in the UHRR sample. Lines l₁: y = −x + 8 and l₂: y = −x + 14 partition the 656 qRT-PCR validated probe-sets into 3 groups, labelled “low”, “median” and “high”.

**Figure 7**
**ROC curves from different methods for U133 GeneChip data.** ROC curves are calculated for the gene expression estimation methods RMA, multi-mgMOS and PM-only multi-mgMOS, each combined with PPLR, for the “low”, “median”, “high” and “all” groups of the U133 GeneChip data.

**Figure 8**
**Distribution of expression difference between two conditions for U133 GeneChip data.** Probe-set 220818_s_at is a DE gene with low expression, and probe-set 203073_at is a relatively highly expressed non-DE gene. Blue lines show the distributions of the expression difference between the two conditions calculated with multi-mgMOS, and red lines show those calculated with PM-only multi-mgMOS.
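The partition described in the Figure 6 caption can be sketched as follows. Because both lines have slope −1, a point (x, y) lies below l₁ exactly when x + y < 8 and below l₂ exactly when x + y < 14, so the group depends only on the sum of the two expression values. The function name `expression_group` is hypothetical, and assigning points that fall exactly on a line to the upper group is an assumption of this sketch, since the caption does not say how such points are handled.

```python
def expression_group(x, y, low_cut=8.0, high_cut=14.0):
    """Classify a probe-set by its position relative to l1 and l2.

    x and y are the log-scale expression estimates in the two samples.
    l1 is y = -x + low_cut and l2 is y = -x + high_cut, so the test
    reduces to comparing x + y with the two intercepts.
    Assumption: points exactly on a line go to the upper group.
    """
    total = x + y
    if total < low_cut:
        return "low"
    if total < high_cut:
        return "median"
    return "high"


# Hypothetical expression pairs, one per group.
examples = [(3.0, 4.0), (5.0, 5.0), (8.0, 7.0)]
groups = [expression_group(x, y) for x, y in examples]
```

Since only the sum x + y matters, the result does not depend on which sample is plotted on which axis.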

Cited by

Comparative evaluation of isoform-level gene expression estimation algorithms for RNA-seq and exon-array platforms.
Dapas M, Kandpal M, Bi Y, Davuluri RV. Brief Bioinform. 2017 Mar 1;18(2):260-269. doi: 10.1093/bib/bbw016. PMID: 26944083. Free PMC article.
Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate.
Liu X, Shi X, Chen C, Zhang L. BMC Bioinformatics. 2015 Oct 16;16:332. doi: 10.1186/s12859-015-0750-6. PMID: 26475308. Free PMC article.
Analysis of key genes and their functions in placental tissue of patients with gestational diabetes mellitus.
Wang Y, Yu H, Liu F, Song X. Reprod Biol Endocrinol. 2019 Nov 29;17(1):104. doi: 10.1186/s12958-019-0546-z. PMID: 31783860. Free PMC article.
A data-driven approach links microglia to pathology and prognosis in amyotrophic lateral sclerosis.
Cooper-Knock J, Green C, Altschuler G, Wei W, Bury JJ, Heath PR, Wyles M, Gelsthorpe C, Highley JR, Lorente-Pons A, Beck T, Doyle K, Otero K, Traynor B, Kirby J, Shaw PJ, Hide W. Acta Neuropathol Commun. 2017 Mar 16;5(1):23. doi: 10.1186/s40478-017-0424-x. PMID: 28302159. Free PMC article.
Pulsatile exposure to simulated reflux leads to changes in gene expression in a 3D model of oesophageal mucosa.
Green NH, Nicholls Z, Heath PR, Cooper-Knock J, Corfe BM, MacNeil S, Bury JP. Int J Exp Pathol. 2014 Jun;95(3):216-28. doi: 10.1111/iep.12083. Epub 2014 Apr 8. PMID: 24713057. Free PMC article.

Grants and funding

BB/H018123/2/Biotechnology and Biological Sciences Research Council/United Kingdom
