J Am Stat Assoc. 2014 Dec 1;109(508):1466-1480.
doi: 10.1080/01621459.2014.950735.

Generalized species sampling priors with latent Beta reinforcements

Edoardo M Airoldi et al. J Am Stat Assoc.

Abstract

Many popular Bayesian nonparametric priors can be characterized in terms of exchangeable species sampling sequences. However, in some applications, exchangeability may not be appropriate. We introduce a novel and probabilistically coherent family of non-exchangeable species sampling sequences characterized by a tractable predictive probability function with weights driven by a sequence of independent Beta random variables. We compare their theoretical clustering properties with those of the Dirichlet Process and the two-parameter Poisson-Dirichlet process. Unlike existing work, the proposed construction provides a complete characterization of the joint process. We then propose the use of such a process as a prior distribution in a hierarchical Bayes modeling framework, and we describe a Markov Chain Monte Carlo sampler for posterior inference. We evaluate the performance of the prior and the robustness of the resulting inference in a simulation study, providing a comparison with popular Dirichlet Process mixtures and Hidden Markov Models. Finally, we develop an application to the detection of chromosomal aberrations in breast cancer by leveraging array CGH data.

Keywords: Bayesian non-parametrics; Cancer; Genomics; MCMC; Predictive Probability Functions; Random Partitions; Species Sampling Priors.
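
The abstract does not spell out the predictive probability function. As a rough illustration only, the sketch below assumes a generalized Ottawa sequence form: observation n+1 copies observation j with weight p_{n,j} = (1 - W_j) ∏_{i=j+1}^{n} W_i, or is a fresh draw from the base measure with weight r_n = ∏_{i=1}^{n} W_i, where W_i ~ Beta(α_i, β_i) independently. That form and the function names `predictive_weights` and `simulate_partition` are assumptions made for illustration, not details taken from the abstract.

```python
import random


def predictive_weights(w):
    """Return (p, r) for latent reinforcements w = [W_1, ..., W_n].

    Assumed form (not stated in the abstract):
      p[j] = (1 - W_j) * prod_{i > j} W_i   weight on copying observation j
      r    = prod_i W_i                     weight on a fresh base-measure draw
    The weights telescope, so sum(p) + r == 1.
    """
    p = [0.0] * len(w)
    tail = 1.0  # running product of W_i for i > j
    for j in range(len(w) - 1, -1, -1):
        p[j] = (1.0 - w[j]) * tail
        tail *= w[j]
    return p, tail


def simulate_partition(n, alpha, beta, seed=None):
    """Simulate cluster labels for n >= 1 observations.

    alpha(i) and beta(i) return the Beta parameters of W_i (i = 1, 2, ...).
    Only labels are tracked: with a continuous base measure, ties arise
    solely from copying an earlier observation.
    """
    rng = random.Random(seed)
    labels = [0]  # the first observation always opens cluster 0
    w = []
    for m in range(1, n):
        w.append(rng.betavariate(alpha(m), beta(m)))  # W_m ~ Beta(alpha_m, beta_m)
        p, r = predictive_weights(w)
        # Index m (the last slot) stands for "new draw from the base measure".
        pick = rng.choices(range(m + 1), weights=p + [r])[0]
        labels.append(max(labels) + 1 if pick == m else labels[pick])
    return labels


if __name__ == "__main__":
    # The two Beta-GOS settings named in the Figure 1 caption.
    for name, a in [("alpha_n = n, beta_n = 1", lambda i: float(i)),
                    ("alpha_n = beta_n = 1", lambda i: 1.0)]:
        labels = simulate_partition(200, a, lambda i: 1.0, seed=0)
        print(name, "->", len(set(labels)), "clusters")
```

Because each W_i enters only the weights of later observations, the resulting sequence is in general not exchangeable, which is the property the abstract emphasizes. Running the script compares the number of clusters under the two parameter settings for one simulated sequence each.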

Figures

Figure 1
Posterior distribution of the number of clusters in the simulation of Section 5.3 (τ = 0.25). Case (a) corresponds to a Beta-GOS(α_n = n, β_n = 1), case (b) to a Beta-GOS(α_n = β_n = 1), and case (c) to a Dirichlet Process with parameter θ = 1.
Figure 2
Illustrative segmentation-type plots for the simulation study in Section 5.4. Column (a): subset of data for two replicates. Column (b), top: an example allocation for a Beta-GOS(α_n = 1, β_n = 1) plotted against the truth (black line); column (b), bottom: the same for a Beta-GOS(α_n = n, β_n = 1). Column (c): the fit of an HMM with 4 states.
Figure 3
Model fit overview: Array CGH gains and losses on chromosome 8 for two breast tumor samples from the dataset of Chin et al. (2006). Points with different shapes denote different clusters.
Figure 4
A) Frequencies of genome copy number gains and losses plotted as a function of genomic location. B) Frequency of tumors showing high-level amplification. The dashed vertical lines separate the 23 chromosomes.

References

    1. Airoldi EM, Anderson A, Fienberg SE, Skinner KK. Who wrote Ronald Reagan’s radio addresses? Bayesian Analysis. 2006;1:289–320.
    2. Aoki M. Thermodynamic limits of macroeconomic or financial models: One- and two-parameter Poisson-Dirichlet models. Journal of Economic Dynamics and Control. 2008;32(1):66–84.
    3. Baladandayuthapani V, Ji Y, Nieto-Barajas LE, Morris JS. Bayesian random segmentation models to identify shared copy number aberrations for array CGH data. Journal of the American Statistical Association. 2010;105:1358–1375.
    4. Bassetti F, Crimaldi I, Leisen F. Conditionally identically distributed species sampling sequences. Adv. in Appl. Probab. 2010;42:433–459.
    5. Berti P, Pratelli L, Rigo P. Limit Theorems for a Class of Identically Distributed Random Variables. Ann. Probab. 2004;32:2029–2052.