An algorithm for separation of mixed sparse and Gaussian sources

Ameya Akkalkotkar et al. PLoS One. 2017 Apr 17;12(4):e0175775. doi: 10.1371/journal.pone.0175775
Abstract

Independent component analysis (ICA) is a ubiquitous method for decomposing complex signal mixtures into a small set of statistically independent source signals. However, in cases in which the signal mixture consists of both nongaussian and Gaussian sources, the Gaussian sources will not be recoverable by ICA and will pollute estimates of the nongaussian sources. Therefore, it is desirable to have methods for mixed ICA/PCA which can separate mixtures of Gaussian and nongaussian sources. For mixtures of purely Gaussian sources, principal component analysis (PCA) can provide a basis for the Gaussian subspace. We introduce a new method for mixed ICA/PCA which we call Mixed ICA/PCA via Reproducibility Stability (MIPReSt). Our method uses a repeated estimations technique to rank sources by reproducibility, combined with decomposition of multiple subsamplings of the original data matrix. These multiple decompositions allow us to assess component stability as the size of the data matrix changes, which can be used to determine the dimension of the nongaussian subspace in a mixture. We demonstrate the utility of MIPReSt for signal mixtures consisting of simulated sources and real-world (speech) sources, as well as a mixture of unknown composition.
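The core idea of ranking sources by reproducibility can be illustrated with a minimal numpy sketch (not the paper's implementation): each component from one decomposition run is matched to its best counterpart in another run by absolute correlation, and a stable nongaussian source scores near 1 while a Gaussian direction does not. The `reproducibility` helper and the toy sources below are illustrative assumptions.

```python
import numpy as np

def reproducibility(components_a, components_b):
    """For each component (row) of components_a, find the best-matching
    component in components_b by absolute Pearson correlation.
    Returns one best-match |correlation| per row of components_a."""
    # z-score each component so a scaled dot product equals the correlation
    za = components_a - components_a.mean(axis=1, keepdims=True)
    za /= za.std(axis=1, keepdims=True)
    zb = components_b - components_b.mean(axis=1, keepdims=True)
    zb /= zb.std(axis=1, keepdims=True)
    corr = np.abs(za @ zb.T) / components_a.shape[1]
    return corr.max(axis=1)

rng = np.random.default_rng(0)
n = 10_000
sparse = rng.laplace(size=n)      # a reproducible nongaussian source
gauss_a = rng.normal(size=n)      # Gaussian "components" differ run to run
gauss_b = rng.normal(size=n)

# Two decomposition "runs": the sparse source recurs (up to small noise),
# the Gaussian component does not
run1 = np.vstack([sparse + 0.01 * rng.normal(size=n), gauss_a])
run2 = np.vstack([sparse + 0.01 * rng.normal(size=n), gauss_b])

r = reproducibility(run1, run2)
print(r)  # first entry near 1 (stable sparse source), second far lower
```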


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Schematic for MIPReSt.
MIPReSt runs the RAICAR algorithm on both the original data matrix X and many random subsamples of smaller column dimension. Comparison of the reproducibilities from the original data and the random subsamples determines the size of the sparse subspace. After projecting that subspace out of X, singular value decomposition of the residual matrix X̃, along with an eigenvalue selection rule, produces both the dimension of the Gaussian subspace and a basis for that subspace. (See Methods for details.)
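The projection-and-SVD step can be sketched as follows, assuming the sparse source estimate is already in hand (here we substitute the true source for illustration). The mixing setup and variable names are hypothetical, not from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_mix = 5000, 4

# Hypothetical mixture: one sparse (Laplace) source plus two Gaussian
# sources, mixed into a 4-channel observation matrix X (channels x samples)
S = np.vstack([rng.laplace(size=n_samples),
               rng.normal(size=(2, n_samples))])
A = rng.normal(size=(n_mix, 3))
X = A @ S

# Project the estimated sparse source out of X (rank-1 orthogonal
# projection of each channel onto s_hat, then subtraction)
s_hat = S[0]
proj = np.outer(X @ s_hat, s_hat) / (s_hat @ s_hat)
X_resid = X - proj

# SVD of the residual: the number of non-negligible singular values gives
# the dimension of the Gaussian subspace, and the corresponding left
# singular vectors give a basis for it
sing = np.linalg.svd(X_resid, compute_uv=False)
print(np.round(sing, 2))  # two large values, the rest near zero
```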
Fig 2. Examples of super- and subgaussian sources.
Shown here are histograms for a Gaussian source (black), a subgaussian source (the generalized Gaussian), and a supergaussian source (Laplace). Also shown is a histogram for one of the speech signals used in this study. The speech signal is far more leptokurtic than the Laplace source; without truncating the y-axis the massive spike near zero of the speech signal obscures the shapes of the other distributions.
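The super/subgaussian distinction in this figure is commonly quantified by excess kurtosis: positive for supergaussian (leptokurtic) sources, negative for subgaussian (platykurtic) ones, and zero for a Gaussian. A small numpy check, using a uniform source as a stand-in subgaussian example (the figure itself uses a generalized Gaussian):

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

rng = np.random.default_rng(2)
n = 200_000
k_gauss = excess_kurtosis(rng.normal(size=n))    # ~ 0
k_laplace = excess_kurtosis(rng.laplace(size=n)) # ~ +3  (supergaussian)
k_uniform = excess_kurtosis(rng.uniform(size=n)) # ~ -1.2 (subgaussian)
print(k_gauss, k_laplace, k_uniform)
```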
Fig 3. Full rank extraction.
We constructed a simulated data matrix with five sources: one supergaussian, one subgaussian, and three Gaussian sources. The simulated data matrix had 5 × 10⁵ samples. The main panel shows the results of RAICAR extractions at different levels of decimation, including the parent data. The best assignment match to the supergaussian source is shown in blue and to the subgaussian source in red. While the Gaussian sources may sometimes have extremely high reproducibility, they show poor stability when the data is decimated, in contrast to the sparse sources. The top panel shows scatter plots of the estimated sources from the parent data against their best assignment match; the sparse sources are recovered perfectly by RAICAR.
Fig 4. Reproducibility (R) and reproducibility fluctuations (δij) from overextraction.
Only five sources (Gaussian or otherwise) are present, but the mixture dimension is ten. Horizontal bars are located at the median value. There are clearly three groups of sources here. Two sources (the recovered sparse sources) have near-perfect R that does not fluctuate from decimation to decimation. Three sources have occasionally high reproducibility but also significant δij; these correspond to the Gaussian subspace. The remaining five sources have very low reproducibility that fluctuates very little; these are spurious sources resulting from overextraction.
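The three-way grouping described in this caption can be mimicked with a toy thresholding rule on the median reproducibility and its fluctuation. The thresholds and the example median values below are illustrative assumptions, not numbers from the paper:

```python
def classify_sources(R_med, delta_med, r_hi=0.9, d_hi=0.2):
    """Toy three-way grouping mirroring the pattern in the caption:
    high R with small fluctuations -> sparse source; sizable fluctuations
    -> Gaussian subspace; low, stable R -> spurious (overextraction)."""
    labels = []
    for r, d in zip(R_med, delta_med):
        if r > r_hi and d < d_hi:
            labels.append("sparse")
        elif d >= d_hi:
            labels.append("gaussian")
        else:
            labels.append("spurious")
    return labels

# Hypothetical medians shaped like the three groups in the figure
R_med     = [0.99, 0.98, 0.55, 0.60, 0.48, 0.10, 0.12, 0.08, 0.11, 0.09]
delta_med = [0.01, 0.02, 0.35, 0.40, 0.30, 0.05, 0.04, 0.06, 0.05, 0.04]
print(classify_sources(R_med, delta_med))
```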
Fig 5. Reproducibility (R) and reproducibility fluctuations (δij) for speech signals mixed with Gaussian sources.
For each of the fifteen extracted sources, R is shown in red and δij in black. For both quantities, values for each of the fifty subsampled data matrices are shown as points and the median value as a horizontal bar. The sources clearly group into three categories: high R with low δij (true sparse sources), variable R with high δij (Gaussian sources), and low R and δij (spurious sources).
Fig 6. Reproducibility plot for the Iris data.
The format and color scheme for this figure are identical to those used in Figs 4 and 5. Based on this information and related discussion in the text, it appears that there is one (and likely only one) sparse source present in the iris data.
Fig 7. Histograms of extracted sources from the Iris data.
Each panel shows a histogram (bars) and kernel density estimate (Gaussian kernel, solid line) for one of the four RAICAR sources extracted from the iris data. The nongaussianity of the most reproducible source (upper left) is clearly evident.
