An algorithm for separation of mixed sparse and Gaussian sources

Ameya Akkalkotkar et al. PLoS One. 2017 Apr 17;12(4):e0175775. doi: 10.1371/journal.pone.0175775
Abstract

Independent component analysis (ICA) is a ubiquitous method for decomposing complex signal mixtures into a small set of statistically independent source signals. However, in cases in which the signal mixture consists of both nongaussian and Gaussian sources, the Gaussian sources will not be recoverable by ICA and will pollute estimates of the nongaussian sources. Therefore, it is desirable to have methods for mixed ICA/PCA which can separate mixtures of Gaussian and nongaussian sources. For mixtures of purely Gaussian sources, principal component analysis (PCA) can provide a basis for the Gaussian subspace. We introduce a new method for mixed ICA/PCA which we call Mixed ICA/PCA via Reproducibility Stability (MIPReSt). Our method uses a repeated estimations technique to rank sources by reproducibility, combined with decomposition of multiple subsamplings of the original data matrix. These multiple decompositions allow us to assess component stability as the size of the data matrix changes, which can be used to determine the dimension of the nongaussian subspace in a mixture. We demonstrate the utility of MIPReSt for signal mixtures consisting of simulated sources and real-world (speech) sources, as well as a mixture of unknown composition.
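The core idea of ranking sources by reproducibility can be illustrated with a minimal numpy sketch (not the paper's implementation): each component from one decomposition run is matched to its best counterpart in another run by absolute correlation, and a stable nongaussian source scores near 1 while a Gaussian direction does not. The `reproducibility` helper and the toy sources below are illustrative assumptions.

```python
import numpy as np

def reproducibility(components_a, components_b):
    """For each component (row) of components_a, find the best-matching
    component in components_b by absolute Pearson correlation.
    Returns one best-match |correlation| per row of components_a."""
    # z-score each component so a scaled dot product equals the correlation
    za = components_a - components_a.mean(axis=1, keepdims=True)
    za /= za.std(axis=1, keepdims=True)
    zb = components_b - components_b.mean(axis=1, keepdims=True)
    zb /= zb.std(axis=1, keepdims=True)
    corr = np.abs(za @ zb.T) / components_a.shape[1]
    return corr.max(axis=1)

rng = np.random.default_rng(0)
n = 10_000
sparse = rng.laplace(size=n)      # a reproducible nongaussian source
gauss_a = rng.normal(size=n)      # Gaussian "components" differ run to run
gauss_b = rng.normal(size=n)

# Two decomposition "runs": the sparse source recurs (up to small noise),
# the Gaussian component does not
run1 = np.vstack([sparse + 0.01 * rng.normal(size=n), gauss_a])
run2 = np.vstack([sparse + 0.01 * rng.normal(size=n), gauss_b])

r = reproducibility(run1, run2)
print(r)  # first entry near 1 (stable sparse source), second far lower
```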


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Schematic for MIPReSt.
MIPReSt runs the RAICAR algorithm on both the original data matrix X and many random subsamples of smaller column dimension. Comparison of the reproducibilities from the original data and the random subsamples determines the size of the sparse subspace. After projecting that subspace out of X, singular value decomposition of the residual matrix X̃, along with an eigenvalue selection rule, produces both the dimension of the Gaussian subspace and a basis for that subspace. (See Methods for details.)
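The projection-and-SVD step can be sketched as follows, assuming the sparse source estimate is already in hand (here we substitute the true source for illustration). The mixing setup and variable names are hypothetical, not from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_mix = 5000, 4

# Hypothetical mixture: one sparse (Laplace) source plus two Gaussian
# sources, mixed into a 4-channel observation matrix X (channels x samples)
S = np.vstack([rng.laplace(size=n_samples),
               rng.normal(size=(2, n_samples))])
A = rng.normal(size=(n_mix, 3))
X = A @ S

# Project the estimated sparse source out of X (rank-1 orthogonal
# projection of each channel onto s_hat, then subtraction)
s_hat = S[0]
proj = np.outer(X @ s_hat, s_hat) / (s_hat @ s_hat)
X_resid = X - proj

# SVD of the residual: the number of non-negligible singular values gives
# the dimension of the Gaussian subspace, and the corresponding left
# singular vectors give a basis for it
sing = np.linalg.svd(X_resid, compute_uv=False)
print(np.round(sing, 2))  # two large values, the rest near zero
```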
Fig 2. Examples of super- and subgaussian sources.
Shown here are histograms for a Gaussian source (black), a subgaussian source (the generalized Gaussian), and a supergaussian source (Laplace). Also shown is a histogram for one of the speech signals used in this study. The speech signal is far more leptokurtic than the Laplace source; without truncating the y-axis the massive spike near zero of the speech signal obscures the shapes of the other distributions.
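The super/subgaussian distinction in this figure is commonly quantified by excess kurtosis: positive for supergaussian (leptokurtic) sources, negative for subgaussian (platykurtic) ones, and zero for a Gaussian. A small numpy check, using a uniform source as a stand-in subgaussian example (the figure itself uses a generalized Gaussian):

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

rng = np.random.default_rng(2)
n = 200_000
k_gauss = excess_kurtosis(rng.normal(size=n))    # ~ 0
k_laplace = excess_kurtosis(rng.laplace(size=n)) # ~ +3  (supergaussian)
k_uniform = excess_kurtosis(rng.uniform(size=n)) # ~ -1.2 (subgaussian)
print(k_gauss, k_laplace, k_uniform)
```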
Fig 3. Full rank extraction.
We constructed a simulated data matrix with five sources: one supergaussian, one subgaussian, and three Gaussian sources. The simulated data matrix had 5 × 10⁵ samples. The main panel shows the results of RAICAR extractions at different levels of decimation, including the parent data. The best assignment match to the supergaussian source is shown in blue and to the subgaussian source in red. While the Gaussian sources may sometimes have extremely high reproducibility, they show poor stability when the data is decimated, in contrast to the sparse sources. The top panel shows scatter plots of the estimated sources from the parent data against their best assignment match; the sparse sources are recovered perfectly by RAICAR.
Fig 4. Reproducibility (R) and reproducibility fluctuations (δij) from overextraction.
Only five sources (Gaussian or otherwise) are present, but the mixture dimension is ten. Horizontal bars are located at the median value. There are clearly three groups of sources here. Two sources (the recovered sparse sources) have near-perfect R that does not fluctuate from decimation to decimation. Three sources have occasionally high reproducibility but also significant δij; these correspond to the Gaussian subspace. The remaining five sources have very low reproducibility that fluctuates very little; these are spurious sources resulting from overextraction.
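The three-way grouping described in this caption can be mimicked with a toy thresholding rule on the median reproducibility and its fluctuation. The thresholds and the example median values below are illustrative assumptions, not numbers from the paper:

```python
def classify_sources(R_med, delta_med, r_hi=0.9, d_hi=0.2):
    """Toy three-way grouping mirroring the pattern in the caption:
    high R with small fluctuations -> sparse source; sizable fluctuations
    -> Gaussian subspace; low, stable R -> spurious (overextraction)."""
    labels = []
    for r, d in zip(R_med, delta_med):
        if r > r_hi and d < d_hi:
            labels.append("sparse")
        elif d >= d_hi:
            labels.append("gaussian")
        else:
            labels.append("spurious")
    return labels

# Hypothetical medians shaped like the three groups in the figure
R_med     = [0.99, 0.98, 0.55, 0.60, 0.48, 0.10, 0.12, 0.08, 0.11, 0.09]
delta_med = [0.01, 0.02, 0.35, 0.40, 0.30, 0.05, 0.04, 0.06, 0.05, 0.04]
print(classify_sources(R_med, delta_med))
```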
Fig 5. Reproducibility (R) and reproducibility fluctuations (δij) for speech signals mixed with Gaussian sources.
For each of the fifteen extracted sources, R is shown in red and δij in black. For both quantities, values for each of the fifty subsampled data matrices are shown as points and the median value as a horizontal bar. The sources clearly group into three categories: high R with low δij (true sparse sources), variable R with high δij (Gaussian sources), and low R and δij (spurious sources).
Fig 6. Reproducibility plot for the Iris data.
The format and color scheme for this figure are identical to those used in Figs 4 and 5. Based on this information and related discussion in the text, it appears that there is one (and likely only one) sparse source present in the iris data.
Fig 7. Histograms of extracted sources from the Iris data.
Each panel shows a histogram (bars) and kernel density estimate (Gaussian kernel, solid line) for one of the four RAICAR sources extracted from the iris data. The nongaussianity of the most reproducible source (upper left) is clearly evident.
