Proc Natl Acad Sci U S A. 2020 Aug 18;117(33):19664-19669.
doi: 10.1073/pnas.2001741117. Epub 2020 Aug 3.

Geometric anomaly detection in data

Bernadette J Stolz et al. Proc Natl Acad Sci U S A.

Abstract

The quest for low-dimensional models which approximate high-dimensional data is pervasive across the physical, natural, and social sciences. The dominant paradigm underlying most standard modeling techniques assumes that the data are concentrated near a single unknown manifold of relatively small intrinsic dimension. Here, we present a systematic framework for detecting interfaces and related anomalies in data which may fail to satisfy the manifold hypothesis. By computing the local topology of small regions around each data point, we are able to partition a given dataset into disjoint classes, each of which can be individually approximated by a single manifold. Since these manifolds may have different intrinsic dimensions, local topology discovers singular regions in data even when none of the points have been sampled precisely from the singularities. We showcase this method by identifying the intersection of two surfaces in the 24-dimensional space of cyclo-octane conformations and by locating all of the self-intersections of a Henneberg minimal surface immersed in 3-dimensional space. Due to the local nature of the topological computations, the algorithmic burden of performing such data stratification is readily distributable across several processors.
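The first two steps described in the abstract (cutting out a small region around each point, then partitioning the dataset by local topology) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the radii `r_inner` and `r_outer`, and the class labels are assumptions chosen for this example, and the cohomology computation itself is not shown; its output `dim_h1_per_point` is taken as given.

```python
import math
from collections import defaultdict


def annular_neighborhood(points, center, r_inner, r_outer):
    """Return the points whose distance from `center` lies in [r_inner, r_outer].

    The two radii are hypothetical parameters of this sketch.
    """
    return [p for p in points if r_inner <= math.dist(p, center) <= r_outer]


def label_from_dim_h1(dim_h1):
    """Map the number of independent loops in a neighborhood to a class label.

    Labels are illustrative: 0 loops suggests a boundary point, 1 loop a
    regular point, and more than 1 loop a singular (intersection) point.
    """
    if dim_h1 == 0:
        return "boundary"
    if dim_h1 == 1:
        return "regular"
    return "singular"


def partition_by_label(dim_h1_per_point):
    """Group point indices into disjoint classes by their local loop count."""
    classes = defaultdict(list)
    for index, dim_h1 in enumerate(dim_h1_per_point):
        classes[label_from_dim_h1(dim_h1)].append(index)
    return dict(classes)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 0.0)]
    print(annular_neighborhood(pts, (0.0, 0.0), 0.5, 2.0))
    print(partition_by_label([1, 0, 3, 1]))
```

Because each neighborhood is processed independently of the others, the per-point step is the part that could be spread across several processors.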

Keywords: persistent cohomology; singularities; stratification inference.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Annular neighborhood classes $A_x$ of several points $x$ in the union of a hemisphere with a plane (both blue) along a circle (red). Regular points, which lie away from this red circle and from the boundaries, have an $A_x$ that looks like a standard annulus. All points lying on the boundary have an $A_x$ that resembles a half or quarter annulus, as depicted in the central panel. All points $x$ on the red circle itself have neighborhoods $A_x$ that resemble two annuli glued along two edges, as indicated in the right panel. The dimension of $H^1(A_x)$ counts the number of independent loops in $A_x$, so from left to right these dimensions are one, zero, and three, respectively.
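The loop counts quoted in this caption can be checked on small graph models of the three neighborhood shapes. The sketch below is an illustration under stated assumptions, not the persistent cohomology computation used in the paper: each shape is replaced by a hypothetical graph with the same loop structure, and the number of independent loops of a graph is computed as edges minus vertices plus connected components.

```python
def first_betti_number(num_vertices, edges):
    """Count independent loops in a graph: E - V + (connected components).

    `edges` is a list of (u, v) pairs; repeated pairs are treated as
    parallel edges. Components are tracked with a small union-find.
    """
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = num_vertices
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    return len(edges) - num_vertices + components


# Standard annulus: deforms onto a circle, modeled here as a 4-cycle.
annulus = first_betti_number(4, [(0, 1), (1, 2), (2, 3), (3, 0)])

# Half annulus: deforms onto an arc, modeled here as a path.
half_annulus = first_betti_number(3, [(0, 1), (1, 2)])

# Two annuli glued along two edges: shrinking each shared edge to a vertex
# leaves two vertices joined by four arcs (two arcs from each annulus).
glued_annuli = first_betti_number(2, [(0, 1), (0, 1), (0, 1), (0, 1)])

print(annulus, half_annulus, glued_annuli)
```

The three values agree with the one, zero, and three loops stated in the caption.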
Fig. 2.
Two-dimensional depiction of the 3D IsoMAP projection (4) of points sampled from the 24-dimensional conformation space of cyclo-octane. Points $x$ for which $\dim H^1(A_x) > 1$ have been colored red, and these cluster near the two embedded circles where the two surfaces intersect. We show the full set of points in A and B and additionally highlight the intersection points identified by our method separately in C and D, using the same perspectives as A and B, respectively. The perspective in B and D corresponds to a counterclockwise rotation around the z axis (<90°) and a counterclockwise rotation around the x axis (<45°) of the perspective in A and C.
Fig. 3.
Two-dimensional projections of points sampled from Henneberg’s minimal surface immersed in 3D space. Points $x$ for which $\dim H^1(A_x) > 1$ are shown in red, and these lie along the four self-intersections. Similarly, points $x$ for which $\dim H^1(A_x) = 0$ have been colored cyan and appear near the boundary. The perspective in B corresponds to a counterclockwise rotation around the z axis (<90°) and a counterclockwise rotation around the x axis (90°) of the perspective in A; we indicate the x, y, and z axes to facilitate comparison.
Fig. 4.
Robustness with respect to the choice of local neighborhood size for the cyclo-octane dataset using local persistent cohomology (PCoh; purple line) and local PCA (orange). The horizontal axis represents local neighborhood size, while the vertical axis corresponds to the Hausdorff distance between the intersection points $S_r$ selected by each method with neighborhood radius $r$ on one hand and a set of ideal reference points $R \subset S_r$ on the other. This distance is defined to be the smallest $\epsilon > 0$ so that the union of radius $\epsilon$ balls around points in $R$ contains all of the points in $S_r$. A–F illustrate the singularities detected at neighborhood radii corresponding to points a–f, respectively. The extreme points responsible for the steep increase in the Hausdorff distance for each method are indicated with black arrows.
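The distance defined in this caption (the smallest radius at which balls around the reference points cover every selected point) can be written as a maximum over selected points of the distance to the nearest reference point. The sketch below is a brute-force illustration with hypothetical function names and toy coordinates, not the code behind the figure. When the reference set is a subset of the selected set, this one-sided quantity coincides with the usual symmetric Hausdorff distance, which the second function computes for comparison.

```python
import math


def covering_radius(reference, selected):
    """Smallest eps such that closed eps-balls around `reference` cover `selected`.

    Equivalently: for each selected point take the distance to its nearest
    reference point, then take the largest of those distances.
    """
    return max(min(math.dist(s, r) for r in reference) for s in selected)


def hausdorff_distance(set_a, set_b):
    """Symmetric Hausdorff distance: the larger of the two covering radii."""
    return max(covering_radius(set_a, set_b), covering_radius(set_b, set_a))


if __name__ == "__main__":
    # Toy data: two reference points and one extra, far-away selected point.
    reference = [(0.0, 0.0), (1.0, 0.0)]
    selected = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
    print(covering_radius(reference, selected))
    print(hausdorff_distance(reference, selected))
```

A single outlying selected point determines the value, which is consistent with the caption's remark that a few extreme points cause the steep increases in the curves.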

References

    1. Fefferman C., Mitter S., Narayanan H., Testing the manifold hypothesis. J. Am. Math. Soc. 29, 983–1049 (2016).
    2. Lee J. A., Verleysen M., Nonlinear Dimensionality Reduction (Springer-Verlag, 2008).
    3. Ringner M., What is principal component analysis? Nat. Biotechnol. 26, 303–304 (2008).
    4. Tenenbaum J. B., De Silva V., Langford J. C., A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000).
    5. Seung H. S., Lee D. D., The manifold ways of perception. Science 290, 2268–2269 (2000).