[Preprint]. 2024 Jan 11:2024.01.09.574919.
doi: 10.1101/2024.01.09.574919.

Generative modeling of biological shapes and images using a probabilistic α-shape sampler

Emily T Winn-Nuñez et al. bioRxiv.

Abstract

Understanding morphological variation is an important task in many areas of computational biology. Recent studies have focused on developing computational tools for sub-image selection, which aims to identify the structural features that best describe the variation between classes of shapes. A major part of assessing the utility of these approaches is demonstrating their performance on both simulated and real datasets. However, when creating a model for shape statistics, real data can be difficult to access, and sample sizes are often small because the data are expensive to collect. Meanwhile, the current landscape of generative models for shapes has been mostly limited to approaches that use black-box inference, making it difficult to systematically assess the power and calibration of sub-image models. In this paper, we introduce the α-shape sampler: a probabilistic framework for generating realistic 2D and 3D shapes based on probability distributions which can be learned from real data. We demonstrate our framework using proof-of-concept examples and in two real applications in biology where we generate (i) 2D images of healthy and septic neutrophils and (ii) 3D computed tomography (CT) scans of primate mandibular molars. The α-shape sampler R package is open-source and can be downloaded at https://github.com/lcrawlab/ashapesampler.

Figures

Figure 1. An example of various α-shapes for the same set of points under different choices for the numerical parameter α.
Here, we consider different parameter values (a) α=0.05, (b) α=0.10, (c) α=0.15, (d) α=0.2, and (e) α=0.35. In each panel, the gray shapes are the intersection of balls of radius α and the Voronoi cells at each point. The pink triangles are then faces representing the collective interior, and the blue lines are edges of the α-complex. The bold blue edges are known as the “boundary edges” and denote the α-shape for each panel. In (a) and (b), where α is smaller, we have disconnected components. In (c), we see an instance where edges may form the boundary of a face, but the face is not quite yet filled in since the three Voronoi cells have not collectively met. In (d), the faces are filled in and one of the points becomes an interior point while the rest remain α-extreme points. In (e), α is large enough such that the given α-complex is the Delaunay triangulation and convex hull of the point set. When determining how to generate a new shape from an existing dataset, we use information within the given simplicial complex to determine how many points are needed, where the points should be sampled, and the appropriate α parameter to connect the points. For a more detailed overview and theoretical discussion of concepts surrounding α-shapes, see Materials and Methods and Supporting Information.
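The difference between panels (c) and (d) can be illustrated with a minimal sketch for the simplest case of three points forming an acute triangle. Under the ball-and-Voronoi-cell construction described above, each edge of such a triangle enters the α-complex once α reaches half the edge length, while the face is filled only once α reaches the circumradius. The function names below are hypothetical and are not part of the ashapesampler package; general point sets require a full Delaunay triangulation, which this sketch does not compute.

```python
import math


def circumradius(a, b, c):
    """Circumradius of a planar triangle: R = |ab||bc||ca| / (4 * area)."""
    ab, bc, ca = math.dist(a, b), math.dist(b, c), math.dist(c, a)
    twice_area = abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))
    return ab * bc * ca / (2.0 * twice_area)


def is_acute(a, b, c):
    """True if every angle of the triangle is strictly below 90 degrees."""
    sq = sorted([math.dist(a, b) ** 2, math.dist(b, c) ** 2, math.dist(c, a) ** 2])
    return sq[2] < sq[0] + sq[1]


def alpha_complex_acute_triangle(a, b, c, alpha):
    """Edges and face of the alpha-complex of three points (acute case only).

    Returns (edges, face_filled), where edges is a list of index pairs into
    (a, b, c). The acute assumption guarantees each edge midpoint lies on the
    shared Voronoi boundary, so the edge threshold is half the edge length.
    """
    if not is_acute(a, b, c):
        raise ValueError("this sketch only handles acute triangles")
    pts = (a, b, c)
    edges = [
        (i, j)
        for i, j in ((0, 1), (1, 2), (0, 2))
        if math.dist(pts[i], pts[j]) / 2.0 <= alpha
    ]
    face_filled = circumradius(a, b, c) <= alpha
    return edges, face_filled


if __name__ == "__main__":
    A, B, C = (0.0, 0.0), (1.0, 0.0), (0.5, 0.9)
    for alpha in (0.40, 0.55, 0.60):
        edges, filled = alpha_complex_acute_triangle(A, B, C, alpha)
        print(f"alpha={alpha:.2f}: {len(edges)} edges, face filled: {filled}")
```

For the example points, the half edge lengths are roughly 0.50 and 0.51 and the circumradius is roughly 0.59, so α=0.55 yields all three edges with an empty face, matching the situation described for panel (c).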
Figure 2. Schematic overview of the α-shape sampler: a probabilistic framework for simulating realistic 2D and 3D images and shapes.
(a) A general illustration of the pre- and post-processing workflow in the α-shape sampler software. In step (i), the user inputs data of real shapes in some format (in this case, binary masks for illustration). We refer to these data as “reference” shapes. In step (ii), the reference masks are converted to triangular meshes which are treated as simplicial complexes. In step (iii), the reference meshes are input into the shape generation pipeline which, in step (iv), outputs newly generated shapes in the form of α-complexes. Finally, in step (v), these generated α-complexes are converted back to match the same format as the original input data (again, here, binary masks). (b) Details underlying the algorithm for generating new shapes via the α-shape sampler. (i) A collection of meshes from N reference shapes is given to the software. For simplicity, we assume that these shapes are from the same phenotypic class and, thus, their points are from the same manifold. (ii) Next, we estimate the reach τ_i for each reference shape by computing the distance to edge neighbors for each point (i.e., vertex in the mesh) and the circumcenters to neighboring faces (note that we also evaluate tetrahedra for 3D objects). The next closest vertex is the value τ_p for point p, and the smallest τ_p among all points is the value of τ_i for the i-th reference shape. We then take the minimum of τ = (τ_1, …, τ_N) to be the representative estimate of the reach τ̂ for all reference shapes. (iii) We create a partial point cloud by combining points from J reference shapes in our input dataset, where 2 ≤ J ≤ N. Next, we sample new points from a ball of radius τ̂/8 around vertices in the partial point cloud. Each new point is accepted or rejected according to a probability-based rule. (iv) Once we have the newly sampled point cloud, we set α = τ̂ − ϵ, where ϵ > 0 is arbitrarily small, and generate the α-complexes for new shapes.
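Steps (ii) through (iv) of panel (b) can be sketched in 2D as follows. This is a simplified illustration under stated assumptions, not the ashapesampler implementation: the reach estimate uses only edge-neighbor distances (the circumcenter terms are omitted), the sampling radius is read as τ̂/8 and the connectivity parameter as α = τ̂ − ϵ, and the probability-based acceptance rule is not specified in the caption, so it is left as a user-supplied placeholder `accept` that accepts every point by default. All function names are hypothetical.

```python
import math
import random


def estimate_reach(vertices, edges):
    """Simplified tau_i: the smallest distance between edge-neighboring vertices."""
    return min(math.dist(vertices[i], vertices[j]) for i, j in edges)


def sample_in_disc(center, radius, rng):
    """Uniform draw from a 2D disc by rejection from its bounding square."""
    while True:
        dx = rng.uniform(-radius, radius)
        dy = rng.uniform(-radius, radius)
        if dx * dx + dy * dy <= radius * radius:
            return (center[0] + dx, center[1] + dy)


def generate_point_cloud(meshes, n_per_vertex=3, accept=lambda p: True,
                         eps=1e-9, seed=0):
    """Sketch of steps (ii)-(iv) for J reference meshes.

    meshes: list of (vertices, edges) pairs, with vertices as (x, y) tuples
    and edges as index pairs. Returns (new_points, tau_hat, alpha).
    """
    rng = random.Random(seed)
    # (ii) representative reach: minimum of the per-shape estimates
    tau_hat = min(estimate_reach(v, e) for v, e in meshes)
    # (iii) partial point cloud: pool the vertices of all J shapes
    partial_cloud = [p for vertices, _ in meshes for p in vertices]
    radius = tau_hat / 8.0
    new_points = []
    for center in partial_cloud:
        for _ in range(n_per_vertex):
            candidate = sample_in_disc(center, radius, rng)
            if accept(candidate):  # placeholder for the acceptance rule
                new_points.append(candidate)
    # (iv) connectivity parameter just below the estimated reach
    alpha = tau_hat - eps
    return new_points, tau_hat, alpha


if __name__ == "__main__":
    square_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    mesh_a = ([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)], square_edges)
    mesh_b = ([(0.1, 0.1), (0.9, 0.1), (0.9, 0.9), (0.1, 0.9)], square_edges)
    points, tau_hat, alpha = generate_point_cloud([mesh_a, mesh_b])
    print(len(points), tau_hat, alpha)
```

The returned point cloud and α would then be passed to an α-complex routine to build the new shape; that final step is outside the scope of this sketch.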
Figure 3. Qualitative comparisons of real and generated 2D annuli and 3D tori using the α-shape sampler.
Panels (a) and (b) show real (gray) and generated (orange) annuli. Similarly, in panels (c) and (d), we show real (gray) and generated (orange) tori. Overall, we see that the α-shape sampler generates slightly thicker shapes than the examples in the original dataset (see Tables S1 and S2 for a quantitative evaluation). Nonetheless, the generated shapes preserve the most important topological property in that they all have exactly one connected component and exactly one hole.
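For a 2D shape such as an annulus, the topological property above can be checked directly from a triangulation. The sketch below counts connected components with a union-find structure and recovers the number of holes from the Euler characteristic χ = V − E + F, assuming a planar complex with no enclosed voids so that the number of holes equals the number of components minus χ. The function name and the toy annulus are illustrative assumptions and do not come from the paper's evaluation code.

```python
def betti_numbers_2d(n_vertices, triangles, extra_edges=()):
    """Return (components, holes) for a planar simplicial complex.

    triangles: iterable of vertex-index triples (filled faces).
    extra_edges: index pairs for edges that bound no filled face.
    """
    edges = {tuple(sorted(e)) for e in extra_edges}
    for i, j, k in triangles:
        edges.update({tuple(sorted((i, j))),
                      tuple(sorted((j, k))),
                      tuple(sorted((i, k)))})

    parent = list(range(n_vertices))

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    components = len({find(v) for v in range(n_vertices)})
    n_faces = len({tuple(sorted(t)) for t in triangles})
    euler_characteristic = n_vertices - len(edges) + n_faces
    holes = components - euler_characteristic
    return components, holes


if __name__ == "__main__":
    # Toy annulus: outer square (vertices 0-3), inner square (vertices 4-7),
    # with the ring between them split into eight triangles.
    annulus = []
    for k in range(4):
        nxt = (k + 1) % 4
        annulus.append((k, nxt, 4 + k))
        annulus.append((nxt, 4 + nxt, 4 + k))
    print(betti_numbers_2d(8, annulus))
```

The toy annulus has 8 vertices, 16 edges, and 8 faces, giving χ = 0 and therefore one component and one hole.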
Figure 4. Application of the α-shape sampler to generate synthetic 2D images of healthy and septic neutrophils.
(a) Examples of real healthy (blue), generated healthy (light blue), real septic (black), and generated septic (gray) neutrophils in gels with stiffness 1.5 kilopascals (kPa). Each synthetic neutrophil in the second row was generated using the two shapes it sits in between in the row above. Variation in the newly generated cells is most visible along the boundary, which is a function of the sampling process in the α-shape pipeline. When comparing the generated and real cells, perhaps most noticeable are (i) the differences in area and (ii) the number of protrusions in the healthy versus septic cells. (b) We use a manifold regularized autoencoder (MRAE) to show that the generated shapes cluster and intermix with real cells in their respective categories. This provides evidence that the images being generated by the α-shape sampler are realistic. (c) We compute the area, perimeter, circularity, solidity, convexity, and compactness of each real and generated cell. Next, we compare the distribution of these measurements for the healthy and septic groups, respectively. Here, if the α-shape sampler is able to preserve geometric and morphological characteristics while generating new data, then we would expect the distributions of these measurements to line up within a group. Note that due to the high heterogeneity and difficulty aligning shapes, the generated septic neutrophils are slightly larger in area and perimeter than the real ones. However, the neutrophils generated with the α-shape sampler still capture other key shape characteristics.
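Several of the measurements in panel (c) can be sketched for a cell outline given as a simple polygon. The definitions below follow common conventions (circularity as 4π·area/perimeter², solidity as area over convex-hull area, convexity as convex-hull perimeter over perimeter) and are assumptions that may differ from the exact formulas used in the paper. Compactness is omitted because its definition varies across the literature. The function names are hypothetical.

```python
import math


def polygon_area(poly):
    """Shoelace formula for a simple polygon given as ordered (x, y) tuples."""
    n = len(poly)
    total = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(poly):
    """Sum of edge lengths around the closed polygon."""
    n = len(poly)
    return sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def shape_descriptors(poly):
    """Area, perimeter, circularity, solidity, and convexity of an outline."""
    area = polygon_area(poly)
    perim = polygon_perimeter(poly)
    hull = convex_hull(poly)
    return {
        "area": area,
        "perimeter": perim,
        "circularity": 4.0 * math.pi * area / perim ** 2,
        "solidity": area / polygon_area(hull),
        "convexity": polygon_perimeter(hull) / perim,
    }


if __name__ == "__main__":
    l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
    for name, value in shape_descriptors(l_shape).items():
        print(f"{name}: {value:.4f}")
```

Computing these values for every real and generated outline, then comparing the per-group distributions, mirrors the kind of check described in panel (c); a shape with protrusions, like the L-shaped example, has solidity and convexity below 1.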
Figure 5. Application of the α-shape sampler to generate synthetic 3D primate mandibular molars.
Here, we qualitatively compare meshes of (a) real Microcebus, (b) generated Microcebus, (c) real Tarsius, and (d) generated Tarsius teeth. Morphologically, we know that tarsier teeth have an additional high cusp (highlighted in red) which allows this genus of primate to eat a wider range of foods. Here, we see that the generated Tarsius teeth from the α-shape sampler preserve the unique paraconids. In panel (e), we show the phylogenetic relationship between the Microcebus and Tarsius genera. It has been estimated that the divergence of Microcebus and Mirza from Tarsius occurred around five million years before the branching of Tarsius from Saimiri. (f) We use a manifold regularized autoencoder (MRAE) to show that the generated teeth cluster and intermix with the real Microcebus and Tarsius teeth, respectively. Figure S4 shows that the same results hold regardless of the dimensionality reduction technique that is used.

