Nat Commun. 2024 Feb 26;15(1):1763.
doi: 10.1038/s41467-024-46106-0.

SEMORE: SEgmentation and MORphological fingErprinting by machine learning automates super-resolution data analysis

Steen W B Bender et al. Nat Commun. 2024.

Abstract

The morphology of protein assemblies impacts their behaviour and contributes to beneficial and aberrant cellular responses. While single-molecule localization microscopy provides the required spatial resolution to investigate these assemblies, the lack of universal robust analytical tools to extract and quantify underlying structures limits this powerful technique. Here we present SEMORE, a semi-automatic machine learning framework for universal, system- and input-dependent, analysis of super-resolution data. SEMORE implements a multi-layered density-based clustering module to dissect biological assemblies and a morphology fingerprinting module for quantification by multiple geometric and kinetics-based descriptors. We demonstrate SEMORE on simulations and diverse raw super-resolution data: time-resolved insulin aggregates, and published data of dSTORM imaging of nuclear pore complexes, fibroblast growth receptor 1, sptPALM of Syntaxin 1a and dynamic live-cell PALM of ryanodine receptors. SEMORE extracts and quantifies all protein assemblies, their temporal morphology evolution and provides quantitative insights, e.g. classification of heterogeneous insulin aggregation pathways and NPC geometry in minutes. SEMORE is a general analysis platform for super-resolution data, and being a time-aware framework can also support the rise of 4D super-resolution data.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Schematic illustration of SEMORE, an automated pipeline to agnostically cluster and classify temporally and morphologically distinct protein aggregates.
a SEMORE input is a set of x, y coordinates from SMLM (PALM or STORM) or x, y, t coordinates from time-resolved SMLM (TR-SMLM; REPLOM input) of individual localization/aggregation events, here shown for temporally resolved insulin aggregation imaged using the REPLOM approach on a TIRF microscope. b The first step of SEMORE clusters data by a density-based clustering method in three dimensions: the spatial coordinates, x and y, and time, t. Colours indicate clusters and the scale bar shows 10 μm. c The second step is a temporal refinement of the initial clusters to identify and dissect underlying sub-clusters, utilizing a time-directional clustering through the iteration of frames. d The final output of the temporal refinement is a set of individual spatially resolved structures that are now separable even if grown close to other aggregates. e Each identified cluster is fed to a morphology-fingerprinting module that computes four groups of descriptive features, including circularity of the morphology, graph network within the aggregate, general symmetry, and geometric interior. Combined, these feature groups construct the individual self-assembly fingerprint of a total of 40+ features. f The calculated morphology fingerprints are stored for each extracted protein assembly, allowing for complete quantification and insights into the distribution of heterogeneous morphology or growth pathways. Source data are provided as a Source Data file.
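The density-based clustering step in panel b can be illustrated with a minimal sketch. This is a plain textbook DBSCAN applied to (x, y, t) points, not SEMORE's own implementation; the function name `dbscan_xyt`, the `time_scale` factor used to make frames comparable to spatial units, and all parameter values are assumptions chosen for illustration.

```python
from math import dist

def dbscan_xyt(points, eps, min_samples, time_scale=1.0):
    """Label (x, y, t) points with a basic DBSCAN; -1 marks noise."""
    # Rescale time so a single eps applies to all three axes
    # (time_scale is a hypothetical tuning knob, not a SEMORE parameter).
    scaled = [(x, y, t * time_scale) for x, y, t in points]
    n = len(scaled)
    # Brute-force neighbour lists, O(n^2); each point is its own neighbour.
    neighbours = [
        [j for j in range(n) if dist(scaled[i], scaled[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep growing
    return labels

# Illustrative data: two small aggregates and one isolated localization.
points = [
    (0.0, 0.0, 0), (0.1, 0.0, 1), (0.0, 0.1, 2), (0.1, 0.1, 3),
    (10.0, 10.0, 0), (10.1, 10.0, 1), (10.0, 10.1, 2), (10.1, 10.1, 3),
    (5.0, 5.0, 50),
]
labels = dbscan_xyt(points, eps=0.5, min_samples=3, time_scale=0.1)
print(labels)
```

Including time as a third axis means that two aggregates occupying the same position at well-separated times fall into different clusters, which is the property a time-aware analysis relies on.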
Fig. 2. Performance evaluation of SEMORE clustering module on classification of three diverse types of morphologies inspired by biological systems.
a Three classes of time-resolved aggregations were simulated to capture a broad aspect of biological systems (see Methods): isotropic, where aggregates grow radially; random, where aggregates grow in response to steric hindrance; and branching fibrils, where aggregates grow linearly followed by branching. The three inserts depict the general pipeline for cluster identification, from left to right: aggregates with diverse final morphologies are produced in a frame-by-frame manner, with the amount and locations of particles randomly drawn based on previous localizations and start and end times randomly drawn; uniform noise is added in all three dimensions (x, y, time); the model accurately predicts diverse aggregates, showcased by different colours. The black points correspond to data points predicted as the wrong label, i.e., either noise predicted as an aggregate point or multiple predicted aggregates for the same ground truth label (FP), while the brown points correspond to aggregate locations predicted as noise (FN). b Quantification of operational performance by a confusion matrix. Predictions are shown from 50 experiments for each aggregation type, each containing 10 individual aggregations for isotropic and random, and 25 for fibril growth. Errors are standard deviations calculated across accuracies for each individual aggregate. Common classification metrics for the evaluation are shown in the table on the right side of the corresponding confusion matrix. Source data are provided as a Source Data file.
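The common classification metrics mentioned in panel b follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name `binary_metrics` and the counts are illustrative assumptions and are not taken from the paper's source data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics for an aggregate-versus-noise confusion matrix."""
    sensitivity = tp / (tp + fn)   # fraction of true aggregate points recovered
    precision = tp / (tp + fp)     # fraction of predicted aggregate points that are correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }

# Hypothetical counts: 90 aggregate points found, 10 missed (FN),
# 10 noise points mislabelled as aggregate (FP), 890 noise points rejected.
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```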
Fig. 3. Performance evaluation of the morphology fingerprinting module on three diverse assembly morphologies.
a The three diverse morphological structures of Fig. 2 are subjected to the morphology fingerprinting module. Each colour represents a cluster, except brown-red, which represents noise detections. b The derived features are dimensionality reduced by a 3-component UMAP to visualize the separation of the identified clusters in the latent space and the grouping of the diverse morphologies. The dimensionality-reduced features are clustered using DBSCAN to identify groups of fingerprints. The four identified cluster groups are displayed, corresponding to three different simulated aggregational structures, as well as a cluster containing only pure noise (spherical zoom, points coloured by frame value). Further analysis of the group corresponding to fibrils by an additional 3-component UMAP and a new DBSCAN (square dashed line zoom on top) identified two local clusters mainly containing branched and non-branched fibrils, respectively (see Supplementary Fig. 13) (spherical zoom, points coloured by frame value). c The count of each simulation type is found through a simple investigation of clusters 1 to 4, where cluster 2 only contains data from the fibril simulation and is deemed noise by visual inspection. d Confusion matrix of classification accuracy for each cluster after the removal of noise. Cluster 1 predicts fibril (sensitivity 99.92%, F1 99.96%), cluster 3 random (sensitivity 99.21%, F1 99.31%), and cluster 4 isotropic growth (sensitivity 99.58%, F1 99.38%), resulting in an average F1 score of 99.55 ± 0.21%, clearly showing the descriptive information of morphology captured within the fingerprinting. Errors are standard deviations calculated across all aggregates. Source data are provided as a Source Data file.
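As a sketch of how per-cluster sensitivity and F1 values such as those in panel d are obtained from a multi-class confusion matrix, the snippet below uses the standard definitions with rows as true classes and columns as predicted classes. The matrix counts and the function name `per_class_scores` are illustrative assumptions. The last lines only check that the reported average of 99.55% equals the unweighted mean of the three reported F1 values; the ± 0.21% spread is computed across all aggregates and cannot be reproduced from these three numbers.

```python
def per_class_scores(matrix, labels):
    """Sensitivity and F1 per class; rows are true classes, columns predicted."""
    scores = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in matrix) - tp   # predicted k, true class otherwise
        scores[label] = {
            "sensitivity": tp / (tp + fn),
            "f1": 2 * tp / (2 * tp + fp + fn),
        }
    return scores

# Hypothetical counts, not the paper's data.
labels = ["fibril", "random", "isotropic"]
matrix = [
    [50, 0, 0],
    [0, 48, 2],
    [0, 1, 49],
]
scores = per_class_scores(matrix, labels)
macro_f1 = sum(s["f1"] for s in scores.values()) / len(scores)
print(scores, macro_f1)

# Consistency check on the values quoted in the caption.
reported_f1 = [99.96, 99.31, 99.38]
reported_mean = sum(reported_f1) / len(reported_f1)
print(round(reported_mean, 2))
```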
Fig. 4. The SEMORE pipeline generalizes across widely diverse experimental systems: time-resolved insulin aggregation and nuclear pore complex (NPC) assembly.
a Top: Final frame of accumulated super-resolution localizations from temporally resolved insulin aggregation. Bottom: Identification of each aggregate, depicted as a distinct colour, and calculation of its corresponding fingerprint by SEMORE. The scale bar shows 10 μm. b The collective fingerprints are processed through a 2-component UMAP and clustered using DBSCAN, resulting in two clusters: cluster 1 (red) contains low-density elongated anisotropic growth patterns, and cluster 2 (gray) contains isotropically grown high-density spherical-like structures. c Nine representative aggregates for each of the anisotropic and isotropic clusters, with points coloured by frame value. d Top: Accumulated super-resolution localizations of NPC assemblies from ref. . Bottom: Identification of each assembly, depicted as a distinct colour, and calculation of its corresponding fingerprint by SEMORE. The scale bar shows 1 μm. e The NPC fingerprints are processed through a 2-component UMAP and clustered using DBSCAN into 3 clusters: cluster 1 (green) corresponds to individual NPC assemblies, cluster 2 (black) to overlapping NPC assemblies and cluster 3 (gray) to noise. f Overlay of the clustered NPCs, colour-coded based on their classification. The scale bar shows 1 μm. g Extracted radius of NPC, consistent with earlier reports. Source data are provided as a Source Data file.
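One simple way to extract a radius from a ring-like cluster, as in panel g, is the mean distance of its localizations from their centroid. This is a generic estimator offered as a sketch; the source does not state which estimator SEMORE uses, and the function name `ring_radius`, the synthetic ring, and its radius of 55 units are hypothetical.

```python
from math import cos, sin, pi, hypot

def ring_radius(points):
    """Mean distance of (x, y) localizations from their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum(hypot(x - cx, y - cy) for x, y in points) / n

# Synthetic, noise-free ring: 8 evenly spaced points around (100, 200).
true_radius = 55.0  # hypothetical value, in arbitrary length units
ring = [
    (100 + true_radius * cos(2 * pi * k / 8),
     200 + true_radius * sin(2 * pi * k / 8))
    for k in range(8)
]
radius = ring_radius(ring)
print(f"estimated radius: {radius:.2f}")
```

With real localizations the estimate is biased by localization noise and by overlapping assemblies, which is one reason to separate individual assemblies from overlapping ones before measuring.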
