Mol Biol Cell. 2017 Dec 1;28(25):3686-3698.
doi: 10.1091/mbc.E17-06-0379. Epub 2017 Oct 11.

Digging deep into Golgi phenotypic diversity with unsupervised machine learning

Shaista Hussain et al. Mol Biol Cell.

Abstract

The synthesis of glycans and the sorting of proteins are critical functions of the Golgi apparatus and depend on its highly complex and compartmentalized architecture. High-content image analysis coupled to RNA interference screening offers opportunities to explore the organization of this organelle and the gene network underlying it. To date, image-based Golgi screens have been based on a single parameter or on supervised analysis with predefined Golgi structural classes. Here, we report the use of multiparametric data extracted from a single marker and a computational unsupervised analysis framework to explore Golgi phenotypic diversity more extensively. In contrast with the three visually definable phenotypes, our framework reproducibly identified 10 Golgi phenotypes. They were used to quantify and stratify phenotypic similarities among genetic perturbations. The derived phenotypic network partially overlaps previously reported protein-protein interactions and suggests novel functional interactions. Our workflow suggests the existence of multiple stable Golgi organizational states and provides a proof of concept for the classification of drugs and genes using fine-grained phenotypic information.

Figures

FIGURE 1:
Workflow of unsupervised pipeline to uncover Golgi phenotypic classes.
FIGURE 2:
NQC with machine learning. (A) Nuclear clusters example obtained from a GMM applied on all data sets from the HCSU segmentation output. (B) Example of high- vs. low-quality nuclei predicted by the RF classifier trained on manually labeled nuclei. (C) Performance curve of RF with recall % (proportion of good nuclei identified out of total fraction of good nuclei) on horizontal axis and precision % (proportion of good nuclei assigned as correct) on vertical axis with a training data set (80% of labeled data) and a test set (20% of labeled data).
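The precision and recall plotted in panel C can be computed from binary labels as in the minimal sketch below. It uses the standard definitions (precision: fraction of predicted-good nuclei that are truly good; recall: fraction of truly good nuclei that are recovered). The function name and the toy labels are hypothetical and are not taken from the study.

```python
def precision_recall(true_labels, predicted_labels):
    """Return (precision %, recall %) for the 'good nucleus' class.

    Both arguments are equal-length sequences of booleans,
    where True means 'good-quality nucleus'.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)        # good, predicted good
    fp = sum(1 for t, p in pairs if not t and p)    # bad, predicted good
    fn = sum(1 for t, p in pairs if t and not p)    # good, predicted bad
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 100.0 * precision, 100.0 * recall


# Hypothetical manual labels and classifier predictions for six nuclei.
truth = [True, True, True, True, False, False]
predicted = [True, True, True, False, True, False]
precision_pct, recall_pct = precision_recall(truth, predicted)
```

Sweeping the classifier's decision threshold and recomputing both values at each step would trace out a curve of the kind shown in the panel.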
FIGURE 3:
Feature–control-well QC. (A) Example of feature–well selection principle used in this workflow. (B) Reproducibility scores across all features used in this study in independent analysis with the presented workflow. Left features indicated with red arrow are commonly rejected while right features indicated with green are commonly accepted across all replicates used in this study.
FIGURE 4:
Control modeling with one-class SVM. (A) Scatterplot example of control model learned using SVM approach with all cells depicted across two major components after PCA on all features. Control cells are depicted in red and the remaining cell population is depicted in blue. Green curve represents boundary learned for defining control cells space with one-class SVM. (B) Size of total control and non–control-like spaces produced by independent SVM control modeling on replicates tested.
FIGURE 5:
Detection of penetrance (% non–control-like cells) in biological/technical replicates. (A) Bar chart with mean and SD of % non–control-like cells for indicated siRNA treatments (from HPL-derived fluorescence channel) on horizontal axis, based on four replicate wells. Control modeling was performed on various replicates. Red dotted line represents threshold for significant penetrance cutoff at 10%. (B) Correlation analysis of penetrance presented in A between sets of technical replicates for biological replicates 1 (red) and 2 (blue). Pearson correlation coefficient R and R2 are indicated in respective replicate colors. (C) Correlation analysis of penetrance presented in A between biological replicates 1 and 2.
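As a rough illustration of the penetrance measure in panel A, the sketch below computes the percentage of non–control-like cells per well, then the mean and SD over replicate wells, and compares the mean with the 10% cutoff. The function names and well data are hypothetical; the use of the sample SD and of a strict "greater than" comparison are assumptions, since the caption does not specify either.

```python
from statistics import mean, stdev

PENETRANCE_CUTOFF = 10.0  # percent, the cutoff named in the caption


def penetrance(flags):
    """Percentage of cells in one well flagged as non-control-like (True)."""
    if not flags:
        raise ValueError("well contains no cells")
    return 100.0 * sum(flags) / len(flags)


def summarize(wells, cutoff=PENETRANCE_CUTOFF):
    """Return (mean %, sample SD, exceeds cutoff?) over replicate wells."""
    values = [penetrance(w) for w in wells]
    avg = mean(values)
    return avg, stdev(values), avg > cutoff


# Hypothetical four replicate wells of ten cells each.
wells = [
    [True] * 3 + [False] * 7,  # 30 %
    [True] * 2 + [False] * 8,  # 20 %
    [True] * 4 + [False] * 6,  # 40 %
    [True] * 3 + [False] * 7,  # 30 %
]
avg, sd, significant = summarize(wells)
```

In practice the per-cell flags would come from the control model described for Figure 4 rather than from hand-written lists.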
FIGURE 6:
Unsupervised clustering of non–control-like cells. (A) Key cluster characteristics from unsupervised clustering performed on non–control-like cells. Bar charts depict cluster key output from GMM1 to GMM4. Cluster ID is indicated on horizontal axis; cell counts are indicated on vertical axis. Red dotted line represents threshold at 100 cells/cluster for significant cluster size. (B) Major clusters of GMM2 represented as density maps across several major components of PCA. C1, C2, C3, C4, and C5 are represented.
FIGURE 7:
Representative phenotypic clusters for HPL Golgi stain. From left to right, output from GMM1–GMM4. Four representative non–control-like cells are shown for each cluster group. Clusters are oriented top to bottom in decreasing size order as in Figure 6. Each cell vignette is originally generated by HCSU interface from initial input of 20× Opera Phenix–acquired images with HPL Alexa647 fluorescent dye.
FIGURE 8:
Phenotypic signature. Cluster signature composition in polar plot format for representative examples in different replicates. (A) USE1, STX1A, and COG4 siRNA treatments in GMM1 (Biological Replicate 1, Technical Replicate 1), (B) GMM2 (Biological Replicate 1, Technical Replicate 2), and (C) GMM3 (Biological Replicate 2, Technical Replicate 1). Clusters are oriented in a clockwise manner in decreasing order of size as presented in Figures 6 and 7. Radial axis indicates fraction of total non–control-like cells. Each color-coded plot corresponds to one replicate well. A replicate well reference is indicated in the top left box of each graph with total non–control-like cells number in parentheses. Hellinger distance measuring similarity of signatures is indicated for adjacent signatures.
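The signature comparison described here can be sketched with the standard Hellinger distance between two discrete distributions, H(P, Q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which ranges from 0 (identical) to 1 (no overlap). The helper names and cluster counts below are hypothetical, and the well-to-well reproducibility factor mentioned in later captions is not modeled because its definition is given only in the Supplemental Method.

```python
import math


def signature(cluster_counts):
    """Convert per-cluster cell counts into fractions that sum to 1."""
    total = sum(cluster_counts)
    if total == 0:
        raise ValueError("no non-control-like cells in this well")
    return [c / total for c in cluster_counts]


def hellinger(p, q):
    """Standard Hellinger distance between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("signatures must cover the same clusters")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


# Hypothetical cluster counts for two wells (five clusters each).
well_a = signature([40, 30, 15, 10, 5])
well_b = signature([35, 30, 20, 10, 5])
distance = hellinger(well_a, well_b)
```

Because the distance is bounded between 0 and 1, a fixed cutoff can be applied across wells with very different cell numbers.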
FIGURE 9:
Reproducibility analysis of Hellinger distance measured between siRNA phenotypic signatures for HPL Golgi stain. (A) Treatment pair Hellinger distances from technical replicates. (B) Treatment pair Hellinger distances from biological replicates. A well-to-well reproducibility factor was set at 0.3 for all data set comparisons (Supplemental Method). Pearson correlation coefficients R and R2 are indicated.
FIGURE 10:
Phenotypic network: hive network plot analysis showing predicted phenotypic association in red. Each association is reproduced at least in a technical and biological replicate on the basis of Hellinger distance <0.2 for indicated paired association. A string network prediction is presented in gray (based on experimental evidence and a 0.7 threshold). A well-to-well reproducibility factor was set at 0.3 for all our Hellinger distance calculations (Supplemental Method).
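One way to read the association rule in this caption is sketched below: a treatment pair becomes an edge only if its Hellinger distance is below 0.2 in both a technical-replicate comparison and a biological-replicate comparison. This reading, the function name, the placeholder gene names, and all distance values are assumptions for illustration and are not results from the study.

```python
ASSOCIATION_CUTOFF = 0.2  # Hellinger distance cutoff named in the caption


def reproducible_edges(technical, biological, cutoff=ASSOCIATION_CUTOFF):
    """Return pairs whose distance is below the cutoff in both replicate types.

    Each argument maps a frozenset of two treatment names to a distance.
    """
    edges = set()
    for pair, d_tech in technical.items():
        d_bio = biological.get(pair)
        if d_bio is not None and d_tech < cutoff and d_bio < cutoff:
            edges.add(pair)
    return edges


# Made-up distances for three placeholder treatments.
ab = frozenset({"GENE_A", "GENE_B"})
ac = frozenset({"GENE_A", "GENE_C"})
bc = frozenset({"GENE_B", "GENE_C"})
technical = {ab: 0.12, ac: 0.15, bc: 0.45}
biological = {ab: 0.18, ac: 0.31, bc: 0.50}
edges = reproducible_edges(technical, biological)
```

Using frozensets as keys makes each pair unordered, so (A, B) and (B, A) refer to the same edge.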
