2017 May 5;7(5):1385-1392.
doi: 10.1534/g3.116.033654.

Accurate Classification of Protein Subcellular Localization from High-Throughput Microscopy Images Using Deep Learning



Tanel Pärnamaa et al. G3 (Bethesda).

Abstract

High-throughput microscopy of many single cells generates high-dimensional data that are far from straightforward to analyze. One important problem is automatically detecting the cellular compartment where a fluorescently-tagged protein resides, a task relatively simple for an experienced human, but difficult to automate on a computer. Here, we train an 11-layer neural network on data from mapping thousands of yeast proteins, achieving per cell localization classification accuracy of 91%, and per protein accuracy of 99% on held-out images. We confirm that low-level network features correspond to basic image characteristics, while deeper layers separate localization classes. Using this network as a feature calculator, we train standard classifiers that assign proteins to previously unseen compartments after observing only a small number of training examples. Our results are the most accurate subcellular localization classifications to date, and demonstrate the usefulness of deep learning for high-throughput microscopy.
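The jump from 91% per-cell to 99% per-protein accuracy comes from many cells voting for each protein. A minimal sketch of one plausible aggregation, averaging per-cell class probabilities; this rule and all the numbers below are illustrative assumptions, not the authors' documented procedure:

```python
import numpy as np

# Synthetic per-cell softmax outputs for one protein's cells (illustrative);
# the true compartment is class 3 out of 12.
rng = np.random.default_rng(0)
n_cells, n_classes, true_class = 40, 12, 3
cell_probs = rng.dirichlet(np.ones(n_classes), size=n_cells)
cell_probs[:, true_class] += 0.15                # modest bias toward the true class
cell_probs /= cell_probs.sum(axis=1, keepdims=True)

per_cell_calls = cell_probs.argmax(axis=1)       # individual cells can be misclassified
per_protein_call = cell_probs.mean(axis=0).argmax()
print(per_protein_call)
```

Even when a fraction of single cells is called wrong, the averaged probabilities recover the correct compartment, which is why per-protein accuracy exceeds per-cell accuracy.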

Keywords: deep learning; high-content screening; machine learning; microscopy; yeast.


Figures

Figure 1
A deep neural network for protein subcellular classification. (A) Outline of the data generation and classification workflow. (B) Example pictures (two images) from each of the 12 classes (labeled above). Red fluorescence corresponds to a cytosolic marker to denote the cell, and green to the protein of interest. (C) Architecture of the “DeepYeast” convolutional neural network. Eight convolutional layers (yellow) are followed by three fully connected ones (green), producing the prediction (blue). All convolutional layers use 3 × 3 filters with stride 1 (filter size and number of neurons are given in each layer label), and all pooling operations (purple) are over 2 × 2 nonoverlapping areas. ER, endoplasmic reticulum.
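The architecture in (C) can be sketched as a network definition. The channel widths, the 64 × 64 input size, and the ReLU activations below are illustrative assumptions, since the caption fixes only the layer counts, the 3 × 3 stride-1 filters, and the 2 × 2 pooling; this is a PyTorch sketch, not the authors' implementation:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    """n_convs 3x3 stride-1 convolutions, each with ReLU, then 2x2 max pooling."""
    layers = []
    for _ in range(n_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2))
    return layers

class DeepYeastSketch(nn.Module):
    """8 convolutional + 3 fully connected layers; widths are assumptions."""
    def __init__(self, n_classes=12):
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(2, 64, 2),      # red + green fluorescence channels in
            *conv_block(64, 128, 2),
            *conv_block(128, 256, 2),
            *conv_block(256, 512, 2),   # 64x64 input is halved by each pool -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 4 * 4, 512), nn.ReLU(inplace=True),
            nn.Linear(512, 512), nn.ReLU(inplace=True),
            nn.Linear(512, n_classes),  # 12 compartment classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

if __name__ == "__main__":
    logits = DeepYeastSketch()(torch.zeros(1, 2, 64, 64))
    print(logits.shape)  # torch.Size([1, 12])
```

The eight convolutions plus three linear layers give the 11 trainable layers the abstract refers to.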
Figure 2
Cellular compartment classification accuracy. (A) DeepYeast outperforms random forests in classification performance. Recall (y-axis) for the 12 subcellular compartments (x-axis) for DeepYeast (red) and random forest (blue) classifiers. The dashed lines denote medians across compartments. The error bars denote the 95% C.I. from 20,000 bootstrap samples (Table S2). (B) Same as (A), but for precision on the y-axis. (C) Example classification mistakes stemming from technical issues (left) due to low signal (bottom left) or no cell (top left), population heterogeneity (middle) resulting in false positives (top middle) and false negatives (bottom middle), as well as frequent model errors (right) of classifying nucleus as nucleolus (top right), or nucleolus as spindle pole (bottom right). (D) Confusion matrix of DeepYeast classification. Error rates from the true (y-axis) to falsely predicted (x-axis) compartments. ER, endoplasmic reticulum.
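The 95% confidence intervals in (A) and (B) come from bootstrap resampling of the evaluation set. A minimal percentile-bootstrap sketch for per-class recall, run here on synthetic labels rather than the paper's data:

```python
import numpy as np

def bootstrap_recall_ci(y_true, y_pred, cls, n_boot=20_000, seed=0):
    """Percentile 95% CI for the recall of one class via bootstrap resampling."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    recalls = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample cells with replacement
        t, p = y_true[idx], y_pred[idx]
        mask = t == cls
        recalls[b] = (p[mask] == cls).mean() if mask.any() else np.nan
    return np.nanpercentile(recalls, [2.5, 97.5])

# Illustrative labels with ~90% agreement, not the paper's predictions:
rng = np.random.default_rng(1)
y_true = rng.integers(0, 12, 5000)
y_pred = np.where(rng.random(5000) < 0.9, y_true, rng.integers(0, 12, 5000))
lo, hi = bootstrap_recall_ci(y_true, y_pred, cls=0, n_boot=2000)
print(f"class 0 recall 95% CI: [{lo:.3f}, {hi:.3f}]")
```

The paper uses 20,000 resamples; the smaller count here just keeps the example fast.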
Figure 3
Visualization of the network features at different layers. Interpreting the first, second, fourth, eighth, and eleventh layers of DeepYeast (box diagram, top, see also Figure 1C). (A) Image patches that maximize some neuron output. For each of the layers, four neurons (y-axis) and image parts (x-axis) corresponding to a block of pixels that feed into them for maximum activation are shown. (B) 2D visualizations using the t-SNE algorithm (Van der Maaten and Hinton 2008). 1000 random images were fed through the network, hidden layer outputs were extracted, and the t-SNE algorithm was used to project the high-dimensional representations into two dimensions. The points are colored based on the true class categories. (C) Three closest images (x-axis) to two chosen points [1 and 2 in (B), y-axis] in the two-dimensional t-SNE projection space. (D) Distribution of mutual information (y-axis) between the multinomial class probability and discretized neuron outputs for each layer (left to right), as well as CellProfiler features (rightmost box, red).
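The projection in (B) can be reproduced in outline with scikit-learn's t-SNE. The features below are synthetic stand-ins for hidden-layer activations, since the real inputs would be the outputs of the trained network:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for hidden-layer activations: the paper feeds random images through
# DeepYeast and extracts layer outputs; here we fake 64-d features whose means
# depend on one of 12 classes, so that t-SNE has structure to reveal.
rng = np.random.default_rng(0)
labels = rng.integers(0, 12, 300)
centers = rng.normal(size=(12, 64))
features = centers[labels] + 0.5 * rng.normal(size=(300, 64))

# Project the high-dimensional representations to 2-D, as in Figure 3B;
# points would then be colored by their true class.
embedding = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
print(embedding.shape)  # (300, 2)
```

Deeper layers of the network should produce embeddings whose classes separate more cleanly, which is what panel (B) shows layer by layer.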
Figure 4
Transfer learning works. (A) Four example images of each of the additional analyzed classes. (B) Applying t-SNE to the network outputs of the additional data (see also Figure 3B) and coloring the points according to the classes demonstrates separation of new compartments based on features trained for classifying other localizations. (C) Classification accuracy on held-out data (y-axis) for different numbers of training images (x-axis) for DeepYeast outputs (red) or CellProfiler features (blue) used as inputs to a random forest. The error bars denote a 95% C.I. from 20,000 bootstrap samples.
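The learning curve in (C) amounts to fitting a random forest on fixed features for increasing training-set sizes. A scikit-learn sketch with synthetic stand-in features; the class structure and dimensions are illustrative assumptions, not the paper's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
centers = 2.0 * rng.normal(size=(3, 64))   # three hypothetical new compartments

def make_features(n):
    """Synthetic stand-in for DeepYeast feature vectors of n cells."""
    y = rng.integers(0, 3, n)
    return centers[y] + rng.normal(size=(n, 64)), y

X_test, y_test = make_features(500)
for n_train in (10, 50, 200):              # vary training-set size, as in Figure 4C
    X_train, y_train = make_features(n_train)
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print(n_train, round(rf.score(X_test, y_test), 3))
```

With well-separated features, accuracy is already high from a handful of examples, which is the point of panel (C): good features make the downstream classifier cheap to train.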


