2017 May 5;7(5):1385-1392.
doi: 10.1534/g3.116.033654.

Accurate Classification of Protein Subcellular Localization from High-Throughput Microscopy Images Using Deep Learning



Tanel Pärnamaa et al. G3 (Bethesda).

Abstract

High-throughput microscopy of many single cells generates high-dimensional data that are far from straightforward to analyze. One important problem is automatically detecting the cellular compartment where a fluorescently-tagged protein resides, a task relatively simple for an experienced human, but difficult to automate on a computer. Here, we train an 11-layer neural network on data from mapping thousands of yeast proteins, achieving per cell localization classification accuracy of 91%, and per protein accuracy of 99% on held-out images. We confirm that low-level network features correspond to basic image characteristics, while deeper layers separate localization classes. Using this network as a feature calculator, we train standard classifiers that assign proteins to previously unseen compartments after observing only a small number of training examples. Our results are the most accurate subcellular localization classifications to date, and demonstrate the usefulness of deep learning for high-throughput microscopy.
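The jump from 91% per-cell to 99% per-protein accuracy comes from many cells voting for each protein. A minimal sketch of one plausible aggregation, averaging per-cell class probabilities; this rule and all the numbers below are illustrative assumptions, not the authors' documented procedure:

```python
import numpy as np

# Synthetic per-cell softmax outputs for one protein's cells (illustrative);
# the true compartment is class 3 out of 12.
rng = np.random.default_rng(0)
n_cells, n_classes, true_class = 40, 12, 3
cell_probs = rng.dirichlet(np.ones(n_classes), size=n_cells)
cell_probs[:, true_class] += 0.15                # modest bias toward the true class
cell_probs /= cell_probs.sum(axis=1, keepdims=True)

per_cell_calls = cell_probs.argmax(axis=1)       # individual cells can be misclassified
per_protein_call = cell_probs.mean(axis=0).argmax()
print(per_protein_call)
```

Even when a fraction of single cells is called wrong, the averaged probabilities recover the correct compartment, which is why per-protein accuracy exceeds per-cell accuracy.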

Keywords: deep learning; high-content screening; machine learning; microscopy; yeast.


Figures

Figure 1
A deep neural network for protein subcellular classification. (A) Outline of the data generation and classification workflow. (B) Example pictures (two images) from each of the 12 classes (labeled above). Red fluorescence corresponds to a cytosolic marker to denote the cell, and green to the protein of interest. (C) Architecture of the “DeepYeast” convolutional neural network. Eight convolutional layers (yellow) are followed by three fully connected ones (green), producing the prediction (blue). All convolutional layers use 3 × 3 filters with stride 1 (filter size and number of neurons are given in each layer label), and all pooling operations (purple) are over 2 × 2 nonoverlapping areas. ER, endoplasmic reticulum.
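The architecture in (C) can be sketched as a network definition. The channel widths, the 64 × 64 input size, and the ReLU activations below are illustrative assumptions, since the caption fixes only the layer counts, the 3 × 3 stride-1 filters, and the 2 × 2 pooling; this is a PyTorch sketch, not the authors' implementation:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    """n_convs 3x3 stride-1 convolutions, each with ReLU, then 2x2 max pooling."""
    layers = []
    for _ in range(n_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2))
    return layers

class DeepYeastSketch(nn.Module):
    """8 convolutional + 3 fully connected layers; widths are assumptions."""
    def __init__(self, n_classes=12):
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(2, 64, 2),      # red + green fluorescence channels in
            *conv_block(64, 128, 2),
            *conv_block(128, 256, 2),
            *conv_block(256, 512, 2),   # 64x64 input is halved by each pool -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 4 * 4, 512), nn.ReLU(inplace=True),
            nn.Linear(512, 512), nn.ReLU(inplace=True),
            nn.Linear(512, n_classes),  # 12 compartment classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

if __name__ == "__main__":
    logits = DeepYeastSketch()(torch.zeros(1, 2, 64, 64))
    print(logits.shape)  # torch.Size([1, 12])
```

The eight convolutions plus three linear layers give the 11 trainable layers the abstract refers to.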
Figure 2
Cellular compartment classification accuracy. (A) DeepYeast outperforms random forests in classification performance. Recall (y-axis) for the 12 subcellular compartments (x-axis) for DeepYeast (red) and random forest (blue) classifiers. The dashed lines denote medians across compartments. The error bars denote the 95% C.I. from 20,000 bootstrap samples (Table S2). (B) Same as (A), but for precision on the y-axis. (C) Example classification mistakes stemming from technical issues (left) due to low signal (bottom left) or no cell (top left), population heterogeneity (middle) resulting in false positives (top middle) and false negatives (bottom middle), as well as frequent model errors (right) of classifying nucleus as nucleolus (top right), or nucleolus as spindle pole (bottom right). (D) Confusion matrix of DeepYeast classification. Error rates from the true (y-axis) to falsely predicted (x-axis) compartments. ER, endoplasmic reticulum.
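The 95% confidence intervals in (A) and (B) come from bootstrap resampling of the evaluation set. A minimal percentile-bootstrap sketch for per-class recall, run here on synthetic labels rather than the paper's data:

```python
import numpy as np

def bootstrap_recall_ci(y_true, y_pred, cls, n_boot=20_000, seed=0):
    """Percentile 95% CI for the recall of one class via bootstrap resampling."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    recalls = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample cells with replacement
        t, p = y_true[idx], y_pred[idx]
        mask = t == cls
        recalls[b] = (p[mask] == cls).mean() if mask.any() else np.nan
    return np.nanpercentile(recalls, [2.5, 97.5])

# Illustrative labels with ~90% agreement, not the paper's predictions:
rng = np.random.default_rng(1)
y_true = rng.integers(0, 12, 5000)
y_pred = np.where(rng.random(5000) < 0.9, y_true, rng.integers(0, 12, 5000))
lo, hi = bootstrap_recall_ci(y_true, y_pred, cls=0, n_boot=2000)
print(f"class 0 recall 95% CI: [{lo:.3f}, {hi:.3f}]")
```

The paper uses 20,000 resamples; the smaller count here just keeps the example fast.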
Figure 3
Visualization of the network features at different layers. Interpreting the first, second, fourth, eighth, and eleventh layers of DeepYeast (box diagram, top, see also Figure 1C). (A) Image patches that maximize some neuron output. For each of the layers, four neurons (y-axis) and image parts (x-axis) corresponding to a block of pixels that feed into them for maximum activation are shown. (B) 2D visualizations using the t-SNE algorithm (Van der Maaten and Hinton 2008). 1000 random images were fed through the network, hidden layer outputs were extracted, and the t-SNE algorithm was used to project the high-dimensional representations into two dimensions. The points are colored based on the true class categories. (C) Three closest images (x-axis) to two chosen points [1 and 2 in (B), y-axis] in the two-dimensional t-SNE projection space. (D) Distribution of mutual information (y-axis) between the multinomial class probability and discretized neuron outputs for each layer (left to right), as well as CellProfiler features (rightmost box, red).
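The projection in (B) can be reproduced in outline with scikit-learn's t-SNE. The features below are synthetic stand-ins for hidden-layer activations, since the real inputs would be the outputs of the trained network:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for hidden-layer activations: the paper feeds random images through
# DeepYeast and extracts layer outputs; here we fake 64-d features whose means
# depend on one of 12 classes, so that t-SNE has structure to reveal.
rng = np.random.default_rng(0)
labels = rng.integers(0, 12, 300)
centers = rng.normal(size=(12, 64))
features = centers[labels] + 0.5 * rng.normal(size=(300, 64))

# Project the high-dimensional representations to 2-D, as in Figure 3B;
# points would then be colored by their true class.
embedding = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
print(embedding.shape)  # (300, 2)
```

Deeper layers of the network should produce embeddings whose classes separate more cleanly, which is what panel (B) shows layer by layer.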
Figure 4
Transfer learning works. (A) Four example images of each of the additional analyzed classes. (B) Applying t-SNE to the network outputs of the additional data (see also Figure 3B) and coloring the points according to the classes demonstrates separation of new compartments based on features trained for classifying other localizations. (C) Classification accuracy on held-out data (y-axis) for different numbers of training images (x-axis) for DeepYeast outputs (red) or CellProfiler features (blue) used as inputs to a random forest. The error bars denote a 95% C.I. from 20,000 bootstrap samples.
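The learning curve in (C) amounts to fitting a random forest on fixed features for increasing training-set sizes. A scikit-learn sketch with synthetic stand-in features; the class structure and dimensions are illustrative assumptions, not the paper's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
centers = 2.0 * rng.normal(size=(3, 64))   # three hypothetical new compartments

def make_features(n):
    """Synthetic stand-in for DeepYeast feature vectors of n cells."""
    y = rng.integers(0, 3, n)
    return centers[y] + rng.normal(size=(n, 64)), y

X_test, y_test = make_features(500)
for n_train in (10, 50, 200):              # vary training-set size, as in Figure 4C
    X_train, y_train = make_features(n_train)
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print(n_train, round(rf.score(X_test, y_test), 3))
```

With well-separated features, accuracy is already high from a handful of examples, which is the point of panel (C): good features make the downstream classifier cheap to train.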


