Comparative Study

BMC Bioinformatics. 2004 Jun 18:5:78.

doi: 10.1186/1471-2105-5-78.

Boosting accuracy of automated classification of fluorescence microscope images for location proteomics

Kai Huang¹, Robert F Murphy

PMID: 15207009
PMCID: PMC449699
DOI: 10.1186/1471-2105-5-78

Kai Huang et al. BMC Bioinformatics. 2004.

Affiliation

¹ Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213 USA. khuang@andrew.cmu.edu

Abstract

Background: Detailed knowledge of the subcellular location of each expressed protein is critical to a full understanding of its function. Fluorescence microscopy, in combination with methods for fluorescent tagging, is the most suitable current method for proteome-wide determination of subcellular location. Previous work has shown that neural network classifiers can distinguish all major protein subcellular location patterns in both 2D and 3D fluorescence microscope images. Building on these results, we evaluate here new classifiers and features to improve the recognition of these patterns.

Results: We report here a thorough comparison of eight state-of-the-art classification methods on this problem, including neural networks, support vector machines with linear, polynomial, radial basis, and exponential radial basis kernel functions, and ensemble methods such as AdaBoost, Bagging, and Mixtures-of-Experts. Ten-fold cross-validation was used to evaluate each classifier with various parameters on different Subcellular Location Feature sets representing both 2D and 3D fluorescence microscope images, including new feature sets incorporating features derived from Gabor and Daubechies wavelet transforms. After optimal parameters were chosen for each of the eight classifiers, optimal majority-voting ensemble classifiers were formed for each feature set. Comparing the results of all eight classifiers for each image permits estimation of the lower-bound classification error rate for each subcellular pattern, which we interpret to reflect the fraction of cells whose patterns are distorted by mitosis, cell death or acquisition errors. Overall, we obtained statistically significant improvements in classification accuracy over the best previously published results, with the overall error rate being reduced by one-third to one-half and with the average accuracy for single 2D images being higher than 90% for the first time. In particular, the classification accuracy for the easily confused endomembrane compartments (endoplasmic reticulum, Golgi, endosomes, lysosomes) was improved by 5-15%. We achieved further improvements when classification was conducted on image sets rather than on individual cell images.

Conclusions: The availability of accurate, fast, automated classification systems for protein location patterns in conjunction with high-throughput fluorescence microscope imaging techniques enables a new subfield of proteomics, location proteomics. The accuracy and sensitivity of this approach represent an important alternative to low-resolution assignments by curation or sequence-based prediction.
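The majority-voting ensemble and the lower-bound error estimate described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy labels, the use of three classifiers instead of eight, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier labels into one label per image.

    predictions[c][i] is the label classifier c assigned to image i.
    """
    n_images = len(predictions[0])
    combined = []
    for i in range(n_images):
        votes = Counter(p[i] for p in predictions)
        # max() returns the first label with the highest count, so a tie goes
        # to the label voted for by the earliest classifier (an assumption).
        combined.append(max(votes, key=votes.get))
    return combined


def lower_bound_error(predictions, truth):
    """Per-class fraction of images that every classifier got wrong."""
    totals = Counter(truth)
    missed_by_all = Counter()
    for i, label in enumerate(truth):
        if all(p[i] != label for p in predictions):
            missed_by_all[label] += 1
    return {label: missed_by_all[label] / totals[label] for label in totals}


# Hypothetical toy data: three classifiers, four images, two classes.
toy_truth = ["ER", "ER", "Golgi", "Golgi"]
toy_predictions = [
    ["ER", "Golgi", "Golgi", "ER"],
    ["ER", "Golgi", "Golgi", "Golgi"],
    ["Golgi", "Golgi", "Golgi", "ER"],
]

ensemble_labels = majority_vote(toy_predictions)
error_floor = lower_bound_error(toy_predictions, toy_truth)
print(ensemble_labels)
print(error_floor)
```

In this toy case the second ER image is missed by all three classifiers, so it contributes to the ER error floor, while the last Golgi image is recovered by one classifier and does not count toward the floor even though the ensemble still gets it wrong.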

Figures

**Figure 1**
Representative images of each pattern from correctly classified images using previous neural network classifiers. Ten patterns from the 2D/3D HeLa cell image collection are depicted: endoplasmic reticulum (A/K), giantin (B/L), gpp130 (C/M), LAMP2 (D/N), mitochondria (E/O), nucleolin (F/P), actin (G/Q), transferrin receptor (H/R), tubulin (I/S), and DNA (J/T). Each false color in the 3D images represents the fluorescence intensity from labeling the target protein (green), total DNA (red), and total protein (blue). Projections summed along the Z or Y axis are shown. The feature sets SLF13 (2D) and SLF10 (3D) were used both for classification and for choosing a typical image.

**Figure 2**
Decision boundaries of various classifiers for distinguishing the patterns of two Golgi proteins. A scatterplot is shown of the two most informative features for distinguishing images of giantin (circle) and gpp130 (triangle); these features were chosen from SLF7DNA by SDA. Various classifiers were trained using just these features, and maps of the class assigned to various points in the feature plane by each trained classifier were created (a zero (white) pixel corresponded to gpp130). A false color map was created by combining the decision maps for exponential-rbf kernel SVM (green), the majority-voting ensemble classifier (red), and neural networks (blue).

**Figure 3**
Dependence of classifier performance on amount of training data. The average performances of neural network (filled circle), SVM (open diamond), AdaBoost (filled triangle), Bagging (filled square), Mixtures-of-Experts (filled diamond), and majority-voting ensemble (open square) classifiers are shown as a function of the amount of training data given to the classifier. Average performance is defined as the average fraction of images in ten (2D) or eleven (3D) classes that were correctly classified over ten cross-validation trials. A) Results for 2D images using feature set SLF13. B) Results for 3D images using feature set SLF10.

**Figure 4**
Dependence of classifier performance on number of input features. The average performances of neural network (filled circle), SVM (open diamond), AdaBoost (filled triangle), Bagging (filled square), Mixtures-of-Experts (filled diamond), and majority-voting ensemble (open square) classifiers are shown as a function of the number of features used to train the classifier. Average performance is defined as the average fraction of images in ten (2D) or eleven (3D) classes that were correctly classified over ten cross-validation trials. The features in SLF7DNA (A) or SLF9 (B) were ranked in order of their ability to discriminate the classes using SDA and increasing numbers of the features were used to train classifiers.

**Figure 5**
Selection of feature subsets including Gabor and Daubechies features. Classifiers were trained using increasing numbers of features from the ranked list selected by SDA from either the 180-feature set including DNA features (filled circle) or the 174-feature set without DNA features (filled diamond) and performance evaluated by 10-fold cross validation. The classifiers used were the optimal majority-voting ensemble classifiers for SLF13 and SLF8 respectively (see Table 4).

**Figure 6**
Example 2D images that were misclassified by the original neural network classifier but could be correctly classified using the best performing ensemble classifier using SLF16. From among the images incorrectly classified by the neural network for each class, the image that was most frequently classified accurately during training of the ensemble classifier was chosen (a random choice was made in the case of ties). ER (A), giantin (B), gpp130 (C), LAMP2 (D), mitochondria (E), nucleolin (F), actin (G), transferrin receptor (H), and tubulin (I). The only DNA image that was misclassified by the original neural network classifier was also missed by the ensemble.

**Figure 7**
Example 2D images that could not be correctly classified by any individual classifier using feature set SLF16. The image that was most frequently classified incorrectly during training of the ensemble classifier was chosen (a random choice was made in the case of ties). ER (A), giantin (B), gpp130 (C), LAMP2 (D), mitochondria (E), and transferrin receptor (F). All images in the other classes could be correctly classified by at least one of the eight classifiers.

**Figure 8**
Dependence of image set classification accuracy on set size and feature set size. Panel A depicts classification accuracy for sets of images drawn from the same class. The accuracy obtained using plurality voting was averaged over 1000 random trials of image sets of various sizes drawn from each class in the test set by using the optimal majority-voting classifier for feature set SLF8 (filled square), SLF15 (filled triangle), SLF13 (filled diamond), SLF16 (filled circle), SLF14 (open square), and SLF10 (open diamond). Panel B depicts classification accuracy for reduced feature subsets using plurality voting. The accuracy obtained using plurality voting was averaged over 1000 random trials of sets of 10 images drawn from each class in the test set for various numbers of features from SLF8 (filled square), SLF15 (filled triangle), SLF13 (filled diamond), SLF16 (filled circle), SLF14 (open square), and SLF10 (open diamond).
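The image-set procedure in this caption (plurality voting over a set of images from one class, averaged over random trials) can be sketched as follows. This is an illustrative sketch only: the function name, the sampling without replacement, the rule that ties count as errors, and the toy labels are assumptions not stated in the source.

```python
import random
from collections import Counter


def image_set_accuracy(image_predictions, true_label, set_size,
                       n_trials=1000, seed=0):
    """Estimate how often a plurality vote over a random image set is correct.

    image_predictions holds the single-image labels predicted for test images
    that all belong to true_label.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        # Assumption: each set is drawn without replacement from the class.
        image_set = rng.sample(image_predictions, set_size)
        votes = Counter(image_set)
        top = max(votes.values())
        winners = [label for label, count in votes.items() if count == top]
        # Assumption: a tie for first place is counted as an error.
        if winners == [true_label]:
            correct += 1
    return correct / n_trials


# Hypothetical single-image predictions for ten tubulin test images.
single_image_labels = ["tubulin"] * 8 + ["actin"] * 2

acc_single = image_set_accuracy(single_image_labels, "tubulin", set_size=1)
acc_set = image_set_accuracy(single_image_labels, "tubulin", set_size=10)
print(acc_single, acc_set)
```

With a set size of 1 the estimate approximates the single-image accuracy, while larger sets let correct votes outnumber occasional single-image errors.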

Cited by

Data-mining Techniques for Image-based Plant Phenotypic Traits Identification and Classification.
Rahaman MM, Ahsan MA, Chen M. Sci Rep. 2019 Dec 20;9(1):19526. doi: 10.1038/s41598-019-55609-6. PMID: 31862925. Free PMC article.
Large-scale automated analysis of location patterns in randomly tagged 3T3 cells.
García Osuna E, Hua J, Bateman NW, Zhao T, Berget PB, Murphy RF. Ann Biomed Eng. 2007 Jun;35(6):1081-7. doi: 10.1007/s10439-007-9254-5. Epub 2007 Feb 7. PMID: 17285363. Free PMC article.
The Open Microscopy Environment (OME) Data Model and XML file: open tools for informatics and quantitative analysis in biological imaging.
Goldberg IG, Allan C, Burel JM, Creager D, Falconi A, Hochheiser H, Johnston J, Mellen J, Sorger PK, Swedlow JR. Genome Biol. 2005;6(5):R47. doi: 10.1186/gb-2005-6-5-r47. Epub 2005 May 3. PMID: 15892875. Free PMC article.
Objective clustering of proteins based on subcellular location patterns.
Chen X, Murphy RF. J Biomed Biotechnol. 2005 Jun 30;2005(2):87-95. doi: 10.1155/JBB.2005.87. PMID: 16046813. Free PMC article.
Determining the subcellular location of new proteins from microscope images using local features.
Coelho LP, Kangas JD, Naik AW, Osuna-Highley E, Glory-Afshar E, Fuhrman M, Simha R, Berget PB, Jarvik JW, Murphy RF. Bioinformatics. 2013 Sep 15;29(18):2343-9. doi: 10.1093/bioinformatics/btt392. Epub 2013 Jul 8. PMID: 23836142. Free PMC article.
