Proc Natl Acad Sci U S A. 2009 Feb 10;106(6):1826-31.

doi: 10.1073/pnas.0808843106. Epub 2009 Feb 2.

Scoring diverse cellular morphologies in image-based screens with iterative feedback and machine learning

Thouis R Jones¹, Anne E Carpenter, Michael R Lamprecht, Jason Moffat, Serena J Silver, Jennifer K Grenier, Adam B Castoreno, Ulrike S Eggert, David E Root, Polina Golland, David M Sabatini

PMID: 19188593
PMCID: PMC2634799
DOI: 10.1073/pnas.0808843106

Thouis R Jones et al. Proc Natl Acad Sci U S A. 2009.

Affiliation

¹ The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA.

Abstract

Many biological pathways were first uncovered by identifying mutants with visible phenotypes and by scoring every sample in a screen via tedious and subjective visual inspection. Now, automated image analysis can effectively score many phenotypes. In practical application, customizing an image-analysis algorithm or finding a sufficient number of example cells to train a machine learning algorithm can be infeasible, particularly when positive control samples are not available and the phenotype of interest is rare. Here we present a supervised machine learning approach that uses iterative feedback to readily score multiple subtle and complex morphological phenotypes in high-throughput, image-based screens. First, automated cytological profiling extracts hundreds of numerical descriptors for every cell in every image. Next, the researcher generates a rule (i.e., classifier) to recognize cells with a phenotype of interest during a short, interactive training session using iterative feedback. Finally, all of the cells in the experiment are automatically classified and each sample is scored based on the presence of cells displaying the phenotype. By using this approach, we successfully scored images in RNA interference screens in 2 organisms for the prevalence of 15 diverse cellular morphologies, some of which were previously intractable.
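
The three steps described in the abstract (per-cell profiling, interactive rule training with iterative feedback, and per-sample scoring) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' software. The synthetic profiles, the nearest-centroid rule, the simulated researcher (who simply reveals a hidden true label), and every function name and parameter below are assumptions made for the example. The abstract does not specify the classifier or the scoring formula, so per-sample penetrance (the fraction of positive cells) stands in for the score.

```python
import random

def make_cells(n_samples=20, per_sample=50, n_features=5, seed=0):
    """Synthetic stand-in for per-cell profiles: (sample_id, features, truth)."""
    rng = random.Random(seed)
    cells = []
    for s in range(n_samples):
        p_pos = 0.6 if s < 3 else 0.05   # samples 0-2 are hypothetical "hits"
        for _ in range(per_sample):
            truth = rng.random() < p_pos
            mean = 3.0 if truth else 0.0
            feats = [rng.gauss(mean, 1.0) for _ in range(n_features)]
            cells.append((s, feats, truth))
    return cells

def centroid(rows):
    """Feature-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_rule(cells, labels):
    """Nearest-centroid rule from {cell_index: bool}; a stand-in classifier."""
    pos = [cells[i][1] for i, y in labels.items() if y]
    neg = [cells[i][1] for i, y in labels.items() if not y]
    if not pos or not neg:               # only one class labeled so far
        only = bool(pos)
        return lambda feats: only
    cp, cn = centroid(pos), centroid(neg)
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return lambda feats: dist2(feats, cp) < dist2(feats, cn)

def train_iteratively(cells, n_initial=40, rounds=5, batch=20, seed=1):
    rng = random.Random(seed)
    # Initial phase: label cells drawn at random from the whole screen.
    labels = {i: cells[i][2] for i in rng.sample(range(len(cells)), n_initial)}
    rule = fit_rule(cells, labels)
    for _ in range(rounds):
        unlabeled = [i for i in range(len(cells)) if i not in labels]
        rng.shuffle(unlabeled)
        # Present cells as the tentative rule classifies them.
        shown_pos = [i for i in unlabeled if rule(cells[i][1])][: batch // 2]
        shown_neg = [i for i in unlabeled if not rule(cells[i][1])][: batch // 2]
        for i in shown_pos + shown_neg:
            labels[i] = cells[i][2]      # simulated correction by the researcher
        rule = fit_rule(cells, labels)   # retrain on the enlarged training set
    return rule, labels

def score_samples(cells, rule):
    """Count positive/negative cells per sample; penetrance = positive fraction."""
    counts = {}
    for sample_id, feats, _ in cells:
        pos, neg = counts.get(sample_id, (0, 0))
        counts[sample_id] = (pos + 1, neg) if rule(feats) else (pos, neg + 1)
    return {s: {"positive": p, "negative": n, "penetrance": p / (p + n)}
            for s, (p, n) in counts.items()}

cells = make_cells()
rule, labels = train_iteratively(cells)
scores = score_samples(cells, rule)
ranked = sorted(scores, key=lambda s: scores[s]["penetrance"], reverse=True)
print(ranked[:3])  # sample ids with the highest penetrance
```

In a real screen the hidden `truth` field would not exist; the labels would come from the researcher's judgment of each presented cell, and the features from image-analysis software rather than a random generator.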

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Fig. 1.**
Scoring cell morphologies via cytological profiling, iterative feedback, and machine learning. (A) Images of cell populations for each treatment condition (RNAi or chemical) are processed with cell-image analysis software (e.g., CellProfiler) to identify and measure individual cells, in order to generate a cytological profile, containing a collection of measurements of features of each cell, represented schematically here as a bar code. (B) The software system presents the researcher with individual cells for classification, sampled randomly from the screen-wide population. After a few dozen cells are classified, the researcher can begin the iterative machine learning phase, in which the computer generates a tentative rule based on the classified cells and presents the researcher with cells classified according to that rule. In general, larger training sets produce more accurate rules, and using too small a training set can result in the computer training to a too-narrow definition of the phenotype (Fig. S10). Generating a large training set without iterative feedback can be difficult when the phenotype is rare or no positive control samples are available; these are the cases where the iterative nature of our approach is most useful. The optimal initial training set size depends on the complexity of the phenotype and the scarcity of positive cells in the experiment. After the researcher corrects errors and retrains for several rounds, the rule becomes more accurate. (C) When the accuracy of the rule is sufficient, it is used to classify all cells in the experiment in order to calculate the number of positive and negative cells in each sample.

**Fig. 2.**
Validation example of actin blebs phenotype. (*i*) The approach rank-orders samples (populations of cells under the same treatment condition) by their enrichment score (see *Methods*) and allows selection of positive and neutral samples based on this automated scoring. (*ii*) The corresponding phenotype penetrance is shown for the positive and neutral samples. Phenotype penetrance is typically correlated with enrichment score, except that a low number of cells in a sample can decrease the score despite a high penetrance. (*iii*) The corresponding validation data are shown for the positive and neutral samples. The height of the bar for each sample indicates how many times a human observer chose that sample as a positive in a forced-choice comparison (see *Methods*). In this example, samples that were scored as positives (*Left*) were also chosen by the researchers as positives (11 or 12 times, of 12 comparisons per sample), and none of the neutral samples (*Right*) were routinely chosen as positive (0 or 1 time of 12 comparisons). Corresponding data for all phenotypes are shown in Figs. 3 and 4.

**Fig. 3.**
Results of the phenotype-scoring system, for diverse cellular morphologies in human cells. Each row shows images and data for a different cellular morphology that the system was trained to recognize and score. The phenotype column shows the name of each phenotype along with the number of positive and negative example cells in the training set after all rounds of iteration were completed by the researcher. Images for each phenotype follow a color scheme: blue, DNA (contrast-stretched); red, actin (contrast-stretched); green, phospho-histone H3 (absolute scale). (*Left*) Traditional pseudocoloring of the fluorescence microscopy images. (*Right*) Color-adjustment using the “Invert For Printing” module of CellProfiler. The width of each image (or montage, for multiframe images) is 102 μm. For details on the validation column, see Fig. 2. The penetrance histogram column shows the distribution of per-sample penetrance for each phenotype, along with the mean (shown as text and with a green line) and the model fit to the data (red line).

**Fig. 4.**
More results of the phenotype-scoring system, for diverse cellular morphologies in human cells. See Fig. 3 for details.

Grants and funding

R01 GM0725555/GM/NIGMS NIH HHS/United States
