BMC Bioinformatics. 2010 Feb 11;11:84.
doi: 10.1186/1471-2105-11-84.

Using machine learning to speed up manual image annotation: application to a 3D imaging protocol for measuring single cell gene expression in the developing C. elegans embryo

Zafer Aydin et al. BMC Bioinformatics. 2010.

Abstract

Background: Image analysis is an essential component in many biological experiments that study gene expression, cell cycle progression, and protein localization. A protocol for tracking the expression of individual C. elegans genes was developed that collects image samples of a developing embryo by 3-D time lapse microscopy. In this protocol, a program called StarryNite automatically recognizes fluorescently labeled cells and traces their lineage. However, because of noise in the data and the challenges introduced by the increasing number of cells in later stages of development, this program is not error-free. In the current version, error correction (i.e., editing) is performed manually using a graphical interface tool named AceTree, which was developed specifically for this task. For a single experiment, this manual annotation task takes several hours.

Results: In this paper, we reduce the time required to correct errors made by StarryNite. We target one of the most frequent error types (movements annotated as divisions) and train a support vector machine (SVM) classifier to decide whether a division call made by StarryNite is correct. We show, via cross-validation experiments on several benchmark data sets, that the SVM successfully identifies this type of error. A new version of StarryNite that includes the trained SVM classifier is available at http://starrynite.sourceforge.net.
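
The train-and-cross-validate workflow described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state the kernel, solver, or feature values, so the sketch uses a linear SVM trained by stochastic subgradient descent on the hinge loss, and synthetic two-dimensional data in place of real division calls. The names `train_linear_svm`, `cross_validate`, and `make_toy_data` are hypothetical and are not part of StarryNite.

```python
import random

def train_linear_svm(X, y, lam=0.01, eta=0.05, epochs=50, seed=0):
    """Linear SVM via stochastic subgradient descent on the L2-regularized
    hinge loss. Labels must be +1 or -1. (Assumed solver, not the paper's.)"""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            w = [wj * (1.0 - eta * lam) for wj in w]  # shrinkage from the L2 penalty
            if y[i] * score < 1.0:                    # margin violated: hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def predict(model, x):
    """Return +1 or -1 according to the sign of the decision value."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def cross_validate(X, y, k=10, seed=0):
    """Return the held-out accuracy of each of k cross-validation splits."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        model = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies

def make_toy_data(n_per_class=50, seed=1):
    """Synthetic stand-in data: +1 = correct division call, -1 = error."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((1, 1.0), (-1, -1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 0.2), rng.gauss(centre, 0.2)])
            y.append(label)
    return X, y

X, y = make_toy_data()
accuracies = cross_validate(X, y)
mean_accuracy = sum(accuracies) / len(accuracies)
print(f"mean held-out accuracy over {len(accuracies)} splits: {mean_accuracy:.3f}")
```

The toy classes are deliberately well separated, so the accuracy printed here says nothing about performance on real embryo data.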

Conclusions: We demonstrate the utility of a machine learning approach to error annotation for StarryNite. In the process, we also provide some general methodologies for developing and validating a classifier with respect to a given pattern recognition task.

Figures

Figure 1
C. elegans cell lineaging images. (a) One stack of images taken during development, with the z-axis represented as a red-green-blue color scale. (b) One image, with two channels corresponding to ubiquitous expression (green) and expression via the pha-4 promoter (red). (c) The worm lineage of 959 cells, with branch lengths determined by StarryNite. Red colored branches indicate expression from the pha-4 promoter.
Figure 2
Histogram of various types of errors in one image series. (a) Major error types. (b) Subtypes of tracing errors.
Figure 3
Moving nucleus annotated as dividing. (a) An image plane from the series 081505 at t = 35 and z = 23, where t is the time index and z is the plane index within the image stack. A moving nucleus is enclosed in a white square box. (b) 3D view of the nuclei present at t = 35. M1 and M2 move from t = 35 to t = 36 and P1 at t = 35 divides into C1 and C2 at t = 36. (c) 3D view of the nuclei present at t = 36. StarryNite annotates M1 at t = 35 as the parent nucleus and links it to M2 and C1 at t = 36, which are incorrectly annotated as the children of M1.
Figure 4
Analysis of individual features. (a) The figure plots ROC curves for the features with the top five AUC scores. Feature names are defined as follows: "Dist P-C1" is the distance from parent to child-1; "Dist C1-C2 at t+1" is the distance between children at t+1; "Cos P-C1, P-C2" is the cosine of the angle between (parent to child-1) and (parent to child-2); "Age C1" is the age of child-1; "Dist P-NN1" is the distance from parent to the nearest neighbor of parent. (b) The AUC associated with each feature. The features are sorted according to the AUC.
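
As an illustration of three of the geometric features named in this caption, the following sketch computes them from 3D nucleus coordinates. It assumes each nucleus is represented by an (x, y, z) centre in consistent units; the function name `division_features` and the dictionary keys are hypothetical, and the paper's exact definitions (for example, any scaling of the z-axis) may differ.

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def _norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def division_features(parent, child1, child2):
    """Geometric features for one division call.

    parent is the nucleus centre at time t; child1 and child2 are the
    centres of the two annotated children at time t + 1.
    """
    v1 = _sub(child1, parent)  # parent -> child-1
    v2 = _sub(child2, parent)  # parent -> child-2
    n1, n2 = _norm(v1), _norm(v2)
    dot = sum(a * b for a, b in zip(v1, v2))
    # The cosine is undefined if a child sits exactly on the parent; use 0.0 then.
    cosine = dot / (n1 * n2) if n1 > 0 and n2 > 0 else 0.0
    return {
        "dist_p_c1": n1,                              # "Dist P-C1"
        "dist_c1_c2": _norm(_sub(child1, child2)),    # "Dist C1-C2 at t+1"
        "cos_pc1_pc2": cosine,                        # "Cos P-C1, P-C2"
    }

# Children placed on opposite sides of the parent along the x-axis.
features = division_features((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (-4.0, 0.0, 0.0))
print(features)
```

In this example the two children lie on opposite sides of the parent, so the cosine is -1; the coordinates are invented and are not taken from the image series.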
Figure 5
ROC curves of the best feature and the SVM. Cross-validated ROC curve produced by the SVM on the development data set and the ROC curve of the best performing single feature ("distance from parent to child-1"). The SVM decision threshold is indicated by an asterisk.
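
A ROC curve for a single feature or for SVM decision values can be obtained by sweeping a threshold over the scores, and the AUC by trapezoidal integration. The sketch below is one common way to do this and is not taken from the paper; `roc_points` and `auc` are hypothetical names, and the scores and labels are invented. It assumes higher scores indicate the positive class and that both classes are present.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs obtained by
    lowering a threshold over the scores. labels are 1 (positive) or 0."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Examples with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points

def auc(points):
    """Area under a ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Invented scores: one negative example outranks one positive example.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
print(curve, area)
```

Here 8 of the 9 positive-negative pairs are ranked correctly, so the area is 8/9. A feature for which small values indicate the positive class would be negated before calling `roc_points`.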
Figure 6
Two feature selection experiments. (a) The figure plots the mean difference in accuracy, across 10 cross-validation splits, of an SVM that uses all features compared to an SVM with some features removed. The number of features eliminated is given on the x-axis. Bars above the x-axis represent SVMs that yield better performance than the baseline SVM, and vice versa. Error bars correspond to standard deviations. (b) This figure is similar to panel (a), except that features are considered in groups, as listed in Table 1. Each blue bar compares the accuracy of the 70-feature SVM to an SVM trained from a single feature group, whereas each red bar compares the full SVM to an SVM trained from all feature groups but one.
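
The quantity plotted in each bar can be sketched as a paired comparison over cross-validation splits. The sign convention (reduced SVM minus baseline SVM, so that positive values mean the reduced SVM is better) and the use of the sample standard deviation are assumptions based on the caption; the accuracies below are invented for illustration and are not results from the paper. `accuracy_difference` is a hypothetical name.

```python
from statistics import mean, stdev

def accuracy_difference(baseline_acc, reduced_acc):
    """Mean and sample standard deviation of per-split accuracy differences.

    baseline_acc[i] and reduced_acc[i] must come from the same
    cross-validation split, so the comparison is paired.
    """
    if len(baseline_acc) != len(reduced_acc):
        raise ValueError("both SVMs must be evaluated on the same splits")
    diffs = [r - b for b, r in zip(baseline_acc, reduced_acc)]
    return mean(diffs), stdev(diffs)

# Invented per-split accuracies for a full-feature and a reduced-feature SVM.
baseline = [0.90, 0.92, 0.88, 0.91]
reduced = [0.91, 0.92, 0.87, 0.93]
bar_height, error_bar = accuracy_difference(baseline, reduced)
print(f"mean difference = {bar_height:.4f}, standard deviation = {error_bar:.4f}")
```

Pairing the differences by split removes variation that both SVMs share because they see the same held-out examples.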
Figure 7
Cell division followed by cell movements. The parent nucleus at time t0 divides into two child nuclei at t0 + 1. The children then move during the time elapsed between t0 + 1 and t0 + 2.
Figure 8
Computation of the normalized nucleus support features. A nucleus and the square regions on which the normalized nucleus supports are computed. Only 4 planes are shown for simplicity.
