Object segmentation from motion discontinuities and temporal occlusions--a biologically inspired model

Cornelia Beck et al. PLoS One. 2008;3(11):e3807. doi: 10.1371/journal.pone.0003807. Epub 2008 Nov 27.
Abstract

Background: Optic flow is an important cue for object detection. Humans can perceive objects in a scene from kinetic boundaries alone, even when no other shape cues are provided. These kinetic boundaries are characterized by motion discontinuities in a local neighbourhood. In addition, temporal occlusions appear along the boundaries as the object in front covers the background and any objects spatially behind it.

Methodology/principal findings: From a technical point of view, the detection of motion boundaries for segmentation based on optic flow is a difficult task, because the flow estimated along such boundaries is generally unreliable. We propose a model derived from mechanisms found in visual areas V1, MT, and MSTl of human and primate cortex that achieves robust detection along motion boundaries. It includes two separate mechanisms: the detection of motion discontinuities, based on neurons that respond to spatial contrast, and the detection of occlusion regions, based on neurons that respond to temporal contrast. The mechanisms are embedded in a biologically inspired architecture that integrates information from different model components of visual processing via feedback connections. In particular, mutual interactions between the detection of motion discontinuities and temporal occlusions considerably improve kinetic boundary detection.

Conclusions/significance: A new model is proposed that uses optic flow cues to detect motion discontinuities and object occlusions. We suggest that combining the results for motion discontinuities and object occlusions improves object segmentation within the model; this idea could also be applied in other models for object segmentation. In addition, we discuss how the model relates to neurophysiological findings. The model was successfully tested on both artificial and real sequences that include self-motion and object motion.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. 3D scenario with two objects.
This figure depicts a typical scenario for a person moving in a room. A static object (green) and a moving object (blue) are located in the room in front of the background. On the left, static occlusion regions with respect to the observer's perspective are marked with a gray overlay. Due to the spatial configuration, the green object partly covers the blue one, and both objects occlude the background texture. When the observer moves forward, an expanding flow field is generated, onto which the translational movement of the blue object is superimposed. The optic flow, i.e. the projection of the 3D flow onto the projection plane, is shown. The alignment of the objects in the 2D projection is shown on the right, together with the kinetic occlusions generated by the movement of the blue object: on its left side, background texture is uncovered (disocclusion); on its right side, it is temporarily covered (occlusion). Note that the expanding flow leads to further kinetic occlusion regions along the outlines of both objects; for simplicity, these are not included in the sketch.
Figure 2. Sketch of the biologically inspired model.
V1Model Motion and MTModel Motion represent the basic modules for optic flow estimation. In TOModel, regions that have been occluded or disoccluded are estimated. In MSTlModel, motion discontinuities are computed from MTModel input using spatial on-center-off-surround receptive fields. The information from areas MSTlModel, TOModel, and V2Model is combined in a higher-level processing area (HLPModel). Feedforward connections are depicted with dark blue arrows, feedback connections with light blue arrows. The interactions between MSTlModel and TOModel are depicted with green arrows.
Figure 3. Optic flow estimation at occlusions.
Occlusions cause problems for motion estimation algorithms based on the correlation between only two frames: parts of the image are visible in only one of the frames, so no corresponding image positions can be found at these locations. This problem can be solved with a single additional, temporally forward-looking step (future step).
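To make the role of the future step concrete, here is a minimal Python sketch. It stands in for the model's correlation stage with simple SSD block matching (the paper's actual mechanism is a V1/MT motion-energy detector, not block matching); the function names, patch radius, and bounds handling are illustrative assumptions.

```python
import numpy as np

def ssd_cost(ref, tgt, y, x, dy, dx, r=3):
    # Sum of squared differences between a patch in `ref` centered at (y, x)
    # and the patch in `tgt` displaced by (dy, dx); inf if out of bounds.
    h, w = ref.shape
    if not (r <= y < h - r and r <= x < w - r):
        return np.inf
    if not (r <= y + dy < h - r and r <= x + dx < w - r):
        return np.inf
    a = ref[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    b = tgt[y + dy - r:y + dy + r + 1, x + dx - r:x + dx + r + 1].astype(float)
    return float(np.sum((a - b) ** 2))

def match_with_future_step(prev, cur, nxt, y, x, max_d=2):
    # Score each candidate displacement against BOTH the past pair
    # (cur -> prev, sign-flipped) and the future pair (cur -> nxt), and keep
    # the better score: a pixel covered in `nxt` (occlusion) can still be
    # matched backwards into `prev`, and a pixel missing from `prev`
    # (disocclusion) can be matched forwards into `nxt`.
    best, best_cost = (0, 0), np.inf
    for dy in range(-max_d, max_d + 1):
        for dx in range(-max_d, max_d + 1):
            cost = min(ssd_cost(cur, prev, y, x, -dy, -dx),
                       ssd_cost(cur, nxt, y, x, dy, dx))
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best  # (dy, dx) displacement estimate at (y, x)
```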
Figure 4. Detection of motion discontinuities.
Some examples of motion discontinuities are given at the bottom left. We use a motion discontinuity detector built from an on-center-off-surround receptive field (RF) that responds strongly when center and surround motion differ; a homogeneous flow field produces only a weak response.
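A minimal sketch of such a detector, assuming a dense flow field is already available (e.g., from the MTModel stage). The center/surround sizes and the box-filter approximation of the off-surround are illustrative choices, not the paper's parameters.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def motion_discontinuity(flow, center=3, surround=11):
    # `flow` has shape (H, W, 2) with per-pixel (vy, vx). The response is the
    # magnitude of the difference between a small center average and a larger
    # surround average of the flow: large where center and surround motion
    # differ, near zero on a homogeneous flow field.
    c = np.stack([uniform_filter(flow[..., i], size=center)
                  for i in range(2)], axis=-1)
    s = np.stack([uniform_filter(flow[..., i], size=surround)
                  for i in range(2)], axis=-1)
    return np.linalg.norm(c - s, axis=-1)

# Example: the right half of the field moves right, the left half is static;
# the response peaks along the vertical kinetic boundary at x = 16.
flow = np.zeros((32, 32, 2))
flow[:, 16:, 1] = 1.0
resp = motion_discontinuity(flow)
```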
Figure 5. Detection of occlusion regions.
To detect occlusions and disocclusions in the motion sequence, we compare, at each spatial position, the motion energy estimated from the past frame pair t−1/t0 with that estimated from the future frame pair t0/t1. A large difference typically occurs at occlusion and disocclusion positions, because regions visible only in t−1 or t1 yield very ambiguous motion estimates.
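The comparison can be sketched as follows. As a stand-in for the model's motion energy, this sketch uses a local zero-displacement matching residual, which is an assumption rather than the paper's mechanism; the sign convention (positive for disocclusion, negative for occlusion) is likewise just a choice for the sketch.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def temporal_contrast(prev, cur, nxt, size=5):
    # How well is each pixel of the current frame explained by the past pair
    # (t-1, t0) versus the future pair (t0, t1)? Regions just uncovered
    # (disocclusions) are absent from the past frame, so the past residual
    # is large -> positive response. Regions about to be covered (occlusions)
    # are absent from the future frame, so the future residual is large ->
    # negative response. Responses near zero mean both pairs explain the
    # pixel equally well.
    past_err = uniform_filter((cur.astype(float) - prev) ** 2, size=size)
    future_err = uniform_filter((cur.astype(float) - nxt) ** 2, size=size)
    return past_err - future_err
```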
Figure 6. Overview of mechanisms for scene interpretation.
Top row: The optic flow of the input image is computed in V1Model and MTModel, and spatial contrast neurons in MSTlModel compute the motion discontinuities. Based on the detected motion boundaries, a simple filling-in mechanism provides a scene segmentation. Bottom row: In TOModel, input from V1Model neurons is used in a temporal on-center-off-surround processing step to detect occlusion and disocclusion regions. In HLPModel, these regions are restricted to the motion discontinuities or to the luminance contours provided by V2Model, in order to find the object adjacent to the occlusion region, namely the occluder. The results of the object segmentation are used to find the label of the corresponding object (indicated by the arrow from the top row, third column). From these data, the corresponding depth order can be computed. Interactions between MSTlModel and TOModel are not depicted in this figure.
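The filling-in step can be illustrated with a toy version: threshold the discontinuity map and label each connected non-boundary region as one segment. The threshold value and the use of SciPy's connected-component labeling are assumptions for this sketch; the model itself uses a neural filling-in mechanism.

```python
import numpy as np
from scipy.ndimage import label

def fill_in_segmentation(discontinuity, thresh=0.5):
    # Positions whose motion-discontinuity response exceeds `thresh` act as
    # boundaries; every connected region of non-boundary positions receives
    # one label, so the background and each object become separate segments.
    interior = discontinuity < thresh
    segments, n_segments = label(interior)
    return segments, n_segments
```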
Figure 7. Experiment 1: Flowergarden sequence.
A) Input image. B) Optic flow estimated in area MTModel; direction is indicated by a color code, speed by the corresponding saturation. C) Motion discontinuities appear due to the faster optic flow on the tree and along regions where no movement is indicated, such as the sky. D) TOModel responds strongly along the contours of the tree trunk, as the trunk occludes parts of the background during the translational self-motion (white indicates disocclusion areas, black occlusion areas). The results shown here include feedback from MSTlModel neurons.
Figure 8. Experiment 2: Moving boxes.
Results for an input sequence with 5 boxes and the background all moving in different directions. A) Input image with arrows indicating the movement of the objects; the background is slowly moving to the left. B) Mean optic flow estimated in area MTModel, marked with a color code superimposed on the input image. C) The detected occlusion (black) and disocclusion (white) regions. Note that, depending on the direction of the object movement, these regions appear all along the object boundaries or only on two sides (for movement in the vertical or horizontal direction). D) Contours of the objects as provided by V2Model Form. This activity is used to clearly localize each occlusion boundary to the corresponding occluder. E) A clear segmentation of the object boundaries is achieved using the motion discontinuities detected with MSTlModel on-center-off-surround neurons. F) After the detected boundaries have been grouped and filled in, the image is segmented into regions representing the objects of the scene. G) Classification of object movement. The difference between object and background motion is computed as explained in the Methods section. Light object boundaries indicate a strong difference; darker outlines represent a movement similar to the background. Note that objects 2 and 5 have a strong motion contrast to the background despite their similar movement direction, because their speed is much higher than that of the background. H) The relative depth order derived automatically from the scene. A confidence value gives the probability that the depth order is correct (indicated in percent). It is derived from the number of positions belonging to the object that indicate the object is in front (#posfront) and the number that indicate it is in the background (#posbg): conf = max(#posfront, #posbg)/(#posfront + #posbg).
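The confidence value from the caption is straightforward to compute; the function name and the example vote counts below are illustrative.

```python
def depth_confidence(pos_front, pos_bg):
    # conf = max(#posfront, #posbg) / (#posfront + #posbg), as defined in
    # the Figure 8 caption: the fraction of an object's positions that
    # agree with the winning depth-order vote.
    return max(pos_front, pos_bg) / (pos_front + pos_bg)

# Example: 45 positions vote "in front", 5 vote "behind" -> 45/50 = 90%.
print(depth_confidence(45, 5))  # 0.9
```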
Figure 9. Experiment 3: Independently moving object in a scene with a moving observer.
A) Input image of the sequence (generated in the XVR environment, download at www.vrmedia.it); the gray arrow indicates the movement of the independently moving object. B) The optic flow in area MTModel; the object movement correctly indicates a translation to the right. C) Occlusions and disocclusions are correctly detected on the right and left sides of the object, respectively. The results shown here include feedback from MSTlModel. D) Motion discontinuities computed by MSTlModel on-center-off-surround neurons show the object boundary. E) After the grouping and filling-in step, the object can be segmented.
Figure 10. Experiment 4: City view through a window.
Artificially generated scene with a background moving to the left while the aperture is fixed. A) One image of the input sequence. B) The mean optic flow as detected in MTModel. C) The movement generates occlusions on the left (black positions) and disocclusions on the right side (white positions). D) The motion discontinuities show the complete object boundary. E) After segmentation, two objects are detected, depicted in different colors: the aperture (gray) and the region within the window (white). F) The occluder corresponding to each occlusion position, with respect to the objects segmented as shown in E); the colors indicate the assignment. Most positions correctly indicate the aperture as the object causing the occlusion.
Figure 11. Experiment 5: Rotating rectangle.
A bar is rotating around its center in front of a stationary background. A) Input image of the sequence. B) The motion estimates of area MTModel. C) Disocclusion regions appear at the upper left and the lower right; occlusions, in contrast, are found at the lower left and the upper right. This diagonal pattern is due to the rotational movement of the object. The result shown here is without feedback from motion discontinuities. D) The motion boundary is correctly detected using the motion discontinuities; however, MSTlModel neurons also respond strongly in the object center, where the movement switches from zero to the smallest movement the model can detect. E) When the interaction between occlusion and motion discontinuity detection is included, the erroneously detected central part is erased. F) Occlusion regions are correctly restricted due to feedback from the motion discontinuity neurons shown in D. The feedback is slightly blurred, as occlusion regions may be significantly bigger than motion discontinuities.
Figure 12. Experiment 6: Detection of moving objects in a real sequence.
A) Input image of the sequence, showing two objects moving in opposite directions and a translational camera movement upwards. B) Mean optic flow estimated in area MTModel; the direction of movement is depicted with the color code shown in the top right corner. C) In the movement direction of the objects, the dark regions represent the detected occlusions; behind the objects, white positions indicate the disoccluded regions. Due to the higher object speed, these regions are bigger than in the other experiments. Because of the noise in the scene, the estimates also get noisier, but the overall response still reflects the correct occlusion and disocclusion regions. D) The motion discontinuities, including temporal integration (three frames used), clearly indicate the object boundaries. E) After grouping, the scene is segmented into background (black) and the two objects (gray and white). The motion discontinuities in D) in the upper left and lower right parts are not consistent with the detected kinetic occlusions; the results in E), after the interaction with TOModel, thus correctly indicate only 2 objects. F) Comparison of motion discontinuity results without (left column) and with (right column) temporal integration. Without temporal integration, the quality of the motion discontinuities is reduced: for example, the gap in the smaller object at the lower left corner can only be closed using temporal integration (first row, position indicated in light blue in D). The outline of the other object also becomes straighter (second row, position indicated in red in D).
