Comput Vis Image Underst. 2009 Jan;113(1):80-89. doi: 10.1016/j.cviu.2008.07.006.

Linguistic Summarization of Video for Fall Detection Using Voxel Person and Fuzzy Logic

Derek Anderson et al.

Abstract

In this paper, we present a method for recognizing human activity from linguistic summarizations of temporal fuzzy inference curves representing the states of a three-dimensional object called voxel person. A hierarchy of fuzzy logic is used, where the output from each level is summarized and fed into the next level. We present a two-level model for fall detection. The first level infers the state of the person in each frame. The second level operates on linguistic summarizations of voxel person's states and performs inference about activity. The rules used for fall detection were designed under the supervision of nurses to ensure that they reflect the manner in which elders perform these activities. The proposed framework is highly flexible: rules can be modified, added, or removed, allowing per-resident customization based on knowledge of each resident's cognitive and physical abilities.
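As a rough illustration of the first inference level, the sketch below maps a single per-frame feature to the three state memberships using trapezoidal membership functions. The choice of centroid height as the feature, and all breakpoints, are illustrative assumptions, not the paper's actual features or parameters.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function: rises on [a, b], is 1 on [b, c],
    and falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def state_memberships(centroid_height_m):
    """Level-one inference sketch: map one per-frame feature (here, an
    assumed voxel person centroid height in meters) to memberships in the
    three states. All breakpoints are illustrative, not the paper's values."""
    return {
        "on-the-ground": trapezoid(centroid_height_m, -1.0, -0.5, 0.3, 0.6),
        "in-between":    trapezoid(centroid_height_m, 0.3, 0.6, 0.8, 1.1),
        "upright":       trapezoid(centroid_height_m, 0.8, 1.1, 2.5, 3.0),
    }
```

Evaluating these memberships frame by frame yields temporal curves like those in the figures below, which the second level then summarizes linguistically.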


Figures

Fig. 1
Voxel person construction. Cameras capture the raw video from different viewpoints, silhouette extraction is performed for each camera, voxel sets are calculated from the silhouettes for each camera, and the voxel sets are intersected to calculate voxel person.
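The final intersection step in Fig. 1 can be sketched as plain set intersection over the per-camera voxel sets; representing voxels as integer grid coordinates is an assumption for illustration.

```python
def voxel_person(voxel_sets):
    """Construct voxel person (Fig. 1): intersect the voxel sets obtained
    by back-projecting each camera's silhouette into the common 3D grid.
    Each voxel set is a set of (x, y, z) integer grid coordinates."""
    if not voxel_sets:
        return set()
    person = set(voxel_sets[0])
    for s in voxel_sets[1:]:
        person &= set(s)  # keep only voxels seen consistent with every view
    return person
```

Intersection keeps only voxels that project inside every camera's silhouette, which is what makes the reconstruction a conservative volumetric estimate of the person.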
Fig. 2
Fuzzy inference outputs plotted for a voxel person fall. The x-axis is time, measured in frames, and the y-axis is the fuzzy inference output. The red curve is upright, the blue curve is in-between, and the green curve is on-the-ground. The frame rate was 3 frames per second, so the plot spans approximately 23 seconds of activity.
Fig. 3
Color-coding of voxel person according to the membership output values. Voxel person's color is a mixture of the fuzzy rule system outputs: the upright state determines the amount of red, in-between the amount of green, and on-the-ground the amount of blue.
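The color mixture described in the Fig. 3 caption can be sketched as a direct mapping of the three memberships onto RGB channels; the 8-bit scaling and clamping here are illustrative choices, not the paper's stated implementation.

```python
def voxel_color(upright, in_between, on_ground):
    """Mix the three state memberships into an RGB triple (Fig. 3):
    upright -> red channel, in-between -> green, on-the-ground -> blue.
    Memberships are clamped to [0, 1] and scaled to 8-bit values."""
    clamp = lambda m: min(max(m, 0.0), 1.0)
    return (round(255 * clamp(upright)),
            round(255 * clamp(in_between)),
            round(255 * clamp(on_ground)))
```

A voxel person mid-fall would thus blend from red toward blue, making the inferred state visible directly in the rendering.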
Fig. 4
Activity recognition framework, which utilizes a hierarchy of fuzzy logic based on the voxel person representation. The first level reasons about the state of the individual; linguistic summarizations are then produced, and fuzzy logic is applied again to reason about human activity.
Fig. 5
Detection of a large recent change in voxel person’s speed. (a) Motion vector magnitudes are computed, (b) a fixed size window, placed directly before the start of the summarization, is smoothed with a mean filter, and (c) the maximum of the derivative of the filtered motion vector magnitudes is found in the first and second halves of the window. The feature is the ratio of the two maximum values.
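Steps (a)–(c) in the Fig. 5 caption can be sketched as below; the window length, the 3-tap mean filter, and the edge handling are assumptions for illustration, not the paper's parameters.

```python
def speed_change_feature(magnitudes, window=20):
    """Sketch of the Fig. 5 feature for detecting a large recent change in
    voxel person's speed. `magnitudes` are per-frame motion vector magnitudes.
    (a) take a fixed-size window directly before the summarization start,
    (b) smooth it with a mean filter,
    (c) return the ratio of the maximum derivative magnitude in the second
        half of the window to that in the first half."""
    w = list(magnitudes[-window:])
    # (b) 3-tap mean filter, with the neighborhood shrinking at the edges
    smooth = []
    for i in range(len(w)):
        seg = w[max(0, i - 1):i + 2]
        smooth.append(sum(seg) / len(seg))
    # (c) derivative magnitudes, then the maximum in each half-window
    deriv = [abs(smooth[i + 1] - smooth[i]) for i in range(len(smooth) - 1)]
    half = len(deriv) // 2
    first = max(deriv[:half])
    second = max(deriv[half:])
    return second / first if first > 0 else float("inf")
```

A ratio well above 1 indicates a sudden recent change in speed, as would occur at the onset of a fall.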
Fig. 6
Example images and their corresponding silhouettes from the fall data set. Activities such as lying on the couch and sitting on a chair with the feet up, which could be misinterpreted as falls, are not recognized as falls by our system, an advantage of rule-based reasoning and knowledge about the three-dimensional voxel person.
Fig. 7
Approximately 11 minutes of video analysis, 2,042 frames in total. Four falls occurred and 38 linguistic summarizations were produced. The upright membership is shown in red, in-between in blue, and on-the-ground in green. Dashed vertical purple lines mark the manually annotated moments where a fall occurred.
Fig. 8
Fifty-eight frames (approximately 19 seconds) from a sequence where the person fell and was able to get back up. Red is upright, blue is in-between, and green is on-the-ground.
Fig. 9
Sixty-three frames (approximately 21 seconds) where the person fell and tried to get back up three times. Red is upright, blue is in-between, and green is on-the-ground.
