Sensors (Basel). 2022 May 17;22(10):3798. doi: 10.3390/s22103798.

Evaluating Automatic Body Orientation Detection for Indoor Location from Skeleton Tracking Data to Detect Socially Occupied Spaces Using the Kinect v2, Azure Kinect and Zed 2i

Violeta Ana Luz Sosa-León et al.

Abstract

Analysing the dynamics of social interactions in indoor spaces entails evaluating spatial-temporal variables of the event, such as location and time. Additionally, social interactions include invisible spaces that we unconsciously acknowledge due to social constraints, e.g., the space between people having a conversation with each other. Nevertheless, current sensor arrays focus on detecting the physically occupied spaces of social interactions, i.e., areas inhabited by physically measurable objects. Our goal is to detect the socially occupied spaces, i.e., spaces not physically occupied by subjects and objects but inhabited by the interaction they sustain. We evaluate the social representation of the space structure between two or more active participants, the so-called F-Formation for small gatherings. We propose calculating body orientation and location from skeleton joint data sets by integrating depth cameras. The body orientation is derived by integrating the shoulder and spine joint data with head/face rotation data and spatial-temporal information from trajectories. From the physically occupied measurements, we can detect socially occupied spaces. In our user study implementing the system, we compared the capabilities and skeleton tracking data sets of three depth camera sensors, the Kinect v2, Azure Kinect, and Zed 2i. We collected 32 walking patterns for individual and dyad configurations and evaluated the system’s accuracy regarding the intended and socially accepted orientations. Experimental results show accuracy above 90% for the Kinect v2, 96% for the Azure Kinect, and 89% for the Zed 2i in assessing socially relevant body orientation. Our algorithm contributes to the anonymous and automated assessment of socially occupied spaces. The depth sensor system is promising for detecting more complex social structures. These findings impact research areas that study group interactions within complex indoor settings.
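
As a rough illustration of the orientation computation outlined above, the sketch below derives a yaw angle from the shoulder line of a tracked skeleton. The joint coordinates, conventions, and function name are illustrative assumptions rather than the authors’ implementation; in practice, the front/back ambiguity of the shoulder-line normal would be resolved with the spine and head/face rotation data, as the abstract describes.

    import numpy as np

    def body_orientation_deg(left_shoulder, right_shoulder):
        """Estimate a body-orientation (yaw) angle from the shoulder line.

        Joints are 3-D points (x, y, z) in the camera's real-world
        coordinate system, in metres. The facing direction is taken as
        the ground-plane (x-z) normal of the shoulder line; whether it
        points to the person's front or back depends on the sensor's
        coordinate conventions and is fixed here only by assumption.
        """
        l = np.asarray(left_shoulder, dtype=float)
        r = np.asarray(right_shoulder, dtype=float)
        sx, _, sz = r - l                # shoulder vector, left -> right
        fx, fz = -sz, sx                 # rotate 90 degrees in the ground plane
        return float(np.degrees(np.arctan2(fz, fx)) % 360.0)

    # Illustrative call with made-up coordinates (metres).
    angle = body_orientation_deg(left_shoulder=(-0.20, 1.40, 2.00),
                                 right_shoulder=(0.20, 1.42, 2.05))
    print(f"estimated body orientation: {angle:.1f} deg")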

Keywords: Azure Kinect; F-Formation; Kinect v2; RGB-D sensors; Zed 2i; human motion modelling; socially occupied space.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Illustration of the F-Formation model and its three interactional areas, the O, P, and R spaces. (a) Group–object interaction; (b) group–member interaction.
Figure 2. A set of individuals joins a third member and constructs the interactional space. Position and body orientation establish physically which space is socially occupied. Spatial–temporal variables, such as position over time, indicate the dynamics of the interaction.
Figure 3. Real-world coordinate system with the skeleton extracted from the depth cameras. (a) The selected skeleton joints in red, with the positional skeleton joint in blue; (b) the selected upper skeleton joints used to calculate body orientation.
Figure 4. Skeleton joint maps per device: (a) Kinect v2, (b) Azure Kinect, and (c) Zed 2i. Grey areas indicate a left–right joint correspondence; joints in italics indicate differences between the devices.
Figure 5. Depth camera sensors with their coordinate systems: (a) Kinect v2, (b) Azure Kinect, and (c) Zed 2i. The Zed 2i has six different coordinate systems.
Figure 6. (a) Methodology for extracting body orientation from the collected skeleton data; (b) illustration of how the upper joints and head rotation data are used to calculate body orientation.
Figure 7. Body orientation categories.
Figure 8. Experiment setup for all devices: (a) the experiment arrangement; (b) the different walking patterns with the start point and the camera’s position and field of view.
Figure 9. Precision and recall results for body orientation assessment of the compiled measurements per device, for the single configuration in (a,b) and the dyad configuration in (c,d), respectively.
Figure 10. Calculated body orientation angles per sensor: (a,c,e) show the Frontal Diagonal orientation, detected with high accuracy, with the participant view on the left; (b,d,f) show the detected Side Right orientation, which had the lowest accuracy for the Kinect v2.
Figure 11. Percentage of measurements within the acceptable social interaction angle range per device, for the single configuration in (a) and the dyad configuration in (b).
Figure 12. Precision and recall values for the single configuration in (a,b), respectively, and for the dyad configuration in (c,d), respectively, both after temporal interpolation.
Figure 13. Temporal interpolation correction for the Back Diagonal Right orientation: in light blue, the calculated body orientation angle with outliers; in dark blue, the corrected orientation angle (a minimal interpolation sketch follows this figure list).
Figure 14. Description of the encounter locations with intended body orientations. From left to right: the all-frontal, frontal-diagonal, and frontal vis-à-vis configurations (colour codes correspond to the body orientation angle bar in Figure 7).
Figure 15. Group meeting points with frontal vis-à-vis orientation per device: (a) Kinect v2, (b) Azure Kinect, and (c) Zed 2i. Each colour differentiates a person participating in the group, with their corresponding orientation angle. (d–f) show the participants’ view per device.
Figure 16. View of the F-Formation’s interactional spaces, with each body’s shoulder line (in red) and the detected orientation (arrows).
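
The temporal interpolation illustrated in Figures 12 and 13 can be pictured with the sketch below, which flags implausible orientation angles and replaces them by linear interpolation over frames. The outlier rule, the 45° threshold, and the function name are illustrative assumptions rather than the published parameters, and wrap-around at 360° is ignored for simplicity.

    import numpy as np

    def interpolate_outliers(angles_deg, jump_threshold=45.0):
        """Smooth a per-frame body-orientation series by temporal interpolation.

        Frames whose angle deviates from the sequence median by more than
        `jump_threshold` degrees are treated as outliers and replaced by
        linear interpolation between the surrounding valid frames.
        """
        angles = np.asarray(angles_deg, dtype=float)
        frames = np.arange(len(angles))
        valid = np.abs(angles - np.median(angles)) <= jump_threshold
        corrected = angles.copy()
        corrected[~valid] = np.interp(frames[~valid], frames[valid], angles[valid])
        return corrected

    # Example: a short walk with two spurious frames (values in degrees).
    print(interpolate_outliers([170, 172, 168, 40, 171, 169, 300, 173]))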
