Sensors (Basel). 2019 Oct 14;19(20):4441. doi: 10.3390/s19204441.

Hands-Free User Interface for AR/VR Devices Exploiting Wearer's Facial Gestures Using Unsupervised Deep Learning

Jaekwang Cha et al. Sensors (Basel). 2019.

Abstract

Developing a user interface (UI) suitable for headset environments is one of the challenges in the field of augmented reality (AR) technologies. This study proposes a hands-free UI for an AR headset that exploits the wearer's facial gestures to recognize user intentions. The facial gestures of the headset wearer are detected by a custom-designed sensor that measures skin deformation based on the infrared (IR) diffusion characteristics of human skin. We designed a deep neural network classifier to determine the user's intended gestures from the skin-deformation data, which serve as user input commands for the proposed UI system. The proposed classifier is composed of a spatiotemporal autoencoder and a deep embedded clustering algorithm, trained in an unsupervised manner. The UI device was embedded in a commercial AR headset, and several experiments were performed on online sensor data to verify its operation. The resulting hands-free UI recognized user commands with an average accuracy of 95.4% in tests with participants.

Keywords: augmented reality; deep embedded clustering; hands-free interface; spatiotemporal autoencoder.
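
The abstract describes a pipeline that maps an IR skin-deformation image to a UI command through preprocessing, STAE feature extraction, and DEC cluster assignment. The sketch below only outlines that flow; the paper does not publish code, so every function name, shape, and the cluster-to-command mapping here is an illustrative assumption, with stand-in stubs so the snippet runs end to end.

```python
# Hypothetical end-to-end recognition loop: IR frame -> preprocessing ->
# STAE encoder features -> DEC cluster -> UI command. All components below
# are stubs standing in for the trained modules described in the paper.
import numpy as np

IMG_SIZE = 28                                  # clustering-network input size reported in the paper
CLUSTER_TO_COMMAND = {0: "idle", 1: "select"}  # hypothetical mapping from clusters to UI commands

def preprocess(prev_frame: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Stand-in for the thresholding/subtraction/resizing step (Figure 5)."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff.astype(np.uint8)  # the real system also thresholds and resizes to 28x28

def encode(x: np.ndarray) -> np.ndarray:
    """Stand-in for the trained STAE encoder; returns a feature vector."""
    return x.reshape(-1)[:10].astype(np.float32)

def assign_cluster(z: np.ndarray, centroids: np.ndarray) -> int:
    """Hard assignment to the nearest DEC cluster centroid."""
    return int(np.argmin(np.linalg.norm(centroids - z, axis=1)))

# Centroids would be learned by DEC during unsupervised training; random here.
centroids = np.random.rand(len(CLUSTER_TO_COMMAND), 10).astype(np.float32)

prev = np.zeros((IMG_SIZE, IMG_SIZE), dtype=np.uint8)
cur = (np.random.rand(IMG_SIZE, IMG_SIZE) * 255).astype(np.uint8)

z = encode(preprocess(prev, cur))
print("recognized command:", CLUSTER_TO_COMMAND[assign_cluster(z, centroids)])
```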


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. Example of common interaction methods with a user wearing an augmented reality (AR) headset: (a) hand-held controller and (b) button or touchpad.
Figure 2. Overall system diagram: the sensor module includes an IR laser diode (LD) and an IR camera. The LD emits IR light onto the skin, and the IR camera captures images of the resulting IR diffusion patterns.
Figure 3. Implementation of the sensor module: (a) the sensor module, which includes a USB camera and an NIR laser diode, installed on the left side of the Epson BT-350 AR glasses; (b) a photograph of the headset worn by a user. The laser diode is aimed at the skin near the left cheek, which deforms when the user makes a winking gesture.
Figure 4. Images of IR laterally propagated through the skin: (a) IR SRDR image captured with no facial gesture and (b) IR SRDR image captured during a wink gesture by the user. The brightness of the white region in these images represents the intensity of the IR SRDR.
Figure 5. Preprocessing procedure of the clustering network: the preprocessing unit computes the difference between two images by thresholding the input images, performing pixel-wise subtraction between them, and resizing the result to 28 × 28 pixels to match the clustering-network input size.
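
As a concrete illustration of the Figure 5 procedure, the following is a minimal Python/OpenCV sketch, assuming grayscale IR frames as NumPy arrays; the threshold value and interpolation mode are assumptions, and only the 28 × 28 target size comes from the caption.

```python
# Minimal sketch of the preprocessing step in Figure 5 (assumed parameters).
import cv2
import numpy as np

THRESH = 128         # assumed binarization threshold
NET_SIZE = (28, 28)  # clustering-network input size from the caption

def preprocess_pair(prev_frame: np.ndarray, cur_frame: np.ndarray) -> np.ndarray:
    """Threshold both frames, subtract pixel-wise, and resize to 28x28."""
    _, prev_t = cv2.threshold(prev_frame, THRESH, 255, cv2.THRESH_BINARY)
    _, cur_t = cv2.threshold(cur_frame, THRESH, 255, cv2.THRESH_BINARY)
    diff = cv2.absdiff(cur_t, prev_t)  # pixel-wise difference of binarized frames
    return cv2.resize(diff, NET_SIZE, interpolation=cv2.INTER_AREA)

# Example with synthetic frames standing in for IR SRDR camera images.
prev = (np.random.rand(480, 640) * 255).astype(np.uint8)
cur = (np.random.rand(480, 640) * 255).astype(np.uint8)
print(preprocess_pair(prev, cur).shape)  # (28, 28)
```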
Figure 6. Structure of the network that extracts sensor-data features: the spatiotemporal autoencoder (STAE) consists of an encoder and a decoder. After the STAE is trained, only the encoder is used as the feature extractor.
Figure 7. Detailed configuration of the STAE used for feature extraction.
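
Because the exact layer configuration appears only in Figure 7, the following PyTorch sketch shows a generic spatiotemporal autoencoder of the kind described in Figure 6, treating a short clip of preprocessed 28 × 28 frames as a single 3D volume. The layer counts, channel widths, clip length, and latent dimension are assumptions; as in the caption, only the encoder would be kept as the feature extractor after training.

```python
# Hedged sketch of a spatiotemporal autoencoder (STAE); not the paper's exact architecture.
import torch
import torch.nn as nn

class STAE(nn.Module):
    def __init__(self, latent_dim: int = 10, clip_len: int = 8):
        super().__init__()
        # Encoder: 3D convolutions capture spatial and temporal structure jointly.
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1),   # -> (16, T/2, 14, 14)
            nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1),  # -> (32, T/4, 7, 7)
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * (clip_len // 4) * 7 * 7, latent_dim),
        )
        # Decoder mirrors the encoder; it is discarded after training.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * (clip_len // 4) * 7 * 7),
            nn.ReLU(),
            nn.Unflatten(1, (32, clip_len // 4, 7, 7)),
            nn.ConvTranspose3d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose3d(16, 1, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# Reconstruction training step (sketch); only the encoder is kept afterwards.
model = STAE()
x = torch.rand(4, 1, 8, 28, 28)            # batch of 8-frame clips of 28x28 images
loss = nn.functional.mse_loss(model(x), x)
loss.backward()
features = model.encoder(x)                # (4, latent_dim) feature vectors
```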
Figure 8. Proposed classifier network, consisting of an STAE-based feature extractor and a deep embedded clustering (DEC)-based feature classifier.
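
The DEC stage in Figure 8 can be sketched using the standard deep embedded clustering formulation: Student's t soft assignments around learnable centroids, sharpened into a target distribution and trained with a KL-divergence loss. The cluster count, latent dimension, and the way encoder features feed in below are assumptions, not values from the paper.

```python
# Sketch of a DEC clustering head on top of STAE encoder features (assumed sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DECHead(nn.Module):
    def __init__(self, n_clusters: int = 4, latent_dim: int = 10, alpha: float = 1.0):
        super().__init__()
        self.alpha = alpha
        self.centroids = nn.Parameter(torch.randn(n_clusters, latent_dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Soft assignment q_ij: Student's t similarity between feature z_i and centroid mu_j.
        dist_sq = torch.cdist(z, self.centroids).pow(2)
        q = (1.0 + dist_sq / self.alpha).pow(-(self.alpha + 1) / 2)
        return q / q.sum(dim=1, keepdim=True)

def target_distribution(q: torch.Tensor) -> torch.Tensor:
    # Sharpened target p_ij that emphasizes high-confidence assignments.
    weight = q.pow(2) / q.sum(dim=0, keepdim=True)
    return weight / weight.sum(dim=1, keepdim=True)

# Training step sketch: z would come from the STAE encoder; random here.
head = DECHead()
z = torch.randn(32, 10)
q = head(z)
p = target_distribution(q).detach()
loss = F.kl_div(q.log(), p, reduction="batchmean")
loss.backward()
pred_cluster = q.argmax(dim=1)  # hard cluster label, mapped to a UI command downstream
```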
Figure 9. Clustering results obtained with the proposed DEC method: (a) clustering results for the 81,758 training dataset images and (b) clustering results for real-time sensing from users (online validation).
Figure 10. Screenshots from the demonstration using a custom-made application. A user could pop balloons (a,b) or select buttons to change the background (c,d). The user selects an object by aiming (targeting) a red center dot at it and then executes the selection with a winking gesture.
