End-to-End Ultrasonic Hand Gesture Recognition

Elfi Fertl et al. Sensors (Basel). 2024 Apr 25;24(9):2740. doi: 10.3390/s24092740.

Abstract

As the number of electronic gadgets in our daily lives grows, and most of them require some kind of human interaction, innovative and convenient input methods are in demand. State-of-the-art (SotA) ultrasound-based hand gesture recognition (HGR) systems are limited in robustness and accuracy. This research presents a novel machine learning (ML)-based end-to-end solution for hand gesture recognition with low-cost micro-electromechanical system (MEMS) ultrasonic transducers. In contrast to prior methods, our ML model processes the raw echo samples directly instead of pre-processed data. Consequently, the processing flow presented in this work leaves it to the ML model to extract the important information from the echo data. The success of this approach is demonstrated as follows. Four MEMS ultrasonic transducers are placed in three different geometrical arrangements. For each arrangement, different types of ML models are optimized and benchmarked on datasets acquired with the presented custom hardware (HW): convolutional neural networks (CNNs), gated recurrent units (GRUs), long short-term memory (LSTM), vision transformer (ViT), and cross-attention multi-scale vision transformer (CrossViT). The last three models (LSTM, ViT, and CrossViT) reached more than 88% accuracy. The most important innovation described in this paper is the demonstration that little pre-processing is necessary to obtain high accuracy in ultrasonic HGR for several arrangements of cost-effective, low-power MEMS ultrasonic transducer arrays; even the computationally intensive Fourier transform can be omitted. The presented approach is further compared with HGR systems based on other sensor types, such as vision, WiFi, and radar, as well as with state-of-the-art ultrasound-based HGR systems. Direct processing of the sensor signals by a compact model makes ultrasonic hand gesture recognition a truly low-cost and power-efficient input method.
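The end-to-end idea described above can be illustrated with a minimal sketch: a raw multi-channel echo frame is fed to a small 1D convolutional classifier with no FFT or envelope extraction in between. All dimensions here are hypothetical (3 receive channels, 2048 samples per frame, 6 gesture classes, loosely matching the three-channel arrangements mentioned in the paper), the weights are random placeholders, and this is not the authors' trained model — only a shape-level illustration of "raw samples in, gesture class out".

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 3 receive channels, 2048 raw echo samples per frame,
# 6 gesture classes. A real model would be trained on recorded gesture data.
N_CH, N_SAMP, N_CLASSES = 3, 2048, 6
N_FILT, KERNEL = 8, 15

# Random placeholder weights standing in for trained parameters.
conv_w = rng.standard_normal((N_FILT, N_CH, KERNEL)) * 0.1
dense_w = rng.standard_normal((N_FILT, N_CLASSES)) * 0.1

def classify(frame: np.ndarray) -> int:
    """Classify one raw echo frame of shape (N_CH, N_SAMP) -- no FFT, no envelope."""
    # 1D convolution over the time axis, summed across channels per filter.
    feat = np.empty((N_FILT, N_SAMP - KERNEL + 1))
    for f in range(N_FILT):
        acc = np.zeros(N_SAMP - KERNEL + 1)
        for c in range(N_CH):
            # Reverse the kernel so np.convolve performs cross-correlation.
            acc += np.convolve(frame[c], conv_w[f, c][::-1], mode="valid")
        feat[f] = np.maximum(acc, 0.0)          # ReLU activation
    pooled = feat.mean(axis=1)                  # global average pooling over time
    logits = pooled @ dense_w                   # linear classifier head
    return int(np.argmax(logits))

frame = rng.standard_normal((N_CH, N_SAMP))     # stand-in for recorded echo samples
print(classify(frame))
```

Skipping the Fourier transform, as the paper proposes, removes the costliest pre-processing step; the convolution filters themselves learn whatever time-domain structure the echoes carry.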

Keywords: Fourier transform; HMI; MEMS ultrasonic transducer; machine learning; pre-processing.


Conflict of interest statement

Authors Elfi Fertl, Do Dinh Tan Nguyen, Martin Krueger, and Georg Stettinger were employed by the company Infineon Technologies AG. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1. Sensor-agnostic HGR processing flow.
Figure 2. Suggested change in the HGR processing flow.
Figure 3. Data acquisition setup.
Figure 4. Processing shield with the linear array; transducer locations are marked with yellow and white circles: yellow circle, sending transducer; white circles, receiving transducers.
Figure 5. Corner (left) and center (right) transducer arrays without the processing shield; transducer locations are marked with yellow and white circles: yellow circle, sending transducer; white circles, receiving transducers.
Figure 6. Gestures used in the dataset.
Figure 7. Part of a gesture frame of one channel of a pp gesture; see Figure 6 for an illustration of the pp gesture.
Figure 8. Plot of a pulse train with echo (in blue) compared to a pulse train without echo (in yellow) of transducer 1.
Figure 9. Plot of a pulse train with echo (in blue) compared to a pulse train without echo (in yellow) of transducer 2.
Figure 10. Plot of a pulse train with echo (in blue) compared to a pulse train without echo (in yellow) of transducer 3.
Figure 11. Schema of the CNN model with the best results. With a two-channel input (linear dataset), the output is 1 of 4 possible gestures; with a three-channel input (center and corner datasets), the output is 1 of 6 gestures.
Figure 12. Accuracies per model per dataset.
Figure 13. Sizes of the best models.
Figure 14. Best accuracy of the best models on all classes, compared to their accuracy on four classes and the average accuracy of all models on each of the three datasets.
Figure 15. Accuracies per model, averaged over the datasets.
