Redesigning Multimodal Interaction: Adaptive Signal Processing and Cross-Modal Interaction for Hands-Free Computer Interaction

Bui Hong Quan et al. Sensors (Basel). 2025 Sep 2;25(17):5411. doi: 10.3390/s25175411.

Abstract

Hands-free computer interaction is a key topic in assistive technology, with camera-based and voice-based systems being the most common methods. Recent camera-based solutions leverage facial expressions or head movements to simulate mouse clicks or key presses, while voice-based systems enable control via speech commands, wake-word detection, and vocal gestures. However, existing systems often suffer from limitations in responsiveness and accuracy, especially under real-world conditions. In this paper, we present 3-Modal Human-Computer Interaction (3M-HCI), a novel interaction system that dynamically integrates facial, vocal, and eye-based inputs through a new signal processing pipeline and a cross-modal coordination mechanism. This approach not only enhances recognition accuracy but also reduces interaction latency. Experimental results demonstrate that 3M-HCI outperforms several recent hands-free interaction solutions in both speed and precision, highlighting its potential as a robust assistive interface.

Keywords: adaptive signal processing; assistive technology; hands-free interaction; human-computer interaction; multimodal interaction; vision/camera-based sensors.
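The abstract does not specify how the 3M-HCI signal processing pipeline smooths its inputs. As a rough illustration of what "adaptive" smoothing of facial-landmark coordinates can look like, the sketch below implements the One Euro filter (Casiez, Roussel and Vogel, CHI 2012), a well-known speed-adaptive low-pass filter for noisy interactive input. It is a stand-in, not the authors' pipeline, and the parameter values (`min_cutoff`, `beta`, a 30 Hz frame rate) are assumptions chosen for illustration.

```python
import math


class OneEuroFilter:
    """Speed-adaptive low-pass filter (One Euro filter).

    Slow signals get a low cutoff (strong smoothing, less jitter);
    fast signals get a higher cutoff (less smoothing, less lag).
    """

    def __init__(self, freq, min_cutoff=1.0, beta=0.0, d_cutoff=1.0):
        self.freq = freq              # sampling rate in Hz (assumed camera frame rate)
        self.min_cutoff = min_cutoff  # cutoff (Hz) when the signal is at rest
        self.beta = beta              # how quickly the cutoff grows with speed
        self.d_cutoff = d_cutoff      # cutoff (Hz) used to smooth the speed estimate
        self.x_prev = None            # previous filtered value
        self.dx_prev = 0.0            # previous filtered derivative

    def _alpha(self, cutoff):
        # Smoothing factor of a first-order low-pass filter at this cutoff.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        te = 1.0 / self.freq
        return 1.0 / (1.0 + tau / te)

    def __call__(self, x):
        if self.x_prev is None:
            self.x_prev = x
            return x
        # Estimate and smooth the signal's speed.
        dx = (x - self.x_prev) * self.freq
        a_d = self._alpha(self.d_cutoff)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Raise the cutoff in proportion to speed, then filter the value.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev = x_hat
        self.dx_prev = dx_hat
        return x_hat


# Example: smooth a hypothetical stream of cursor x-coordinates at 30 fps.
f = OneEuroFilter(freq=30.0, min_cutoff=1.0, beta=0.01)
for raw in [100.0, 102.0, 99.0, 101.0, 140.0, 180.0]:
    print(round(f(raw), 1))
```

With `beta = 0` the filter reduces to a fixed exponential smoother; raising `beta` trades some smoothing for lower lag during fast head movements, which is one way to balance the responsiveness and accuracy concerns the abstract raises.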

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Overview of the system architecture and processing modules.
Figure 2. Multimodal input processing pipeline.
Figure 3. The 478 MediaPipe facial landmarks; each point corresponds to a specific part of the face.
Figure 4. The two inner eye corners (landmarks p133 and p362 in MediaPipe).
Figure 5. Acceleration curve.
Figure 6. The 3M-HCI graphical user interfaces, built with CustomTkinter.
Figure 7. Moving and clicking tasks.
Figure 8. MediaPipe facial landmark detection under different lighting conditions: (a,b) bright environment; (c) dim environment; (d) dark environment.
Figure 9. Deviations from the optimal path.
Figure 10. Number of clicks.
Figure 11. Movement latency.
Figure 12. Jitter deviation of the compared systems.
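The figures report deviation from the optimal path and jitter, but this page does not define either metric. The sketch below shows one plausible way to compute both from logged cursor samples. The definitions are assumptions, not the paper's: path deviation is taken as the mean perpendicular distance from each sample to the straight start-to-target line, and jitter as the root-mean-square distance of samples from their centroid while the user tries to hold the cursor still.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def path_deviation(samples: List[Point], start: Point, target: Point) -> float:
    """Mean perpendicular distance (px) from cursor samples to the straight
    start-to-target line, treated here as the 'optimal path'."""
    if not samples:
        raise ValueError("need at least one sample")
    (x0, y0), (x1, y1) = start, target
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and target must differ")
    # |2-D cross product| / line length = distance from a point to the line.
    dists = [abs(dx * (py - y0) - dy * (px - x0)) / length for px, py in samples]
    return sum(dists) / len(dists)


def jitter(samples: List[Point]) -> float:
    """RMS distance (px) of cursor samples from their centroid during a hold."""
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    cx = sum(p[0] for p in samples) / n
    cy = sum(p[1] for p in samples) / n
    return math.sqrt(sum((px - cx) ** 2 + (py - cy) ** 2 for px, py in samples) / n)


# Hypothetical logged data, for illustration only.
print(path_deviation([(0, 0), (5, 1), (10, 0)], (0, 0), (10, 0)))  # ≈ 0.333
print(jitter([(0, 0), (2, 0)]))  # 1.0
```

Measuring distance to the infinite line rather than to the segment keeps the sketch short; overshoot past the target would need the segment-based variant.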
