2022 Nov 29;22(23):9279.
doi: 10.3390/s22239279.

Low-Cost Human-Machine Interface for Computer Control with Facial Landmark Detection and Voice Commands

Pablo Ramos et al. Sensors (Basel).

Abstract

Daily life now involves extensive computer use, which puts people with upper limb disabilities at a real disadvantage. This work develops an interface for emulating mouse and keyboard functions (EMKEY) that applies computer vision and voice recognition to replace the use of hands. Pointer control is achieved by head movement, whereas voice recognition is used to perform interface functions, including speech-to-text transcription. To evaluate the interface's usability and usefulness, two studies were carried out. The first study was performed with 30 participants without physical disabilities; it found significant correlations between the emulator's usability and aspects such as adaptability, execution time, and the participant's age. In the second study, four participants with motor disabilities used the emulator. The interface was best used by the participant with cerebral palsy, followed by the participants with upper limb paralysis, spina bifida, and muscular dystrophy. In general, the results show that the proposed interface is easy to use, practical, fairly accurate, and works on a wide range of computers.

Keywords: H/M interface; face tracking; facial landmarks; handicap; keyboard; mouse; speech recognition; voice commands.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. EMKEY block diagram.
Figure 2. Program execution scheme.
Figure 3. General flowchart of the video processing module. Note: The programming constants are: CAM_W = 640, CAM_H = 480, MAR_LIM = 0.6, MAR_NUM_FRAMES = 15.
Figure 4. Facial landmarks acquired from the Dlib predictor. Each landmark point determines a specific part of the face [29].
Figure 5. Face recognition when moving to the left.
Figure 6. General flowchart of the speech recognition module. Note: The parameter values are MODEL = Spanish, Sample Rate = 16 k, Format = 16 Bits, Channel = 1, Rate = 16 k, Input = True, and Frames_Per_Buffer = 8192.
Figure 7. Virtual segmentation of the screen to facilitate control of the pointer position (quadrant number). “a” is a quarter of the screen width, “b” is a third of the screen height, “posA” is half of “a”, and “posB” is half of “b”.
Figure 8. EMKEY module interactions.
Figure 9. Face detection and mouth landmark prediction at different lighting levels. (a) Image captured at 59 lux. (b) Image captured at 32 lux. (c) Image captured at 24 lux. (d) Image captured at 2 lux.
Figure 10. Detection with more than one face. (a) Image with correct operation. (b) Image with erroneous operation.
Figure 11. EMKEY running on Ubuntu.
Figure 12. Relationship between prior computer skills and adaptability to the emulator.
Figure 13. Relationship between age and the time it takes to perform an internet search.
Figure 14. Relationship between age and number of times the pointer was restarted.
Figure 15. Relationship between age and ease of using the emulator.
Figure 16. Performance of men and women using the emulator. Questions are listed in Table 3.
Figure 17. EMKEY usability according to the type of disability. P1: cerebral palsy, P2: upper limb paralysis, P3: spina bifida, P4: Duchenne muscular dystrophy.
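
The screen segmentation described in the Figure 7 caption can be illustrated with a short Python sketch. It is not the authors' code: the function name `quadrant_center`, the row-major numbering of quadrants starting at 1 in the top-left corner, and the idea of returning the cell center as the pointer target are assumptions made for illustration. Only the quantities a, b, posA, and posB come from the caption.

```python
def quadrant_center(quadrant, screen_w, screen_h, cols=4, rows=3):
    """Return the (x, y) center of a screen quadrant in pixels.

    Hypothetical helper: quadrants are assumed to be numbered 1..cols*rows,
    left to right and top to bottom. The caption defines only a, b, posA, posB.
    """
    if not 1 <= quadrant <= cols * rows:
        raise ValueError(f"quadrant must be in 1..{cols * rows}")

    a = screen_w / cols   # "a": a quarter of the screen width
    b = screen_h / rows   # "b": a third of the screen height
    pos_a = a / 2         # "posA": half of "a"
    pos_b = b / 2         # "posB": half of "b"

    # Convert the 1-based quadrant number to a zero-based (row, column) pair.
    row, col = divmod(quadrant - 1, cols)

    # Offset to the cell's top-left corner, then move to its center.
    return (col * a + pos_a, row * b + pos_b)


if __name__ == "__main__":
    # Example on a 1920x1080 screen (the resolution is an arbitrary choice).
    for q in (1, 6, 12):
        print(q, quadrant_center(q, 1920, 1080))
```

Under these assumptions, a voice command naming a quadrant could move the pointer to the returned coordinates, after which head movement would handle fine positioning within the cell.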

References

    1. Saponara S., Elhanashi A., Gagliardi A. Implementing a real-time, AI-based, people detection and social distancing measuring system for COVID-19. J. Real Time Image Process. 2021;18:1937–1947. doi: 10.1007/s11554-021-01070-6.
    2. Kamath R.C., Amudha J. IEyeGASE: An intelligent eye gaze-based assessment system for deeper insights into learner performance. Sensors. 2021;21(20):6783.
    3. Espada R., Moreno R., Morán M. Educación inclusiva y TIC: Sistemas de barrido ocular para alumnado con parálisis cerebral en Educación Primaria. Ens. Rev. Fac. Educ. Albacete. 2020;35:171–190.
    4. Lupu R.G., Ungureano F., Siriteanu V. Eye tracking mouse for human computer interaction; Proceedings of the E-Health and Bioengineering Conference (EHB); Iasi, Romania. 21–23 November 2013.
    5. Zhang X., Liu X., Ming S., Lin S. Eye tracking based control system for natural human-computer interaction. Comput. Intell. Neurosci. 2017;2017:5739301. doi: 10.1155/2017/5739301.