Decoding lip language using triboelectric sensors with deep learning

Yijia Lu et al. Nat Commun. 2022 Mar 17;13(1):1401. doi: 10.1038/s41467-022-29083-0.

Abstract

Lip language is an effective method of voice-off communication in daily life for people with vocal cord lesions or laryngeal and lingual injuries, and it does not occupy the hands. Collecting and interpreting lip language, however, is challenging. Here, we propose a lip-language decoding system that combines self-powered, low-cost, contact, flexible triboelectric sensors with a well-trained dilated recurrent neural network model based on prototype learning. The structural principle and electrical properties of the flexible sensors are measured and analysed. Lip motions for selected vowels, words, phrases, silent speech and voiced speech are collected and compared. The prototype-learning model reaches a test accuracy of 94.5% when trained on 20 classes with 100 samples each. Applications such as identity recognition to unlock a gate, directional control of a toy car and lip-motion-to-speech conversion work well and demonstrate the feasibility and potential of the approach. Our work presents a promising way to help people lacking a voice live more conveniently with barrier-free communication, enriches the diversity of lip-language translation systems and has potential value in many applications.

Conflict of interest statement

Y.L., H.T., L.J., and J.C. are inventors on a patent application (pending, #2020115579741) submitted by Tsinghua University. All other authors declare no competing interests.

Figures

Fig. 1
Fig. 1. The concept, structure and mechanism of the lip-language decoding system supported by triboelectric sensors.
a Schematic illustration of the lip-language decoding system and its components, including triboelectric sensors, signal processing and deep learning classifiers. b Structure scheme for the flexible triboelectric sensor. c Schematic diagram of four stages of charge transfer in one mouth open-close cycle. The mouth-opening process pushes the sensor, and the mouth-closing process releases the sensor, resulting in the flow of current in opposite directions.
Fig. 2
Fig. 2. Electrical characteristics of the TENG sensors.
a The platform with adjustable pressing force and frequency, consisting of a linear motor, an ergometer and the sensor. b, c The open-circuit voltage and short-circuit current obtained by pressing the sensor at a frequency of 1 Hz with different forces (from 1 N to 5 N). Inset: schematic diagram of the sensor size. d, e The open-circuit voltage and short-circuit current output of the sensor at different pressing frequencies (from 1 Hz to 5 Hz) with a force of 5 N. f, g The open-circuit voltage and short-circuit current output obtained from sensors with different areas at a frequency of 1 Hz with a force of 5 N. h, i The open-circuit voltage and short-circuit current output obtained from sensors with different thicknesses at a frequency of 1 Hz with a force of 5 N. j The open-circuit voltage for two sensors connected in series and in parallel. k The maximum output power and maximum voltage curves for the triboelectric sensor used as a power supply with different external load resistances (ranging from 10^7 Ω to 10^11 Ω) with a force of 5 N at a frequency of 2 Hz. l Mechanical durability test over 2000 press-release cycles. Inset: the voltage signals generated during the initial 10 s and the final 10 s.
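For readers who want to reproduce the load-sweep analysis in panel k, below is a minimal sketch of how the maximum output power is conventionally obtained from the peak voltage measured across each external load, using P = V^2/R. The resistance grid and voltage values are placeholders for illustration, not data from the paper.

```python
import numpy as np

# Placeholder load-resistance sweep (Ω) and the peak voltage (V) measured
# across each load; replace with the actual measurements from the sensor.
loads = np.logspace(7, 11, num=9)          # 1e7 Ω ... 1e11 Ω
peak_voltage = np.array([0.2, 0.6, 1.5, 3.0, 5.5, 8.0, 9.5, 10.2, 10.5])

# Peak power dissipated in each load: P = V^2 / R.
peak_power = peak_voltage**2 / loads

best = np.argmax(peak_power)
print(f"Matched load ≈ {loads[best]:.1e} Ω, "
      f"maximum peak power ≈ {peak_power[best] * 1e6:.3f} µW")
```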
Fig. 3
Fig. 3. Signals generated by mouth muscles and a comparison of sound and lip-motion signals.
a In a typical speaking sequence (“Open Sesame”), the mouth shape synchronizes with the signal, and the regions delimited by the mouth state are denoted closed, opening and closing. b The combined and decomposed lip-motion signals for “Zhi”, “Ma”, “Kai”, “Men”, “Zhi Ma”, “Kai Men”, and “Zhi Ma Kai Men”. c The lip-motion signals for silent and voiced speech remain the same, with sounds recorded synchronously while speaking “Open Sesame”. d Sound and lip-motion signals collected simultaneously at four reading speeds for “Nice To Meet You”. e Sound and lip-motion signals collected simultaneously when four participants (Han, Bo, Pan and Bin) read “Zhi Ma Kai Men”. f The duration of the lip-motion signal for each word in “Nice To Meet You” at the four reading speeds. g The duration of the sound signal for each word in “Nice To Meet You” at the four reading speeds. h Statistics of the time by which the lip-motion signal leads the sound signal for each word in “Zhi Ma Kai Men” spoken by the four participants.
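Panel h quantifies how far the lip-motion signal leads the sound signal for each word. Below is a minimal sketch of one standard way to estimate such a lead time from two synchronously sampled signals by cross-correlation; the sampling rate and the toy signals are assumptions for illustration, not the authors' processing pipeline.

```python
import numpy as np

def lead_time(lip, sound, fs):
    """Estimate, in seconds, how far `lip` leads `sound`.

    Both signals are assumed to be synchronously sampled at `fs` Hz; a
    positive result means the lip-motion signal starts before the sound.
    """
    lip = (lip - lip.mean()) / (lip.std() + 1e-12)
    sound = (sound - sound.mean()) / (sound.std() + 1e-12)
    corr = np.correlate(sound, lip, mode="full")   # slide lip against sound
    lag = np.argmax(corr) - (len(lip) - 1)         # samples by which sound lags
    return lag / fs

# Toy example: the same burst, with the "sound" copy delayed by 50 ms.
fs = 1000
t = np.arange(0, 1.0, 1 / fs)
burst = np.exp(-((t - 0.3) ** 2) / 0.002)
print(lead_time(burst, np.roll(burst, 50), fs))    # ≈ 0.05 s
```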
Fig. 4
Fig. 4. A signal classification experiment based on deep learning.
a Deep-learning-aided data processing flow, including the training and inference processes. b Schematic diagram of the overall structure of the dilated recurrent neural network model based on prototype learning. c Structure diagram of the feature extractor in the model. d The gated recurrent unit (GRU), the basic unit of the dilated recurrent neural network. e The training and testing accuracy curves obtained during learning for the dilated recurrent neural network based on a softmax classification layer and on prototype learning. f Visualization of the two-dimensional features of the dilated recurrent neural network based on prototype learning. g Comparison of the test accuracy of the two models with different sample sizes. h 3D plot of the lip-motion signals generated for 20 spoken words (fruits) in the dataset. i The confusion matrix for the lip-motion signals of the 20 words (fruits).
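Panels b–d describe a GRU-based feature extractor whose output is classified by prototype learning, i.e., by distance to one learnable prototype per class. Below is a minimal PyTorch sketch of that kind of architecture; the layer sizes, the dilation scheme (approximated here by subsampling the sequence between layers) and all hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedGRUFeatureExtractor(nn.Module):
    """Stacked GRU layers with increasing temporal dilation.

    Dilation is approximated by keeping every d-th time step between layers,
    a simplification of true dilated recurrent skip connections.
    """
    def __init__(self, in_dim=1, hidden=64, dilations=(1, 2, 4), feat_dim=32):
        super().__init__()
        self.dilations = dilations
        self.grus = nn.ModuleList([
            nn.GRU(in_dim if i == 0 else hidden, hidden, batch_first=True)
            for i in range(len(dilations))
        ])
        self.proj = nn.Linear(hidden, feat_dim)

    def forward(self, x):                      # x: (batch, time, channels)
        for d, gru in zip(self.dilations, self.grus):
            x = x[:, ::d, :]                   # temporal dilation by subsampling
            x, _ = gru(x)
        return self.proj(x[:, -1, :])          # last hidden state -> feature


class PrototypeClassifier(nn.Module):
    """One learnable prototype per class; classify by negative distance."""
    def __init__(self, n_classes=20, feat_dim=32):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_classes, feat_dim))

    def forward(self, feats):                  # feats: (batch, feat_dim)
        return -torch.cdist(feats, self.prototypes)   # closer -> larger logit


# Toy forward/backward pass on random stand-ins for lip-motion sequences.
extractor, head = DilatedGRUFeatureExtractor(), PrototypeClassifier()
x = torch.randn(8, 200, 1)                     # 8 signals, 200 samples, 1 channel
y = torch.randint(0, 20, (8,))
logits = head(extractor(x))
loss = F.cross_entropy(logits, y)              # distance-based cross-entropy
loss.backward()
print(logits.shape, float(loss))
```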
Fig. 5
Fig. 5. The applications for lip-language decoding in personal identity verification (PIV), toy-car control and lip motion to speech conversion for assisting with communication for people lacking a voice.
a Schematic diagram of unlocking a gate by lip motion with personal identity verification. b A comparison of lip-motion signals from two participants (Han and Bin) in the time domain. c Short-time Fourier transform (STFT) analysis of the lip-motion signals from Han. d Schematic diagram of directional control of a toy car by lip motion. e Comparison of lip-motion signals from Han in the time domain. f STFT analysis of the lip-motion signal for ‘Go forwards’ from Han. g Schematic diagram of daily voice communication for people lacking a voice, with and without the lip-language decoding system (LLDS).
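Panels c and f rely on short-time Fourier transform (STFT) spectrograms of the lip-motion signals. Below is a minimal sketch of such an analysis with SciPy; the sampling rate, window length and the synthetic signal are placeholder assumptions, not the authors' recorded data.

```python
import numpy as np
from scipy.signal import stft
import matplotlib.pyplot as plt

fs = 1000                                   # assumed sampling rate, Hz
t = np.arange(0, 2.0, 1 / fs)
# Placeholder "lip-motion" signal: two bursts standing in for two syllables.
sig = (np.exp(-((t - 0.5) ** 2) / 0.01) * np.sin(2 * np.pi * 8 * t)
       + np.exp(-((t - 1.3) ** 2) / 0.01) * np.sin(2 * np.pi * 15 * t))

f, tt, Zxx = stft(sig, fs=fs, nperseg=256)  # STFT with a 256-sample window
plt.pcolormesh(tt, f, np.abs(Zxx), shading="gouraud")
plt.ylim(0, 50)                             # lip motion is low-frequency
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("STFT of a placeholder lip-motion signal")
plt.show()
```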
