Decoding lip language using triboelectric sensors with deep learning

Yijia Lu et al. Nat Commun. 2022 Mar 17;13(1):1401. doi: 10.1038/s41467-022-29083-0.

Abstract

Lip language is an effective method of voice-off communication in daily life for people with vocal cord lesions or laryngeal and lingual injuries, and it does not occupy the hands. Collecting and interpreting lip language, however, is challenging. Here, we propose a lip-language decoding system that combines self-powered, low-cost, contact, flexible triboelectric sensors with a well-trained dilated recurrent neural network model based on prototype learning. The structural principle and electrical properties of the flexible sensors are measured and analysed. Lip motions for selected vowels, words, phrases, silent speech and voiced speech are collected and compared. The prototype-learning model reaches a test accuracy of 94.5% when trained on 20 classes with 100 samples each. Applications such as identity recognition to unlock a gate, directional control of a toy car and lip-motion-to-speech conversion work well and demonstrate the feasibility and potential of the approach. Our work presents a promising way to help people lacking a voice live more conveniently with barrier-free communication, enriches the diversity of lip-language translation systems and has potential value in many applications.

Conflict of interest statement

Y.L., H.T., L.J., and J.C. are inventors on a patent application (pending, #2020115579741) submitted by Tsinghua University. All other authors declare no competing interests.

Figures

Fig. 1
Fig. 1. The concept, structure and mechanism of the lip-language decoding system supported by triboelectric sensors.
a Schematic illustration of the lip-language decoding system and its components, including triboelectric sensors, signal processing and deep learning classifiers. b Structure scheme for the flexible triboelectric sensor. c Schematic diagram of four stages of charge transfer in one mouth open-close cycle. The mouth-opening process pushes the sensor, and the mouth-closing process releases the sensor, resulting in the flow of current in opposite directions.
Fig. 2
Fig. 2. Electrical characteristics of the TENG sensors.
a The platform with adjustable pressing force and frequency, consisting of a linear motor, an ergometer and the sensor. b, c The open-circuit voltage and short-circuit current obtained by pressing the sensor at a frequency of 1 Hz with different forces (from 1 N to 5 N). Inset: schematic diagram of the sensor size. d, e The open-circuit voltage and short-circuit current output of the sensor at different pressing frequencies (from 1 Hz to 5 Hz) with a force of 5 N. f, g The open-circuit voltage and short-circuit current output obtained from sensors with different areas at a frequency of 1 Hz with a force of 5 N. h, i The open-circuit voltage and short-circuit current output obtained from sensors with different thicknesses at a frequency of 1 Hz with a force of 5 N. j The open-circuit voltage for two sensors connected in series and in parallel. k The maximum output power and maximum voltage curves for the triboelectric sensor used as a power supply with different external load resistances (ranging from 10^7 Ω to 10^11 Ω) with a force of 5 N at a frequency of 2 Hz. l Mechanical durability test over 2000 press-release cycles. Inset: the voltage signals generated during the initial 10 s and the final 10 s.
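For readers who want to reproduce the load-sweep analysis in panel k, below is a minimal sketch of how the maximum output power is conventionally obtained from the peak voltage measured across each external load, using P = V^2/R. The resistance grid and voltage values are placeholders for illustration, not data from the paper.

```python
import numpy as np

# Placeholder load-resistance sweep (Ω) and the peak voltage (V) measured
# across each load; replace with the actual measurements from the sensor.
loads = np.logspace(7, 11, num=9)          # 1e7 Ω ... 1e11 Ω
peak_voltage = np.array([0.2, 0.6, 1.5, 3.0, 5.5, 8.0, 9.5, 10.2, 10.5])

# Peak power dissipated in each load: P = V^2 / R.
peak_power = peak_voltage**2 / loads

best = np.argmax(peak_power)
print(f"Matched load ≈ {loads[best]:.1e} Ω, "
      f"maximum peak power ≈ {peak_power[best] * 1e6:.3f} µW")
```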
Fig. 3
Fig. 3. Signals generated by mouth muscles and a comparison of sound and lip-motion signals.
a In a typical speaking sequence (“Open Sesame”), the mouth shape synchronizes with the signal, and the regions delimited by the mouth state are denoted closed, opening and closing. b The combined and decomposed lip-motion signals for “Zhi”, “Ma”, “Kai”, “Men”, “Zhi Ma”, “Kai Men”, and “Zhi Ma Kai Men”. c The lip-motion signals for silent and voiced speech remain the same, with sounds recorded synchronously while speaking “Open Sesame”. d Sound and lip-motion signals collected simultaneously at four reading speeds for “Nice To Meet You”. e Sound and lip-motion signals collected simultaneously when four participants (Han, Bo, Pan and Bin) read “Zhi Ma Kai Men”. f The duration of the lip-motion signal for each word in “Nice To Meet You” at the four reading speeds. g The duration of the sound signal for each word in “Nice To Meet You” at the four reading speeds. h Statistics of the time by which the lip-motion signal leads the sound signal for each word in “Zhi Ma Kai Men” spoken by the four participants.
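Panel h quantifies how far the lip-motion signal leads the sound signal for each word. Below is a minimal sketch of one standard way to estimate such a lead time from two synchronously sampled signals by cross-correlation; the sampling rate and the toy signals are assumptions for illustration, not the authors' processing pipeline.

```python
import numpy as np

def lead_time(lip, sound, fs):
    """Estimate, in seconds, how far `lip` leads `sound`.

    Both signals are assumed to be synchronously sampled at `fs` Hz; a
    positive result means the lip-motion signal starts before the sound.
    """
    lip = (lip - lip.mean()) / (lip.std() + 1e-12)
    sound = (sound - sound.mean()) / (sound.std() + 1e-12)
    corr = np.correlate(sound, lip, mode="full")   # slide lip against sound
    lag = np.argmax(corr) - (len(lip) - 1)         # samples by which sound lags
    return lag / fs

# Toy example: the same burst, with the "sound" copy delayed by 50 ms.
fs = 1000
t = np.arange(0, 1.0, 1 / fs)
burst = np.exp(-((t - 0.3) ** 2) / 0.002)
print(lead_time(burst, np.roll(burst, 50), fs))    # ≈ 0.05 s
```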
Fig. 4
Fig. 4. A signal classification experiment based on deep learning.
a Deep-learning-aided data processing flow, including the training and inference processes. b Schematic diagram of the overall structure of the dilated recurrent neural network model based on prototype learning. c Structure diagram of the feature extractor in the model. d The gated recurrent unit (GRU), the basic unit of the dilated recurrent neural network. e The training and testing accuracy curves obtained during learning for the dilated recurrent neural network based on a softmax classification layer and on prototype learning. f Visualization of the two-dimensional features of the dilated recurrent neural network based on prototype learning. g Comparison of the test accuracy of the two models with different sample sizes. h 3D plot of the lip-motion signals generated for 20 spoken words (fruits) in the dataset. i The confusion matrix for the lip-motion signals of the 20 words (fruits).
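Panels b–d describe a GRU-based feature extractor whose output is classified by prototype learning, i.e., by distance to one learnable prototype per class. Below is a minimal PyTorch sketch of that kind of architecture; the layer sizes, the dilation scheme (approximated here by subsampling the sequence between layers) and all hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedGRUFeatureExtractor(nn.Module):
    """Stacked GRU layers with increasing temporal dilation.

    Dilation is approximated by keeping every d-th time step between layers,
    a simplification of true dilated recurrent skip connections.
    """
    def __init__(self, in_dim=1, hidden=64, dilations=(1, 2, 4), feat_dim=32):
        super().__init__()
        self.dilations = dilations
        self.grus = nn.ModuleList([
            nn.GRU(in_dim if i == 0 else hidden, hidden, batch_first=True)
            for i in range(len(dilations))
        ])
        self.proj = nn.Linear(hidden, feat_dim)

    def forward(self, x):                      # x: (batch, time, channels)
        for d, gru in zip(self.dilations, self.grus):
            x = x[:, ::d, :]                   # temporal dilation by subsampling
            x, _ = gru(x)
        return self.proj(x[:, -1, :])          # last hidden state -> feature


class PrototypeClassifier(nn.Module):
    """One learnable prototype per class; classify by negative distance."""
    def __init__(self, n_classes=20, feat_dim=32):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_classes, feat_dim))

    def forward(self, feats):                  # feats: (batch, feat_dim)
        return -torch.cdist(feats, self.prototypes)   # closer -> larger logit


# Toy forward/backward pass on random stand-ins for lip-motion sequences.
extractor, head = DilatedGRUFeatureExtractor(), PrototypeClassifier()
x = torch.randn(8, 200, 1)                     # 8 signals, 200 samples, 1 channel
y = torch.randint(0, 20, (8,))
logits = head(extractor(x))
loss = F.cross_entropy(logits, y)              # distance-based cross-entropy
loss.backward()
print(logits.shape, float(loss))
```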
Fig. 5
Fig. 5. The applications for lip-language decoding in personal identity verification (PIV), toy-car control and lip motion to speech conversion for assisting with communication for people lacking a voice.
a Schematic diagram of unlocking a gate by lip motion with personal identity verification. b A comparison of lip-motion signals from two participants (Han and Bin) in the time domain. c Short-time Fourier transform (STFT) analysis of the lip-motion signals from Han. d Schematic diagram of directional control of a toy car by lip motion. e Comparison of lip-motion signals from Han in the time domain. f STFT analysis of the lip-motion signal for ‘Go forwards’ from Han. g Schematic diagram of daily voice communication for people lacking a voice, with and without the lip-language decoding system (LLDS).
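Panels c and f rely on short-time Fourier transform (STFT) spectrograms of the lip-motion signals. Below is a minimal sketch of such an analysis with SciPy; the sampling rate, window length and the synthetic signal are placeholder assumptions, not the authors' recorded data.

```python
import numpy as np
from scipy.signal import stft
import matplotlib.pyplot as plt

fs = 1000                                   # assumed sampling rate, Hz
t = np.arange(0, 2.0, 1 / fs)
# Placeholder "lip-motion" signal: two bursts standing in for two syllables.
sig = (np.exp(-((t - 0.5) ** 2) / 0.01) * np.sin(2 * np.pi * 8 * t)
       + np.exp(-((t - 1.3) ** 2) / 0.01) * np.sin(2 * np.pi * 15 * t))

f, tt, Zxx = stft(sig, fs=fs, nperseg=256)  # STFT with a 256-sample window
plt.pcolormesh(tt, f, np.abs(Zxx), shading="gouraud")
plt.ylim(0, 50)                             # lip motion is low-frequency
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("STFT of a placeholder lip-motion signal")
plt.show()
```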
