Nat Commun. 2021 Sep 10;12(1):5378.
doi: 10.1038/s41467-021-25637-w.

AI enabled sign language recognition and VR space bidirectional communication using triboelectric smart glove

Feng Wen et al. Nat Commun.

Abstract

Sign language recognition, especially sentence recognition, is of great significance for lowering the communication barrier between the hearing/speech impaired and non-signers. General glove solutions, which are employed to detect motions of our dexterous hands, only achieve recognition of discrete single gestures (i.e., numbers, letters, or words) rather than sentences, far from meeting the needs of signers' daily communication. Here, we propose an artificial-intelligence-enabled sign language recognition and communication system comprising sensing gloves, a deep learning block, and a virtual reality interface. Non-segmentation and segmentation-assisted deep learning models achieve the recognition of 50 words and 20 sentences. Significantly, the segmentation approach splits entire sentence signals into word units; the deep learning model then recognizes all word elements and reversely reconstructs and recognizes the sentences. Furthermore, new/never-seen sentences created by recombining word elements in new orders can be recognized with an average correct rate of 86.67%. Finally, the sign language recognition results are projected into virtual space and translated into text and audio, allowing remote and bidirectional communication between signers and non-signers.
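The segmentation-assisted recognition described above splits a sentence signal into word units, classifies each unit, and reassembles the sentence. The block below is only a minimal sketch of that idea; the segmentation heuristic, thresholds, and `word_classifier` are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of segmentation-assisted sentence recognition.
# `segment_signal` and `word_classifier` are hypothetical placeholders;
# the paper's actual segmentation rule and CNN are not reproduced here.

def segment_signal(sentence_signal, threshold=0.05, min_len=50):
    """Split a multichannel glove signal (channels x samples) into word-unit
    segments by locating low-activity gaps between gestures (illustrative)."""
    activity = np.abs(sentence_signal).sum(axis=0)        # combine channels
    active = activity > threshold * activity.max()         # mark active samples
    edges = np.flatnonzero(np.diff(active.astype(int)))    # on/off boundaries
    starts, ends = edges[::2], edges[1::2]
    return [sentence_signal[:, s:e] for s, e in zip(starts, ends) if e - s > min_len]

def recognize_sentence(sentence_signal, word_classifier, vocabulary):
    """Classify each word unit, then reconstruct the sentence from the labels."""
    words = [vocabulary[word_classifier(seg)] for seg in segment_signal(sentence_signal)]
    return " ".join(words)
```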


Conflict of interest statement

The authors declare the following competing interests: F.W., Z.Z., T.H., and C.L. are inventors on the patent application (pending, Ref: 2021-187, serial no. 10202109596U) submitted by the National University of Singapore that covers the sign language recognition and communication system.

Figures

Fig. 1
Fig. 1. The glove configuration and sensor characterization.
a Schematics of the sign language recognition and communication system. b Proportion of different motions commonly used in sign language, which helps determine the sensor positions on the gloves. c Sensor positions on the gloves based on the hand motion analysis in b. Detailed area information for the sensor at each position can be found in Supplementary Fig. 1. d Materials of the triboelectric sensor. e–h Voltage output dependence on key parameters, including sensor area, force, bending degree, and bending speed. The hand, head, and phone images are created by the authors via Blender.
Fig. 2
Fig. 2. Data analysis of the signals of 50 words and 20 sentences.
a Representative gestures among the 50 words (19 shown here), in which the opaque and translucent gesture images show the starting and final states of the gesture, respectively. The remaining 31 gesture photos and their corresponding triboelectric signals can be found in Supplementary Fig. 2. b Triboelectric voltage output of 19 words (top), and the similarity and correlation analysis based on the word signals (bottom). The high correlation coefficient of ‘Get’ and ‘Must’ shows a high similarity between these two gesture signals, indicating a high possibility of wrong classification. c Correlation coefficient matrix of the signals of the 50 words. d Correlation coefficient distribution curve of the 50 words. e Correlation coefficient matrix of the 20 sign language sentences. f Correlation coefficient distribution curve of the 20 sentences. g Voltage output of the 20 sentences. Photo credit: Feng Wen, National University of Singapore. Source data for Fig. 2c, e are provided in the Harvard Dataverse.
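A correlation matrix like the one summarized in Fig. 2c–f can be computed directly from the recorded voltage traces. The sketch below is a minimal NumPy illustration; the array shape and the use of random placeholder data are assumptions, not the paper's preprocessing.

```python
import numpy as np

# Hypothetical input: one voltage trace per word, shape (50 words, n_samples).
# Placeholder data stands in for the real glove recordings.
word_signals = np.random.randn(50, 2000)

# Pearson correlation between every pair of word signals (50 x 50 matrix),
# analogous to the matrix shown in Fig. 2c.
corr_matrix = np.corrcoef(word_signals)

# Off-diagonal coefficients, whose distribution corresponds to Fig. 2d.
off_diag = corr_matrix[~np.eye(50, dtype=bool)]
print(off_diag.mean(), off_diag.max())
```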
Fig. 3
Fig. 3. Word and sentence recognition based on the non-segmentation method.
a–c Optimization of CNN structure parameters based on accuracy performance, including kernel size, number of filters, and number of convolutional layers. Boxplots indicate the median (middle line), 25th and 75th percentiles (box), 5th and 95th percentiles (whiskers), and outliers (single points). d Final structure of the CNN after optimization. e, f Clustering results of word signals from the CNN input and output layers. g Confusion map for recognizing 50 words. h, i Clustering results of sentence signals from the CNN input and output layers. j Confusion map for recognizing 17 sentences. Source data for Fig. 3g, j are provided in the Harvard Dataverse.
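The caption names the tuned CNN hyperparameters (kernel size, number of filters, number of convolutional layers) but not the final values. The block below is a hedged PyTorch sketch of a 1D convolutional classifier for multichannel glove signals; the channel counts, kernel size, and layer sizes are illustrative assumptions, not the authors' optimized structure.

```python
import torch
import torch.nn as nn

class GloveCNN(nn.Module):
    """Illustrative 1D CNN for classifying multichannel glove signals.
    Layer sizes are assumptions, not the paper's optimized values."""
    def __init__(self, in_channels=15, n_classes=50, kernel_size=5, n_filters=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, n_filters, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(n_filters, n_filters * 2, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # collapse the time axis
        )
        self.classifier = nn.Linear(n_filters * 2, n_classes)

    def forward(self, x):                      # x: (batch, channels, time)
        return self.classifier(self.features(x).squeeze(-1))

# Example: a batch of 8 signals, 15 sensor channels, 1000 time samples.
logits = GloveCNN()(torch.randn(8, 15, 1000))  # -> (8, 50) class scores
```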
Fig. 4
Fig. 4. Word and sentence recognition based on the segmentation method, which enables new/never-seen sentence recognition.
a Label table (W01–W20) of the 19 words (a subset of the 50 words) that appear in the 20 sentences (denoted Y1–Y17 and New1–New3). b Schematic diagram of sentence signal segmentation, taking ‘The dog scared me’ as an example. c Summary table of the sentences with category remarks, constituent words, label series, and unique label-number order. The same word is marked with the same color. d Schematic diagram of the single classifier. e Confusion map of split word element recognition (accuracy 81.9%) based on the single classifier. f With successful recognition of each element in the sentences, the sentences can be inversely reconstructed and recognized at an average correct rate of 79.41% with the single classifier. Dark green indicates correct recognition and light blue indicates wrong prediction. g Schematic diagram of the hierarchy classifier. h Confusion map of segmented word element recognition (accuracy 82.8%) based on the hierarchy classifier. i With successful recognition of each element in the sentences, the sentences can be inversely reconstructed and recognized at an average correct rate of 85.58% with the hierarchy classifier. j Recognition process of three new sentences that the CNN model did not learn before, taking ‘I lost my dog’ as an example. The detailed recognition results can be found in Supplementary Table 2. Source data for Fig. 4e, h are provided in the Harvard Dataverse.
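The reconstruction step in panels f, i, and j amounts to mapping the recovered word-label sequence back to a sentence, with unseen label orders handled as new sentences. The sketch below illustrates that lookup; the label names and sentence table entries are hypothetical examples, not the authors' code or data.

```python
# Illustrative reconstruction of sentences from recognized word labels.
# The label names and sentence table below are hypothetical examples.
SENTENCE_TABLE = {
    ("W03", "W07", "W12", "W01"): "The dog scared me",
    ("W01", "W09", "W05", "W07"): "I lost my dog",
}

def reconstruct(predicted_labels, table=SENTENCE_TABLE):
    """Map a sequence of word labels back to a sentence.
    Label orders not in the table are treated as new/never-seen sentences
    and are read out element by element."""
    key = tuple(predicted_labels)
    if key in table:
        return table[key], "known sentence"
    return " ".join(predicted_labels), "new sentence (element-by-element readout)"

print(reconstruct(["W03", "W07", "W12", "W01"]))   # known sentence
print(reconstruct(["W07", "W03", "W01"]))          # new word order
```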
Fig. 5
Fig. 5. Demonstration of communication between the speech impaired and the non-signer.
a Flow chart of the sign language recognition and communication system, which allows the signer to use sign language and the non-signer to type directly to engage in the interaction. The sign language delivered by the signer is recognized and translated into text and speech by the AI block. Based on TCP/IP, the client (controlled by the signer, Lily) in the VR interface receives the recognition results and transmits them to the server (operated by the non-signer, Mary). The non-signer types in the chat box to respond to the signer. b (i–v) Communication/conversation process in the VR interface between the speech-impaired user Lily and the non-signer Mary based on the sign language recognition and communication system. The red rectangles indicate the corresponding reactions of the two users. These photos are of one of the authors. c Conversation summary of b. The hand image is created by the authors via Blender. Photo credit: Feng Wen, National University of Singapore.
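The TCP/IP exchange sketched in Fig. 5a can be imitated with standard sockets. The minimal Python example below uses an assumed address, port, and message format; it is only a sketch of the client/server text exchange, not the paper's VR implementation.

```python
import socket

HOST, PORT = "127.0.0.1", 5050   # assumed address/port for this sketch

def run_server():
    """Non-signer side: receive the recognized sentence, send a typed reply."""
    with socket.create_server((HOST, PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            recognized = conn.recv(1024).decode()      # text from the AI block
            print("Signer says:", recognized)
            conn.sendall(b"Sorry to hear that. How can I help?")

def run_client(recognized_sentence):
    """Signer side: forward the recognition result, display the reply."""
    with socket.create_connection((HOST, PORT)) as cli:
        cli.sendall(recognized_sentence.encode())
        print("Non-signer replies:", cli.recv(1024).decode())
```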
