Sensors (Basel). 2016 Jan 18;16(1):115. doi: 10.3390/s16010115.

Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition


Francisco Javier Ordóñez et al.

Abstract

Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average, and outperforms some of the previously reported results by up to 9%. We show that the framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance. We characterise key architectural hyperparameters' influence on performance to provide insights about their optimisation.

Keywords: LSTM; deep learning; human activity recognition; machine learning; neural network; sensor fusion; wearable sensors.


Figures

Figure 1
Different types of units in neural networks. (a) MLP with three dense layers; (b) recurrent neural network (RNN) with two dense layers. The activation and hidden value of the unit in layer (l+1) are computed in the same time step t; (c) The recurrent LSTM cell is an extension of RNNs, where the internal memory can be updated, erased or read out.
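As a point of reference, the LSTM cell sketched in panel (c) is usually defined by the standard gate equations below. This is the common textbook formulation, not necessarily the exact notation used in the paper; σ is the logistic sigmoid and ⊙ denotes element-wise multiplication.

```latex
\begin{align*}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) && \text{input gate}\\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) && \text{forget gate (erase)}\\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) && \text{output gate (read out)}\\
g_t &= \tanh(W_{xg} x_t + W_{hg} h_{t-1} + b_g) && \text{candidate update}\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t && \text{internal memory update}\\
h_t &= o_t \odot \tanh(c_t) && \text{cell output}
\end{align*}
```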
Figure 2
Representation of a temporal convolution over a single sensor channel in a three-layer convolutional neural network (CNN). Layer (l-1) holds the sensor data at the input. The next layer (l) is composed of two feature maps (a_1^l(τ) and a_2^l(τ)) extracted by two different kernels (K_11^(l-1) and K_21^(l-1)). The deepest layer (l+1) is composed of a single feature map, resulting from the temporal convolution of a two-dimensional kernel K_1^l over layer l. The time axis (which is convolved over) is horizontal.
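To make the operation in this figure concrete, here is a minimal NumPy sketch of a valid (no-padding) temporal convolution of one sensor channel with a bank of kernels. The function name and shapes are illustrative, not taken from the paper.

```python
import numpy as np

def temporal_convolution(x, kernels):
    """Valid 1D convolution of a single sensor channel with a bank of kernels.

    x       : (T,) raw sensor samples (layer l-1 in Figure 2)
    kernels : (F, K) array holding F kernels of length K
    returns : (F, T-K+1) array with one feature map per kernel (layer l)
    """
    T = x.shape[0]
    F, K = kernels.shape
    out = np.empty((F, T - K + 1))
    for f in range(F):
        for tau in range(T - K + 1):
            # Each output sample is the dot product of the kernel with a
            # length-K window of the input signal starting at tau.
            out[f, tau] = kernels[f] @ x[tau:tau + K]
    return out

# Example: two kernels of length 5 over a 64-sample channel -> two feature maps
feature_maps = temporal_convolution(np.random.randn(64), np.random.randn(2, 5))
print(feature_maps.shape)  # (2, 60)
```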
Figure 3
Architecture of the DeepConvLSTM (Conv, convolutional) framework for activity recognition. From the left, the signals coming from the wearable sensors are processed by four convolutional layers, which allow learning features from the data. Two dense layers then perform a non-linear transformation, which yields the classification outcome with a softmax logistic regression output layer on the right. Input at Layer 1 corresponds to sensor data of size D×S^1, where D denotes the number of sensor channels and S^l the length of the feature maps in layer l. Layers 2–5 are convolutional layers. K^l denotes the kernels in layer l (depicted as red squares). F^l denotes the number of feature maps in layer l. In convolutional layers, a_i^l denotes the activation that defines feature map i in layer l. Layers 6 and 7 are dense layers. In dense layers, a_(t,i)^l denotes the activation of unit i in hidden layer l at time t. The time axis is vertical.
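A rough PyTorch sketch of this pipeline is given below. The layer counts follow the caption (four convolutional layers, two recurrent layers feeding a softmax output, with the dense layers realised as LSTM layers in the DeepConvLSTM variant); the specific sizes (64 feature maps, kernel length 5, 128 LSTM units per layer, 113 OPPORTUNITY channels, 18 gesture classes) are assumptions for illustration, not values confirmed by this page.

```python
import torch
import torch.nn as nn

class DeepConvLSTM(nn.Module):
    """Sketch of the Figure 3 pipeline: four temporal convolutions, two
    recurrent layers, and a softmax-ready output. All sizes are assumptions."""

    def __init__(self, n_channels=113, n_classes=18,
                 n_filters=64, kernel=5, n_hidden=128):
        super().__init__()
        # A (kernel, 1) window convolves over time only, keeping the D
        # sensor channels separate, as in Figure 2.
        self.conv = nn.Sequential(
            nn.Conv2d(1, n_filters, (kernel, 1)), nn.ReLU(),
            nn.Conv2d(n_filters, n_filters, (kernel, 1)), nn.ReLU(),
            nn.Conv2d(n_filters, n_filters, (kernel, 1)), nn.ReLU(),
            nn.Conv2d(n_filters, n_filters, (kernel, 1)), nn.ReLU(),
        )
        self.lstm = nn.LSTM(n_filters * n_channels, n_hidden,
                            num_layers=2, batch_first=True)
        self.out = nn.Linear(n_hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, channels)
        x = x.unsqueeze(1)                     # (batch, 1, time, channels)
        x = self.conv(x)                       # (batch, filters, time', channels)
        x = x.permute(0, 2, 1, 3).flatten(2)   # (batch, time', filters*channels)
        x, _ = self.lstm(x)                    # (batch, time', hidden)
        return self.out(x[:, -1])              # logits at the last time step

logits = DeepConvLSTM()(torch.randn(8, 24, 113))   # 8 sequences of 24 samples
print(logits.shape)                                # torch.Size([8, 18])
```

During training the returned logits would be paired with a cross-entropy loss, which applies the softmax of the output layer internally.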
Figure 4
Placement of on-body sensors used in the OPPORTUNITY dataset (left: inertial measurement units; right: 3-axis accelerometers) [7].
Figure 5
Sequence labelling after segmenting the data with a sliding window. The sensor signals are segmented by a jumping window. The activity class within each sequence is taken to be the ground truth label annotated at the last sample, T, of that window.
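A minimal sketch of this segmentation step, assuming NumPy arrays and, for the example, a 30 Hz sampling rate; the function and its parameters are hypothetical, not the paper's code.

```python
import numpy as np

def segment(signal, labels, window, step):
    """Cut a recording into fixed-length sequences with a jumping window.

    signal : (T, D) array of sensor samples; labels : (T,) per-sample classes.
    Each sequence takes the label annotated at its last sample, as in Figure 5.
    """
    windows, window_labels = [], []
    for start in range(0, len(signal) - window + 1, step):
        end = start + window
        windows.append(signal[start:end])
        window_labels.append(labels[end - 1])  # label at sample T of the window
    return np.stack(windows), np.array(window_labels)

# Example: 500 ms windows at an assumed 30 Hz (15 samples), jumping a full window
X, y = segment(np.random.randn(3000, 113), np.random.randint(0, 18, size=3000),
               window=15, step=15)
print(X.shape, y.shape)  # (200, 15, 113) (200,)
```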
Figure 6
Output class probabilities for a ~25 s-long fragment of sensor signals in the test set of the OPPORTUNITY dataset, which comprises 10 annotated gestures. Each point in the plot represents the class probabilities obtained from processing the data within a 500 ms sequence obtained from a sliding window ending at that point. The dashed line represents the Null class. DeepConvLSTM is better at identifying the start and end of gestures.
Figure 7
F1 score performance of DeepConvLSTM on the OPPORTUNITY dataset. Classification performance is displayed individually per gesture, for different lengths of the input sensor data segments. Experiments were carried out with sequence lengths of 400 ms, 500 ms, 1400 ms and 2750 ms. The horizontal axis represents the ratio between the gesture length and the sequence length (ratios under one represent performance for gestures whose durations are shorter than the sequence duration).
Figure 8
Performance on the Skoda and OPPORTUNITY (gesture recognition, including the Null class) datasets with different numbers of convolutional layers.


References

    1. Rashidi P., Cook D.J. The resident in the loop: Adapting the smart home to the user. IEEE Trans. Syst. Man Cybern. Part A. 2009;39:949–959. doi: 10.1109/TSMCA.2009.2025137. - DOI
    2. Patel S., Park H., Bonato P., Chan L., Rodgers M. A review of wearable sensors and systems with application in rehabilitation. J. NeuroEng. Rehabil. 2012;9 doi: 10.1186/1743-0003-9-21. - DOI - PMC - PubMed
    3. Avci A., Bosch S., Marin-Perianu M., Marin-Perianu R., Havinga P. Activity Recognition Using Inertial Sensing for Healthcare, Wellbeing and Sports Applications: A Survey; Proceedings of the 23rd International Conference on Architecture of Computing Systems (ARCS); Hannover, Germany. 22–23 February 2010; pp. 1–10.
    4. Mazilu S., Blanke U., Hardegger M., Tröster G., Gazit E., Hausdorff J.M. GaitAssist: A Daily-Life Support and Training System for Parkinson's Disease Patients with Freezing of Gait; Proceedings of the ACM Conference on Human Factors in Computing Systems (SIGCHI); Toronto, ON, Canada. 26 April–1 May 2014.
    5. Kranz M., Möller A., Hammerla N., Diewald S., Plötz T., Olivier P., Roalter L. The mobile fitness coach: Towards individualized skill assessment using personalized mobile devices. Perv. Mob. Comput. 2013;9:203–215. doi: 10.1016/j.pmcj.2012.06.002. - DOI
