Sensors (Basel). 2016 Jan 18;16(1):115. doi: 10.3390/s16010115.

Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition


Francisco Javier Ordóñez et al.

Abstract

Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average, and outperforms some of the previously reported results by up to 9%. We show that the framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance. We characterise key architectural hyperparameters' influence on performance to provide insights about their optimisation.

Keywords: LSTM; deep learning; human activity recognition; machine learning; neural network; sensor fusion; wearable sensors.


Figures

Figure 1
Different types of units in neural networks. (a) MLP with three dense layers; (b) recurrent neural network (RNN) with two dense layers. The activation and hidden value of the unit in layer (l+1) are computed in the same time step t; (c) The recurrent LSTM cell is an extension of RNNs, where the internal memory can be updated, erased or read out.
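As a point of reference, the LSTM cell sketched in panel (c) is usually defined by the standard gate equations below. This is the common textbook formulation, not necessarily the exact notation used in the paper; σ is the logistic sigmoid and ⊙ denotes element-wise multiplication.

```latex
\begin{align*}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) && \text{input gate}\\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) && \text{forget gate (erase)}\\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) && \text{output gate (read out)}\\
g_t &= \tanh(W_{xg} x_t + W_{hg} h_{t-1} + b_g) && \text{candidate update}\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t && \text{internal memory update}\\
h_t &= o_t \odot \tanh(c_t) && \text{cell output}
\end{align*}
```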
Figure 2
Representation of a temporal convolution over a single sensor channel in a three-layer convolutional neural network (CNN). Layer (l-1) holds the sensor data at the input. The next layer (l) is composed of two feature maps (a_1^l(τ) and a_2^l(τ)) extracted by two different kernels (K_11^(l-1) and K_21^(l-1)). The deepest layer (l+1) is composed of a single feature map, resulting from the temporal convolution of a two-dimensional kernel K_1^l over layer l. The time axis (which is convolved over) is horizontal.
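To make the operation in this figure concrete, here is a minimal NumPy sketch of a valid (no-padding) temporal convolution of one sensor channel with a bank of kernels. The function name and shapes are illustrative, not taken from the paper.

```python
import numpy as np

def temporal_convolution(x, kernels):
    """Valid 1D convolution of a single sensor channel with a bank of kernels.

    x       : (T,) raw sensor samples (layer l-1 in Figure 2)
    kernels : (F, K) array holding F kernels of length K
    returns : (F, T-K+1) array with one feature map per kernel (layer l)
    """
    T = x.shape[0]
    F, K = kernels.shape
    out = np.empty((F, T - K + 1))
    for f in range(F):
        for tau in range(T - K + 1):
            # Each output sample is the dot product of the kernel with a
            # length-K window of the input signal starting at tau.
            out[f, tau] = kernels[f] @ x[tau:tau + K]
    return out

# Example: two kernels of length 5 over a 64-sample channel -> two feature maps
feature_maps = temporal_convolution(np.random.randn(64), np.random.randn(2, 5))
print(feature_maps.shape)  # (2, 60)
```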
Figure 3
Architecture of the DeepConvLSTM (Conv, convolutional) framework for activity recognition. From the left, the signals coming from the wearable sensors are processed by four convolutional layers, which allow learning features from the data. Two dense layers then perform a non-linear transformation, which yields the classification outcome with a softmax logistic regression output layer on the right. Input at Layer 1 corresponds to sensor data of size D×S^1, where D denotes the number of sensor channels and S^l the length of the feature maps in layer l. Layers 2–5 are convolutional layers. K^l denotes the kernels in layer l (depicted as red squares). F^l denotes the number of feature maps in layer l. In convolutional layers, a_i^l denotes the activation that defines feature map i in layer l. Layers 6 and 7 are dense layers. In dense layers, a_(t,i)^l denotes the activation of unit i in hidden layer l at time t. The time axis is vertical.
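A rough PyTorch sketch of this pipeline is given below. The layer counts follow the caption (four convolutional layers, two recurrent layers feeding a softmax output, with the dense layers realised as LSTM layers in the DeepConvLSTM variant); the specific sizes (64 feature maps, kernel length 5, 128 LSTM units per layer, 113 OPPORTUNITY channels, 18 gesture classes) are assumptions for illustration, not values confirmed by this page.

```python
import torch
import torch.nn as nn

class DeepConvLSTM(nn.Module):
    """Sketch of the Figure 3 pipeline: four temporal convolutions, two
    recurrent layers, and a softmax-ready output. All sizes are assumptions."""

    def __init__(self, n_channels=113, n_classes=18,
                 n_filters=64, kernel=5, n_hidden=128):
        super().__init__()
        # A (kernel, 1) window convolves over time only, keeping the D
        # sensor channels separate, as in Figure 2.
        self.conv = nn.Sequential(
            nn.Conv2d(1, n_filters, (kernel, 1)), nn.ReLU(),
            nn.Conv2d(n_filters, n_filters, (kernel, 1)), nn.ReLU(),
            nn.Conv2d(n_filters, n_filters, (kernel, 1)), nn.ReLU(),
            nn.Conv2d(n_filters, n_filters, (kernel, 1)), nn.ReLU(),
        )
        self.lstm = nn.LSTM(n_filters * n_channels, n_hidden,
                            num_layers=2, batch_first=True)
        self.out = nn.Linear(n_hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, channels)
        x = x.unsqueeze(1)                     # (batch, 1, time, channels)
        x = self.conv(x)                       # (batch, filters, time', channels)
        x = x.permute(0, 2, 1, 3).flatten(2)   # (batch, time', filters*channels)
        x, _ = self.lstm(x)                    # (batch, time', hidden)
        return self.out(x[:, -1])              # logits at the last time step

logits = DeepConvLSTM()(torch.randn(8, 24, 113))   # 8 sequences of 24 samples
print(logits.shape)                                # torch.Size([8, 18])
```

During training the returned logits would be paired with a cross-entropy loss, which applies the softmax of the output layer internally.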
Figure 4
Placement of on-body sensors used in the OPPORTUNITY dataset (left: inertial measurement units; right: 3-axis accelerometers) [7].
Figure 5
Sequence labelling after segmenting the data with a sliding window. The sensor signals are segmented by a jumping window. The activity class within each sequence is taken to be the ground truth label annotated at the last sample, T, of that window.
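A minimal sketch of this segmentation step, assuming NumPy arrays and, for the example, a 30 Hz sampling rate; the function and its parameters are hypothetical, not the paper's code.

```python
import numpy as np

def segment(signal, labels, window, step):
    """Cut a recording into fixed-length sequences with a jumping window.

    signal : (T, D) array of sensor samples; labels : (T,) per-sample classes.
    Each sequence takes the label annotated at its last sample, as in Figure 5.
    """
    windows, window_labels = [], []
    for start in range(0, len(signal) - window + 1, step):
        end = start + window
        windows.append(signal[start:end])
        window_labels.append(labels[end - 1])  # label at sample T of the window
    return np.stack(windows), np.array(window_labels)

# Example: 500 ms windows at an assumed 30 Hz (15 samples), jumping a full window
X, y = segment(np.random.randn(3000, 113), np.random.randint(0, 18, size=3000),
               window=15, step=15)
print(X.shape, y.shape)  # (200, 15, 113) (200,)
```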
Figure 6
Output class probabilities for a ~25 s-long fragment of sensor signals in the test set of the OPPORTUNITY dataset, which comprises 10 annotated gestures. Each point in the plot represents the class probabilities obtained from processing the data within a 500 ms sequence obtained from a sliding window ending at that point. The dashed line represents the Null class. DeepConvLSTM is better at identifying the start and end of gestures.
Figure 7
F1 score performance of DeepConvLSTM on the OPPORTUNITY dataset. Classification performance is displayed individually per gesture, for different lengths of the input sensor data segments. Experiments were carried out with sequence lengths of 400 ms, 500 ms, 1400 ms and 2750 ms. The horizontal axis represents the ratio between the gesture length and the sequence length (ratios under one represent performance for gestures whose durations are shorter than the sequence duration).
Figure 8
Performance on the Skoda and OPPORTUNITY (gesture recognition, including the Null class) datasets with different numbers of convolutional layers.


References

    1. Rashidi P., Cook D.J. The resident in the loop: Adapting the smart home to the user. IEEE Trans. Syst. Man Cybern. Part A. 2009;39:949–959. doi: 10.1109/TSMCA.2009.2025137. - DOI
    2. Patel S., Park H., Bonato P., Chan L., Rodgers M. A review of wearable sensors and systems with application in rehabilitation. J. NeuroEng. Rehabil. 2012;9 doi: 10.1186/1743-0003-9-21. - DOI - PMC - PubMed
    3. Avci A., Bosch S., Marin-Perianu M., Marin-Perianu R., Havinga P. Activity Recognition Using Inertial Sensing for Healthcare, Wellbeing and Sports Applications: A Survey; Proceedings of the 23rd International Conference on Architecture of Computing Systems (ARCS); Hannover, Germany. 22–23 February 2010; pp. 1–10.
    4. Mazilu S., Blanke U., Hardegger M., Tröster G., Gazit E., Hausdorff J.M. GaitAssist: A Daily-Life Support and Training System for Parkinson's Disease Patients with Freezing of Gait; Proceedings of the ACM Conference on Human Factors in Computing Systems (SIGCHI); Toronto, ON, Canada. 26 April–1 May 2014.
    5. Kranz M., Möller A., Hammerla N., Diewald S., Plötz T., Olivier P., Roalter L. The mobile fitness coach: Towards individualized skill assessment using personalized mobile devices. Perv. Mob. Comput. 2013;9:203–215. doi: 10.1016/j.pmcj.2012.06.002. - DOI
