Saturday, September 18, 2021

How to Develop RNN Models for Human Activity Recognition Time Series Classification

Human activity recognition is the problem of classifying sequences of accelerometer data recorded by specialized harnesses or smartphones into known, well-defined movements.

Classical approaches to the problem involve hand-crafting features from the time series data based on fixed-size windows and then training machine learning models, such as ensembles of decision trees. The difficulty is that this feature engineering requires strong expertise in the field.

Recently, deep learning methods such as recurrent neural networks like LSTMs, and variations that make use of one-dimensional convolutional neural networks (CNNs), have been shown to provide state-of-the-art results on challenging activity recognition tasks with little or no feature engineering, relying instead on feature learning from raw data.

In this tutorial, you will discover three recurrent neural network architectures for modeling an activity recognition time series classification problem.

After completing this tutorial, you will know:

  • How to develop a Long Short-Term Memory recurrent neural network for human activity recognition.
  • How to develop a one-dimensional Convolutional Neural Network LSTM, or CNN-LSTM, model.
  • How to develop a one-dimensional Convolutional LSTM, or ConvLSTM, model for the same problem.

Let's begin.

Tutorial Overview

This tutorial is divided into four parts; they are:

  1. Activity Recognition Using Smartphones Dataset
  2. Develop an LSTM network model
  3. Develop a CNN-LSTM network model
  4. Develop a ConvLSTM network model

Activity Recognition Using Smartphones Dataset

Human activity recognition, or HAR for short, is the problem of predicting what a person is doing based on a trace of their movement using sensors.

A standard human activity recognition dataset is the "Activity Recognition Using Smartphones" dataset made available in 2012.

It was prepared and made available by Davide Anguita, et al. from the University of Genova, Italy, and is described in full in their 2013 paper "A Public Domain Dataset for Human Activity Recognition Using Smartphones." The dataset was modeled with machine learning algorithms in their 2012 paper titled "Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine."

The dataset was made available and can be downloaded for free from the UCI Machine Learning Repository.

Data was collected from 30 subjects aged between 19 and 48 years, each performing one of six standard activities while wearing a waist-mounted smartphone that recorded the movement data. Video was recorded of each subject performing the activities, and the movement data was labeled manually from these videos.

The six activities carried out were as follows:

  1. Walking
  2. Walking upstairs
  3. Walking downstairs
  4. Sitting
  5. Standing
  6. Laying

The movement data recorded was the x, y, and z accelerometer data (linear acceleration) and gyroscopic data (angular velocity) from the smartphone, specifically a Samsung Galaxy S II. Observations were recorded at 50 Hz (i.e. 50 data points per second). Each subject performed the sequence of activities twice: once with the device on their left-hand side and once with the device on their right-hand side.

The raw data is not available. Instead, a pre-processed version of the dataset was made available. The pre-processing steps included:

  • Pre-processing the accelerometer and gyroscope signals using noise filters.
  • Splitting the data into fixed windows of 2.56 seconds (128 data points) with 50% overlap.
  • Splitting the accelerometer data into gravitational (total) and body motion components.
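
The windowing step can be illustrated with a short NumPy sketch. It is not the dataset authors' pre-processing code: the function name sliding_windows() and the synthetic signal are assumptions made only to show how 128-sample windows with 50% overlap are cut from a 50 Hz signal.

```python
import numpy as np

def sliding_windows(signal, window=128, overlap=0.5):
    """Split a 1D signal into fixed-size windows with fractional overlap."""
    step = int(window * (1 - overlap))  # 64 samples for 50% overlap
    n_windows = (len(signal) - window) // step + 1
    return np.stack([signal[i * step : i * step + window] for i in range(n_windows)])

# 10 seconds of synthetic data sampled at 50 Hz (500 samples)
signal = np.arange(500, dtype=float)
windows = sliding_windows(signal)
print(windows.shape)  # (6, 128)
```

Each window covers 128 / 50 = 2.56 seconds, and consecutive windows start 64 samples apart, so every observation (apart from those at the ends) appears in two windows.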

Feature engineering was applied to the window data, and a copy of the data with these engineered features was made available.

A number of time and frequency features commonly used in the field of human activity recognition were extracted from each window. The result was a 561-element vector of features.

The dataset was split into train (70%) and test (30%) sets based on data for subjects, e.g. 21 subjects for train and nine for test.

Experiments with a support vector machine intended for use on a smartphone (e.g. using fixed-point arithmetic) achieved a predictive accuracy of 89% on the test dataset, similar to the results of an unmodified SVM implementation.

The data set is available for free and can be downloaded from the UCI Machine Learning repository.

The data is provided as a single zip file with a size of approximately 58 megabytes. The direct link for this download is below:

Download the dataset and unzip all files into a new directory in your current working directory named "HARDataset".

Develop an LSTM network model

In this section, we will develop a Long Short-Term Memory (LSTM) network model for the human activity recognition dataset.

LSTM network models are a type of recurrent neural network that can learn and remember over long sequences of input data. They are intended for use with data comprised of long sequences, up to 200 to 400 time steps. They may be a good fit for this problem.

The model can support multiple parallel sequences of input data, such as each axis of the accelerometer and gyroscope data. The model learns to extract features from sequences of observations and how to map the internal features to different activity types.

The benefit of using LSTMs for sequence classification is that they can learn from the raw time series data directly, and in turn do not require domain expertise to manually engineer input features. The model can learn an internal representation of the time series data and ideally achieve performance comparable to models fit on a version of the dataset with engineered features.

This section is divided into four parts; they are:

  1. Loading data
  2. Fit and evaluate the model
  3. Summarize the results
  4. Complete example

Loading data

The first step is to load the raw data set into memory.

There are three main signal types in the raw data: total acceleration, body acceleration, and body gyroscope. Each has three axes of data. This means that there are a total of nine variables for each time step.

Furthermore, each series of data has been partitioned into overlapping windows of 2.56 seconds of data, or 128 time steps. These windows of data correspond to the windows of engineered features (rows) in the previous section.

This means that one row of data has (128 * 9), or 1,152, elements. This is a little more than double the size of the 561-element vectors in the previous section, and it is likely that there is some redundant data.

The signals are stored in the /Inertial Signals/ directory under the train and test subdirectories. Each axis of each signal is stored in a separate file, meaning that each of the train and test datasets has nine input files to load and one output file to load. We can batch the loading of these files into groups given the consistent directory structures and file naming conventions.

The input data is in CSV format where the columns are separated by whitespace. Each of these files can be loaded as a NumPy array. The load_file() function loads a dataset given the file path and returns the loaded data as a NumPy array.
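
A minimal sketch of such a loader is shown here. It assumes NumPy's loadtxt() is sufficient for whitespace-separated numeric files; the temporary file and its values are made up purely so the example can run without the dataset.

```python
import os
import tempfile

import numpy as np

def load_file(filepath):
    """Load a whitespace-delimited numeric text file as a 2D NumPy array."""
    # ndmin=2 keeps single-column files (such as class labels) two-dimensional
    return np.loadtxt(filepath, ndmin=2)

# tiny stand-in file: two rows of three whitespace-separated values
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w") as f:
        f.write("  1.0e-01  2.0e-01  3.0e-01\n  4.0e-01  5.0e-01  6.0e-01\n")
    data = load_file(path)

print(data.shape)  # (2, 3)
```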

We can then load all data for a given group (train or test) into a single three-dimensional NumPy array, where the dimensions of the array are [samples, time steps, features].

To make this clearer, there are 128 time steps and nine features, where the number of samples is the number of rows in any given raw signal data file.

The load_group() function implements this behavior. The dstack() NumPy function allows us to stack each of the loaded two-dimensional arrays into a single 3D array in which the variables are separated on the third dimension (features).
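
The stacking step might look like the following sketch. The signature of load_group() and the three stand-in files are assumptions for illustration; with the real dataset there would be nine files per group.

```python
import os
import tempfile

import numpy as np

def load_file(filepath):
    """Load a whitespace-delimited numeric text file as a 2D NumPy array."""
    return np.loadtxt(filepath, ndmin=2)

def load_group(filenames, prefix=""):
    """Load several [samples, time steps] files and stack them as features."""
    loaded = [load_file(os.path.join(prefix, name)) for name in filenames]
    # dstack joins the 2D arrays along a new third axis: [samples, time steps, features]
    return np.dstack(loaded)

# three stand-in signal files, each with 5 windows of 128 time steps
with tempfile.TemporaryDirectory() as tmp:
    names = ["a.txt", "b.txt", "c.txt"]
    for k, name in enumerate(names):
        np.savetxt(os.path.join(tmp, name), np.full((5, 128), float(k)))
    X = load_group(names, prefix=tmp)

print(X.shape)  # (5, 128, 3)
```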

We can use this function to load all input signal data for a given group, such as train or test.

The load_dataset_group() function loads all input signal data and the output data for a single group using the consistent naming conventions between the directories.
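
The naming convention can be captured in a small helper. This sketch assumes the files follow the pattern <signal>_<axis>_<group>.txt inside "Inertial Signals", with the labels in y_<group>.txt; the helper names and the "HARDataset" prefix are illustrative, so check them against your unzipped copy of the data.

```python
def signal_filenames(group):
    """Return the nine input file names for a group ('train' or 'test')."""
    signals = ["total_acc", "body_acc", "body_gyro"]
    axes = ["x", "y", "z"]
    return [f"{signal}_{axis}_{group}.txt" for signal in signals for axis in axes]

def group_paths(group, prefix="HARDataset"):
    """Return (list of input file paths, output file path) for one group."""
    base = f"{prefix}/{group}"
    inputs = [f"{base}/Inertial Signals/{name}" for name in signal_filenames(group)]
    return inputs, f"{base}/y_{group}.txt"

inputs, output = group_paths("train")
print(len(inputs))  # 9
print(inputs[0])    # HARDataset/train/Inertial Signals/total_acc_x_train.txt
print(output)       # HARDataset/train/y_train.txt
```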

Finally, we can load each of the train and test datasets.

The output data is defined as an integer for the class number. We must one hot encode these class integers so that the data is suitable for fitting a neural network multi-class classification model. We can do this by calling the to_categorical() Keras function.
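
To show what one hot encoding produces, here is a NumPy stand-in for the encoding step (it is not the Keras function itself). It assumes the class labels are the integers 1 to 6, matching the six activities, and shifts them to start at zero before encoding; the helper name one_hot() is hypothetical.

```python
import numpy as np

def one_hot(labels, n_classes=6):
    """One hot encode 1-based integer class labels."""
    # shift labels 1..6 to 0..5 so they can index the rows of an identity matrix
    zero_based = np.asarray(labels).reshape(-1).astype(int) - 1
    return np.eye(n_classes)[zero_based]

y = np.array([[1], [3], [6]])  # made-up labels: walking, walking downstairs, laying
encoded = one_hot(y)
print(encoded.shape)  # (3, 6)
print(encoded[1])     # [0. 0. 1. 0. 0. 0.]
```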

The load_dataset() function implements this behavior and returns the train and test X and y elements ready for fitting and evaluating the defined models.

Fit and evaluate the model

Now that the data is loaded into memory ready for modeling, we can define, fit, and evaluate an LSTM model.

We can define a function named evaluate_model() that takes the train and test datasets, fits a model on the training dataset, evaluates it on the test dataset, and returns an estimate of the model's performance.

First, we must define the LSTM model using the Keras deep learning library. The model requires a three-dimensional input with [samples, time steps, features].

This is exactly how we loaded the data, where a sample is one window of the time series data, each window has 128 time steps, and a time step has nine variables or features.

The output for the model will be a six-element vector containing the probability of a given window belonging to each of the six types of activity.

The input and output dimensions are required when fitting the model, and we can extract them from the provided training dataset.

The model is defined as a sequential Keras model, for simplicity.

We will define the model as having a single LSTM hidden layer. This is followed by a dropout layer intended to reduce overfitting of the model to the training data. Finally, a dense fully connected layer is used to interpret the features extracted by the LSTM hidden layer, before a final output layer is used to make predictions.

The efficient Adam version of stochastic gradient descent will be used to optimize the network, and the categorical cross entropy loss function will be used given that we are learning a multi-class classification problem.

The definition of the model is listed below.
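
As a minimal sketch of the structure just described, the model might be defined as follows. The layer sizes (100 LSTM units, 100 dense units), the dropout rate of 0.5, and the helper name build_lstm() are illustrative assumptions rather than values stated in the text, and the imports assume Keras as bundled with TensorFlow 2.

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

def build_lstm(n_timesteps=128, n_features=9, n_outputs=6):
    """Single LSTM layer, dropout, a dense interpretation layer, softmax output."""
    model = Sequential([
        Input(shape=(n_timesteps, n_features)),
        LSTM(100),                               # assumed number of units
        Dropout(0.5),                            # assumed dropout rate
        Dense(100, activation="relu"),           # assumed layer size
        Dense(n_outputs, activation="softmax"),  # one probability per activity
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

model = build_lstm()
print(model.output_shape)  # (None, 6)
```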

The model is fit for a fixed number of epochs, in this case 15, and a batch size of 64 samples will be used, where 64 windows of data will be exposed to the model before the weights of the model are updated.

Once the model is fit, it is evaluated on the test dataset and the accuracy of the fit model on the test dataset is returned.

Note: it is common to not shuffle sequence data when fitting an LSTM. Here we do shuffle the windows of input data during training (the default). In this problem, we are interested in harnessing the LSTM's ability to learn and extract features across the time steps in a window, not across windows.

The complete evaluate_model() function is listed below.
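
A sketch of evaluate_model() under the same assumptions is given here. Only the 15 epochs and the batch size of 64 come from the text; the layer sizes are illustrative, and the random arrays stand in for the real dataset so the example runs on its own (it is called with a single epoch to keep it quick).

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

def evaluate_model(trainX, trainy, testX, testy, epochs=15, batch_size=64):
    """Fit an LSTM on the training set and return accuracy on the test set."""
    # input and output dimensions are taken from the training data
    n_timesteps, n_features, n_outputs = trainX.shape[1], trainX.shape[2], trainy.shape[1]
    model = Sequential([
        Input(shape=(n_timesteps, n_features)),
        LSTM(100),
        Dropout(0.5),
        Dense(100, activation="relu"),
        Dense(n_outputs, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    # windows are shuffled between epochs by default
    model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=0)
    _, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
    return accuracy

# random stand-in data with the same layout as the real dataset
rng = np.random.default_rng(1)
trainX = rng.normal(size=(64, 128, 9)).astype("float32")
trainy = np.eye(6)[rng.integers(0, 6, size=64)]
testX = rng.normal(size=(32, 128, 9)).astype("float32")
testy = np.eye(6)[rng.integers(0, 6, size=32)]

accuracy = evaluate_model(trainX, trainy, testX, testy, epochs=1)
print("Accuracy: %.3f" % accuracy)
```

On random data the accuracy is meaningless; the point is only the flow of fitting and then evaluating.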

There is nothing special about the network structure or chosen hyperparameters; they are just a starting point for this problem.

Summarize the results

We cannot judge the skill of the model from a single evaluation.

The reason for this is that neural networks are stochastic, meaning that a different specific model will result when training the same model configuration on the same data.

This is a feature of the network in that it gives the model its adaptive ability, but it requires a slightly more complicated evaluation of the model.

We will repeat the evaluation of the model multiple times, then summarize the performance of the model across each of those runs. For example, we can call evaluate_model() a total of 10 times. This will result in a population of model evaluation scores that must be summarized.

We can summarize the sample of scores by calculating and reporting the mean and standard deviation of the performance. The mean gives the average accuracy of the model on the dataset, whereas the standard deviation describes how much the accuracy typically varies around that mean.

The function summarize_results() below summarizes the results of a run.
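
A minimal sketch of this summary, assuming the scores are accuracies expressed as percentages; the three scores used here are made up for illustration.

```python
import numpy as np

def summarize_results(scores):
    """Print and return the mean and standard deviation of a list of scores."""
    mean, std = np.mean(scores), np.std(scores)
    print("Accuracy: %.3f%% (+/-%.3f)" % (mean, std))
    return mean, std

scores = [90.0, 88.0, 92.0]  # made-up accuracy percentages
mean, std = summarize_results(scores)  # prints: Accuracy: 90.000% (+/-1.633)
```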

We can group the repeated evaluation, the collection of the results, and the summary of the results into a main function for the experiment, called run_experiment(), listed below.

By default, the model is evaluated 10 times before the model performance is reported.
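
The repeat-and-collect loop can be sketched as follows. To keep the example self-contained, run_experiment() here takes the evaluation function as an argument, and fake_evaluate() is a hypothetical stand-in that returns a random accuracy instead of fitting a network.

```python
import random
import statistics

def run_experiment(evaluate, repeats=10):
    """Call evaluate() repeatedly and collect the scores as percentages."""
    scores = []
    for r in range(repeats):
        score = evaluate() * 100.0
        print(">#%d: %.3f" % (r + 1, score))
        scores.append(score)
    return scores

random.seed(1)

def fake_evaluate():
    """Hypothetical stand-in for evaluate_model(): a random accuracy in [0.88, 0.92]."""
    return random.uniform(0.88, 0.92)

scores = run_experiment(fake_evaluate)
print("Accuracy: %.3f%% (+/-%.3f)" % (statistics.mean(scores), statistics.pstdev(scores)))
```

In the real experiment, the evaluate argument would wrap a call to evaluate_model() with the loaded train and test data.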

Complete example

Now that we have all the pieces, we can tie them together in a working example.

The complete list of the code is provided below.

Running the example first prints the shape of the loaded dataset, then the shape of the train and test sets and the input and output elements. This confirms the number of samples, time steps, and variables, as well as the number of classes.

Next, models are created and evaluated and a debug message is printed for each.

Finally, the sample of scores is printed, followed by the mean and standard deviation. We can see that the model performed well, achieving a classification accuracy of about 89.7% trained on the raw dataset, with a standard deviation of about 1.3.

This is a good result, considering that the original paper published a result of 89%, trained on the dataset with heavy domain-specific feature engineering, not the raw dataset.

Note: given the stochastic nature of the algorithm, your specific results may vary. If so, try running the code a few times.

Now that we have seen how to develop an LSTM model for time series classification, let’s look at how we can develop a more sophisticated CNN LSTM model.

Develop a CNN-LSTM Network Model

The CNN LSTM architecture involves using Convolutional Neural Network (CNN) layers for feature extraction on input data combined with LSTMs to support sequence prediction.

CNN LSTMs were developed for visual time series prediction problems and the application of generating textual descriptions from sequences of images (e.g. videos). Specifically, the problems of:

  • Activity Recognition: Generating a textual description of an activity demonstrated in a sequence of images.
  • Image Description: Generating a textual description of a single image.
  • Video Description: Generating a textual description of a sequence of images.

You can learn more about the CNN LSTM architecture in the post:

To learn more about the consequences of combining these models, see the paper:

The CNN LSTM model will read subsequences of the main sequence in as blocks, extract features from each block, then allow the LSTM to interpret the features extracted from each block.

One approach to implementing this model is to split each window of 128 time steps into subsequences for the CNN model to process. For example, the 128 time steps in each window can be split into four subsequences of 32 time steps.

We can then define a CNN model that expects to read in sequences with a length of 32 time steps and nine features.
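
The split into subsequences is a single reshape. In this sketch the array X is synthetic stand-in data with the [samples, time steps, features] layout; the names n_steps and n_length are chosen here for illustration.

```python
import numpy as np

n_steps, n_length = 4, 32  # four subsequences of 32 time steps each

# stand-in data: 10 windows, 128 time steps, 9 features
X = np.arange(10 * 128 * 9, dtype=float).reshape((10, 128, 9))

# [samples, time steps, features] -> [samples, subsequences, time steps, features]
X_sub = X.reshape((X.shape[0], n_steps, n_length, X.shape[2]))
print(X_sub.shape)  # (10, 4, 32, 9)

# the second subsequence of a window starts at original time step 32
print(np.array_equal(X_sub[0, 1, 0], X[0, 32]))  # True
```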

The entire CNN model can be wrapped in a TimeDistributed layer to allow the same CNN model to read in each of the four subsequences in the window. The extracted features are then flattened and provided to the LSTM model to read, extracting its own features before a final mapping to an activity is made.

It is common to use two consecutive CNN layers followed by dropout and a max pooling layer, and that is the simple structure used in the CNN LSTM model here.

The updated evaluate_model() is listed below.
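
The model part of that function might be sketched as below. The number of filters (64), kernel size (3), pool size (2), dropout rate, and layer sizes are illustrative assumptions, as is the helper name build_cnn_lstm(); the imports assume Keras as bundled with TensorFlow 2.

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import (LSTM, Conv1D, Dense, Dropout, Flatten,
                                     MaxPooling1D, TimeDistributed)
from tensorflow.keras.models import Sequential

def build_cnn_lstm(n_steps=4, n_length=32, n_features=9, n_outputs=6):
    """CNN applied to each subsequence via TimeDistributed, followed by an LSTM."""
    model = Sequential([
        Input(shape=(n_steps, n_length, n_features)),
        # the same CNN layers are applied to each of the four subsequences
        TimeDistributed(Conv1D(filters=64, kernel_size=3, activation="relu")),
        TimeDistributed(Conv1D(filters=64, kernel_size=3, activation="relu")),
        TimeDistributed(Dropout(0.5)),
        TimeDistributed(MaxPooling1D(pool_size=2)),
        TimeDistributed(Flatten()),
        # the LSTM reads the sequence of four flattened feature vectors
        LSTM(100),
        Dropout(0.5),
        Dense(100, activation="relu"),
        Dense(n_outputs, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

model = build_cnn_lstm()
print(model.output_shape)  # (None, 6)
```

The input data must first be reshaped to [samples, 4, 32, 9] before being passed to this model.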

We can evaluate this model as we did the straight LSTM model in the previous section.

The complete code listing is provided below.

Running the example summarizes the model performance for each of the 10 runs before a final summary of the model's performance on the test set is reported.

We can see that the model achieved a performance of about 90.6% with a standard deviation of about 1%.

Note: given the stochastic nature of the algorithm, your specific results may vary. If so, try running the code a few times.

Develop a ConvLSTM Network Model

A further extension of the CNN LSTM idea is to perform the convolutions of the CNN (e.g. how the CNN reads the input sequence data) as part of the LSTM.

This combination is called a Convolutional LSTM, or ConvLSTM for short, and like the CNN LSTM is also used for spatio-temporal data.

Unlike an LSTM that reads the data in directly in order to calculate internal state and state transitions, and unlike the CNN LSTM that is interpreting the output from CNN models, the ConvLSTM is using convolutions directly as part of reading input into the LSTM units themselves.

For more information on how the equations for the ConvLSTM are calculated within the LSTM unit, see the paper:

The Keras library provides the ConvLSTM2D class that supports the ConvLSTM model for 2D data. It can be configured for 1D multivariate time series classification.

The ConvLSTM2D class, by default, expects input data to have the shape [samples, time, rows, columns, channels], where each time step of data is defined as an image of (rows * columns) data points.

In the previous section, we divided a given window of data (128 time steps) into four subsequences of 32 time steps. We can use this same subsequence approach in defining the ConvLSTM2D input where the number of time steps is the number of subsequences in the window, the number of rows is 1 as we are working with one-dimensional data, and the number of columns represents the number of time steps in the subsequence, in this case 32.

For this chosen framing of the problem, the input for the ConvLSTM2D would therefore be:

  • Samples: n, for the number of windows in the dataset.
  • Time: 4, for the four subsequences that we split a window of 128 time steps into.
  • Rows: 1, for the one-dimensional shape of each subsequence.
  • Columns: 32, for the 32 time steps in an input subsequence.
  • Channels: 9, for the nine input variables.

We can now prepare the data for the ConvLSTM2D model.
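
This preparation is again a single reshape, sketched here on synthetic stand-in data with the [samples, time steps, features] layout; the variable names are illustrative.

```python
import numpy as np

n_steps, n_length = 4, 32  # four subsequences of 32 time steps each

# stand-in data: 10 windows, 128 time steps, 9 features
X = np.arange(10 * 128 * 9, dtype=float).reshape((10, 128, 9))

# [samples, time steps, features] -> [samples, time, rows, columns, channels]
X_conv = X.reshape((X.shape[0], n_steps, 1, n_length, X.shape[2]))
print(X_conv.shape)  # (10, 4, 1, 32, 9)

# column 5 of the third subsequence is original time step 2 * 32 + 5 = 69
print(np.array_equal(X_conv[0, 2, 0, 5], X[0, 69]))  # True
```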

The ConvLSTM2D class requires configuration both in terms of the CNN and the LSTM. This includes specifying the number of filters (e.g. 64), the two-dimensional kernel size, in this case (1 row and 3 columns of the subsequence time steps), and the activation function, in this case rectified linear.

As with a CNN or LSTM model, the output must be flattened into one long vector before it can be interpreted by a dense layer.
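
Putting the configuration together, the model might be sketched as follows. The 64 filters, the (1, 3) kernel, and the rectified linear activation come from the description above; the dropout rate, the dense layer size, and the helper name build_convlstm() are illustrative assumptions, and the imports assume Keras as bundled with TensorFlow 2.

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import ConvLSTM2D, Dense, Dropout, Flatten
from tensorflow.keras.models import Sequential

def build_convlstm(n_steps=4, n_length=32, n_features=9, n_outputs=6):
    """ConvLSTM2D over [time, rows, columns, channels], then a dense classifier."""
    model = Sequential([
        Input(shape=(n_steps, 1, n_length, n_features)),
        # kernel of 1 row by 3 columns slides along the time steps of a subsequence
        ConvLSTM2D(filters=64, kernel_size=(1, 3), activation="relu"),
        Dropout(0.5),                            # assumed dropout rate
        Flatten(),                               # one long vector for the dense layers
        Dense(100, activation="relu"),           # assumed layer size
        Dense(n_outputs, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

model = build_convlstm()
print(model.output_shape)  # (None, 6)
```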

We can then evaluate the model as we did the LSTM and CNN LSTM models before it.

The complete example is listed below.

As with the prior experiments, running the model prints the performance of the model each time it is fit and evaluated. A summary of the final model performance is presented at the end of the run.

We can see that the model does consistently perform well on the problem achieving an accuracy of about 90%, perhaps with fewer resources than the larger CNN LSTM model.

Note: given the stochastic nature of the algorithm, your specific results may vary. If so, try running the code a few times.


Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Data Preparation. Consider exploring whether simple data scaling schemes can further lift model performance, such as normalization, standardization, and power transforms.
  • LSTM Variations. There are variations of the LSTM architecture that may achieve better performance on this problem, such as stacked LSTMs and Bidirectional LSTMs.
  • Hyperparameter Tuning. Consider exploring tuning of model hyperparameters such as the number of units, training epochs, batch size, and more.
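
As a starting point for the data preparation idea, here is a sketch of standardizing each of the nine features using statistics computed from the training set only. The function name standardize() and the random stand-in arrays are assumptions for illustration; whether this helps on this dataset is something to test, not a claimed result.

```python
import numpy as np

def standardize(train, test):
    """Scale each feature to zero mean and unit variance using train statistics."""
    # statistics are computed per feature, across all samples and time steps
    mean = train.mean(axis=(0, 1), keepdims=True)
    std = train.std(axis=(0, 1), keepdims=True)
    return (train - mean) / std, (test - mean) / std

# random stand-in data with the [samples, time steps, features] layout
rng = np.random.default_rng(1)
trainX = rng.normal(loc=3.0, scale=2.0, size=(50, 128, 9))
testX = rng.normal(loc=3.0, scale=2.0, size=(20, 128, 9))

trainX_s, testX_s = standardize(trainX, testX)
print(trainX_s.shape, testX_s.shape)  # (50, 128, 9) (20, 128, 9)
```

Reusing the training statistics for the test set avoids leaking information from the test data into the preparation step.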

If you explore any of these extensions, I’d love to know.




Summary

In this tutorial, you discovered three recurrent neural network architectures for modeling an activity recognition time series classification problem.

Specifically, you learned:

  • How to develop a Long Short-Term Memory Recurrent Neural Network for human activity recognition.
  • How to develop a one-dimensional Convolutional Neural Network LSTM, or CNN LSTM, model.
  • How to develop a one-dimensional Convolutional LSTM, or ConvLSTM, model for the same problem.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
