
How I Used Machine Learning and Smartwatches to Track Weightlifting Activity


How machine learning and wearables can change fitness


TL;DR: Scroll to the bottom section (Results) to see the project results and the accuracy of the machine learning models.

I recently completed a machine learning project in which I tracked the bicep curl form of a user. If the bicep curl form was correct, the model would indicate correct form. If the form was incorrect, for example if the user swung their elbow too much, the model would indicate that the form was incorrect and state why.

Note: “Form” refers to the movement performed during one repetition of a bicep curl. Having correct form when weightlifting is important in order to prevent injury and target the correct muscles.

The goal of this project was to develop a proof of concept that I could build a machine learning model with sensor data from smartwatches to track subtle differences in movements during certain weightlifting exercises. In the future, I may expand on this project to develop a full-fledged app.

Here’s how I did it:

Choosing a research topic

I have strong interests in data science and fitness, and I am always looking for ways to combine the two. I noticed that wearables were becoming increasingly capable of tracking cardio exercise, but much less so for weightlifting. That’s when I came up with the idea of using wearables to track weightlifting form. Looking back, I would say it is important to choose a topic that genuinely interests you, because when you run into hiccups and setbacks (and you will), that interest will give you the motivation to get past those moments.

Research

Prior to this project, I didn’t have any experience working with sensor data, so I had to conduct a lot of research. In recent years, there have been many papers on Human Activity Recognition (HAR), in which researchers mainly used data from body-worn sensors or smartphones to identify when users were performing activities such as standing, sitting, running, walking, and climbing stairs. The most common sensors used in these papers were the accelerometer and the gyroscope.

I looked at the types of data used, the processes used to collect the data, and how the models were built and tested. I also took note of the main differences between these HAR papers and my research: differences in weightlifting form are more subtle than the differences between the activities identified in the HAR papers, and my research would use only a wrist sensor rather than a waist sensor or multiple sensors attached across the body.

Based on my research, the models I chose to test for this project were a Long Short-Term Memory network (LSTM), a 1D Convolutional Neural Network (CNN), and a CNN-LSTM, which combines the two. An LSTM is a type of Recurrent Neural Network (RNN); RNNs are popular for sequence data, such as sensor data, and LSTMs in particular are good at retaining information across the whole sequence. A 1D CNN can extract features from raw sensor data, similar to how the more popular 2D CNN is used to classify image data. A CNN-LSTM feeds the features extracted by the CNN into the LSTM.
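
As a rough sketch, here is how the three model families might be defined in Keras. The layer sizes, dropout rate, and the six input channels (a 3-axis accelerometer plus a 3-axis gyroscope) are assumptions for illustration; the architectures actually tested are listed in the table further below.

```python
from tensorflow.keras import layers, models

N_TIMESTEPS = 200   # 2 seconds of data at 100 Hz
N_FEATURES = 6      # assumed: 3-axis accelerometer + 3-axis gyroscope
N_CLASSES = 3       # correct curl + 2 incorrect variations

def build_lstm():
    return models.Sequential([
        layers.Input(shape=(N_TIMESTEPS, N_FEATURES)),
        layers.LSTM(64),
        layers.Dropout(0.5),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])

def build_cnn():
    return models.Sequential([
        layers.Input(shape=(N_TIMESTEPS, N_FEATURES)),
        layers.Conv1D(64, 3, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Flatten(),
        layers.Dense(100, activation="relu"),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])

def build_cnn_lstm():
    return models.Sequential([
        layers.Input(shape=(N_TIMESTEPS, N_FEATURES)),
        layers.Conv1D(64, 3, activation="relu"),
        layers.MaxPooling1D(2),
        layers.LSTM(64),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])
```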

Data

To collect data, I designed an experiment in which participants performed 3 variations of a bicep curl: one correct curl and two different types of incorrect curls. The incorrect variations were a curl where the user swings their elbow too much and a curl where the user does not go through the full range of motion of the exercise. I collected the data by attaching a smartphone to each participant’s wrist and asking them to complete 250 repetitions of each of the 3 variations (with breaks in between, and under supervision so the data would not be faulty). I used the AndroSensor Android app to record the accelerometer and gyroscope data. The data was collected at a rate of 100 Hz, and I used a time period of 2 seconds for each bicep curl sequence, which means there were 200 data points in each sequence. It is important for the network to retain information from the beginning of the curl as it makes its way through the sequence of data.

The data cleaning and wrangling were done using Python, NumPy, and Pandas. A 2-second window was used for each curl, and the window was slid in steps of 0.5 seconds to create more data points. To create even more data points, I also used starting points 0.1, 0.2, 0.3, and 0.4 seconds into a curl to generate additional 2-second windows.
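
A minimal sketch of this windowing step, assuming the recordings are exported as CSV files; the file name and column names below are illustrative, not AndroSensor’s actual export headers.

```python
import numpy as np
import pandas as pd

RATE = 100           # sampling rate in Hz
WINDOW = 2 * RATE    # 2-second window -> 200 samples
STRIDE = RATE // 2   # slide the window by 0.5 seconds

# Hypothetical file and column names; AndroSensor's real CSV headers differ.
df = pd.read_csv("participant1_correct_curls.csv")
data = df[["acc_x", "acc_y", "acc_z", "gyro_x", "gyro_y", "gyro_z"]].to_numpy()

def make_windows(signal, stride=STRIDE):
    """Cut a continuous recording (n_samples, n_channels) into 2-second windows."""
    starts = range(0, len(signal) - WINDOW + 1, stride)
    return np.stack([signal[s:s + WINDOW] for s in starts])

# Base windows every 0.5 s, plus windows shifted by 0.1-0.4 s into the curl
# to create even more training examples.
segments = [make_windows(data)]
for offset_s in (0.1, 0.2, 0.3, 0.4):
    segments.append(make_windows(data[int(offset_s * RATE):]))
X = np.concatenate(segments)   # shape: (n_windows, 200, 6)
```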

Training and Testing Models

The models were tested using a leave-one-out test, holding out one participant at a time. This was used rather than a randomized split because the model should work on someone whose data it has never seen before. For example, in a group of 3 people, if the model was trained on person 1’s and person 2’s data, it should be accurate on person 3’s data without ever having been trained on it.
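
A minimal sketch of this split, assuming each 2-second window is tagged with the participant it came from (scikit-learn’s LeaveOneGroupOut offers the same behavior):

```python
import numpy as np

def leave_one_subject_out(X, y, subjects):
    """Yield (train, test) splits, holding out each participant once.

    X: (n_windows, 200, n_channels) sensor windows
    y: (n_windows,) integer labels for the 3 curl variations
    subjects: (n_windows,) participant id for each window
    """
    for held_out in np.unique(subjects):
        test = subjects == held_out
        yield (X[~test], y[~test]), (X[test], y[test])
```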

A glossary for the model architecture table below is at the end of this post.

[Table: architectures of the models trained and tested using the leave-one-out test.]

All three models were trained and tested with various parameter adjustments; the model architectures can be seen in the table above. The training and testing were done using the TensorFlow and Keras machine learning libraries on Google Colab, which was chosen because it offers free, uninterrupted GPU access for up to 12 hours at a time. The results of each model can be seen in the table below, with the best-performing architecture for each model in bold.

[Table: average performance of each model tested; the best-performing architecture for each model type is in bold.]
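
A sketch of one such training run, reusing the helpers from the earlier snippets; the epoch count and batch size are illustrative, not the settings from the tables.

```python
import numpy as np

# Reuses X, y, subjects, build_cnn_lstm, and leave_one_subject_out from the
# earlier snippets. Epochs and batch size are illustrative values.
accuracies = []
for (X_train, y_train), (X_test, y_test) in leave_one_subject_out(X, y, subjects):
    model = build_cnn_lstm()   # fresh weights for each held-out participant
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
    _, acc = model.evaluate(X_test, y_test, verbose=0)
    accuracies.append(acc)

print(f"mean leave-one-out accuracy: {np.mean(accuracies):.3f}")
```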

Results

Below are the final results for the best-performing architecture of each model type, using accuracy and F1 score as measures. The final scores are averages over runs using each of the 3 participants as the test set. Model size was also important to track for this project, because mobile and wearable devices have only a limited amount of resources and processing power; the smaller the model, the better suited it is to mobile devices. (Best practice is usually to send the data from the wearable to the mobile device and let the mobile device do the processing, since it has more processing power than the wearable.)

[Table: results of each training and testing run for the top-performing models.]
[Table: averaged results of the top-performing model of each type.]
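
A sketch of how these measures can be computed with scikit-learn, continuing from the training loop above. Whether the post used macro- or weighted-average F1 is not stated; macro is assumed here.

```python
import os
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Predictions for the last held-out participant from the loop above.
y_pred = np.argmax(model.predict(X_test), axis=1)
print("accuracy:", accuracy_score(y_test, y_pred))
print("F1 (macro):", f1_score(y_test, y_pred, average="macro"))

# Model size matters for on-device use: save the model and check its footprint.
model.save("curl_model.h5")
print("size (MB):", os.path.getsize("curl_model.h5") / 1e6)
```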

Glossary for Model Architecture Table

Conv1D(filters, kernel size): 1-dimensional convolutional layer; filters is the dimensionality of the output space, and kernel size is the length of the convolution window.

LSTM(units): LSTM layer; units is the dimensionality of the output space.

Dropout(dropout value): dropout layer; dropout value is the fraction of units to randomly drop in the specified layer.

MP(window size): max pooling layer; window size is the size of the max pooling window.

Flatten: flattens the input to one dimension.

FC(units): fully connected layer, also known as a dense layer; units is the dimensionality of the output space.

