Multi-Class Classification Model for Human Activities and Postural Transitions Using Python Take 4

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Human Activities and Postural Transitions dataset is a classic multi-class classification situation where we are trying to predict one of the 12 possible outcomes.

INTRODUCTION: The research team carried out experiments with a group of 30 volunteers who performed a protocol of activities composed of six basic activities. There are three static postures (standing, sitting, lying) and three dynamic activities (walking, walking downstairs, and walking upstairs). The experiment also included the postural transitions that occurred between the static postures: stand-to-sit, sit-to-stand, sit-to-lie, lie-to-sit, stand-to-lie, and lie-to-stand. All participants wore a smartphone on the waist during the experiment. The research team also video-recorded the activities so that the data could be labeled manually. The research team randomly partitioned the resulting data into two sets: 70% for training and 30% for testing.

In iteration Take1, the script focused on evaluating various machine learning algorithms and identifying the model that produced the best overall metrics. Because many of the dataset's attributes are collinear with other attributes, we eliminated the attributes that had a collinearity measurement of 99% or higher. Iteration Take1 established the performance baseline for accuracy and processing time.

In iteration Take2, we examined the feature selection technique of eliminating collinear features. We performed iterative modeling at collinearity thresholds of 75%, 80%, 85%, 90%, and 95%. By eliminating the collinear features, we decreased the processing time and maintained a level of model accuracy comparable to iteration Take1.
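
The collinearity filter used in Take1 and Take2 can be illustrated with a minimal sketch. It assumes the attributes are held in a pandas DataFrame and that collinearity is measured as absolute Pearson correlation; the helper name `drop_collinear` and the synthetic data are hypothetical and are not taken from the original script.

```python
import numpy as np
import pandas as pd


def drop_collinear(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Drop every column whose absolute Pearson correlation with an
    earlier column is at or above `threshold`.

    Hypothetical helper for illustration; the original script may differ.
    """
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] >= threshold).any()]
    return df.drop(columns=to_drop)


# Synthetic stand-in data: "b" is nearly a copy of "a", "c" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})

reduced = drop_collinear(demo, threshold=0.95)
print(list(reduced.columns))
```

Lowering `threshold` removes more attributes, which is the trade-off between processing time and accuracy that the 75% to 95% trials explored.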

In iteration Take3, we examined the feature selection technique of attribute importance ranking by using the Random Forest algorithm. By selecting only the most important attributes, we hoped to decrease the processing time and to maintain a similar level of accuracy compared to iteration Take1.
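
As a rough illustration of importance-based selection, the sketch below ranks attributes with scikit-learn's `RandomForestClassifier` and keeps the highest-ranked ones. The synthetic data, the forest settings, and the cut-off `top_k` are assumptions chosen for brevity; the selection rule actually used in Take3 is not stated here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real training data (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Column indices ordered from most to least important.
ranking = np.argsort(forest.feature_importances_)[::-1]

top_k = 8  # hypothetical cut-off, not the value used in Take3
selected = np.sort(ranking[:top_k])
X_reduced = X[:, selected]
print(X_reduced.shape)
```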

In the current iteration Take4, we will examine the feature selection technique of Recursive Feature Elimination by using the Linear Discriminant Analysis algorithm. By limiting to only the 300 most relevant attributes, we hope to decrease the processing time and maintain a similar level of accuracy compared to iteration Take1.
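
A minimal sketch of Recursive Feature Elimination driven by Linear Discriminant Analysis, using scikit-learn's `RFE` and `LinearDiscriminantAnalysis`. The data is synthetic and the sizes are scaled down (10 of 30 attributes rather than 300 of 561); the `step` value is also an assumption, since the elimination step used in this project is not stated.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import RFE

# Synthetic stand-in for the real training data (assumption).
X, y = make_classification(
    n_samples=300, n_features=30, n_informative=6, n_redundant=0,
    n_classes=3, random_state=0,
)

# RFE refits the estimator repeatedly and discards the attributes with the
# smallest coefficient magnitudes until the requested number remains.
selector = RFE(
    estimator=LinearDiscriminantAnalysis(),
    n_features_to_select=10,  # the project keeps 300; scaled down here
    step=5,                   # attributes removed per round (assumption)
)
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)
print(int(selector.support_.sum()))
```

The boolean mask in `selector.support_` can then be applied to the testing data so that both sets keep the same attributes.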

ANALYSIS: In iteration Take1, the baseline performance of the machine learning algorithms achieved an average accuracy of 88.52%. Two algorithms (Linear Discriminant Analysis and Stochastic Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Linear Discriminant Analysis turned in the top overall result and achieved an accuracy metric of 94.19%. By using the optimized parameters, the Linear Discriminant Analysis algorithm processed the testing dataset with an accuracy of 94.71%, which was even better than the accuracy on the training data.

From the model-building perspective, the number of attributes decreased by 108, from 561 down to 453.
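
The baseline figure quoted for each iteration is an average over several algorithms. The sketch below shows one way such an average could be computed, assuming 5-fold cross-validation on the training data and only two of the candidate algorithms; the fold count, the model settings, and the synthetic data are assumptions rather than details of the original script.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real training data (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=6, n_redundant=0,
    n_classes=3, random_state=0,
)

# Two of the candidate algorithms; the full project evaluated more.
models = {
    "LDA": LinearDiscriminantAnalysis(),
    "GBM": GradientBoostingClassifier(n_estimators=20, random_state=0),
}

# Mean cross-validated accuracy per algorithm.
scores = {
    name: float(cross_val_score(model, X, y, cv=5, scoring="accuracy").mean())
    for name, model in models.items()
}

# Baseline figure: the average of the per-algorithm accuracies.
baseline_average = float(np.mean(list(scores.values())))

for name, acc in scores.items():
    print(f"{name}: {acc:.4f}")
print(f"Average: {baseline_average:.4f}")
```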

In iteration Take2, the baseline performance of the machine learning algorithms achieved an average accuracy of 88.04%. Two algorithms (Linear Discriminant Analysis and Stochastic Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Linear Discriminant Analysis turned in the top overall result and achieved an accuracy metric of 92.32%. By using the optimized parameters, the Linear Discriminant Analysis algorithm processed the testing dataset with an accuracy of 93.89%, which was even better than the accuracy on the training data.

From the model-building perspective, the number of attributes decreased by 278, from 561 down to 283. The processing time went from 7 hours 3 minutes in iteration Take1 down to 4 hours 31 minutes in Take2, which was a reduction of 35.9%.

In iteration Take3, the baseline performance of the machine learning algorithms achieved an average accuracy of 89.02%. Two algorithms (Linear Discriminant Analysis and Stochastic Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Linear Discriminant Analysis turned in the top overall result and achieved an accuracy metric of 94.42%. By using the optimized parameters, the Linear Discriminant Analysis algorithm processed the testing dataset with an accuracy of 94.97%, which was even better than the accuracy on the training data.

From the model-building perspective, the number of attributes decreased by 107, from 561 down to 454. The processing time went from 7 hours 3 minutes in iteration Take1 down to 6 hours 6 minutes in Take3, which was a reduction of 13.4%.

In the current iteration Take4, the baseline performance of the machine learning algorithms achieved an average accuracy of 87.22%. Two algorithms (Linear Discriminant Analysis and Stochastic Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Linear Discriminant Analysis turned in the top overall result and achieved an accuracy metric of 91.45%. By using the optimized parameters, the Linear Discriminant Analysis algorithm processed the testing dataset with an accuracy of 91.42%, which was essentially on par with, though marginally below, the accuracy on the training data.

From the model-building perspective, the number of attributes decreased by 261, from 561 down to 300. The processing time went from 7 hours 3 minutes in iteration Take1 down to 4 hours 57 minutes in Take4, which was a reduction of 29.7%.
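
The processing-time reductions quoted for Take2, Take3, and Take4 follow from the run times given in the text. A short sketch of the arithmetic, where the helper name `to_minutes` is hypothetical:

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an hours-and-minutes duration to whole minutes."""
    return hours * 60 + minutes


# Run times as reported in the text.
baseline = to_minutes(7, 3)          # Take1
runs = {
    "Take2": to_minutes(4, 31),
    "Take3": to_minutes(6, 6),
    "Take4": to_minutes(4, 57),
}

# Percentage reduction relative to the Take1 baseline.
reductions = {
    name: (baseline - elapsed) / baseline * 100
    for name, elapsed in runs.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.2f}% shorter than Take1")
```

The computed values are about 35.93%, 13.48%, and 29.79%, which agree with the reported percentages to within a tenth of a point.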

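CONCLUSION: For this iteration, the Linear Discriminant Analysis algorithm, trained on the attributes selected by Recursive Feature Elimination, achieved the best overall results among the algorithms evaluated while reducing the processing time by 29.7% relative to iteration Take1. The testing accuracy of 91.42% was, however, lower than the 94.71% achieved in Take1. For this dataset, we should consider using the Linear Discriminant Analysis algorithm for further modeling or production use.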

Dataset Used: Smartphone-Based Recognition of Human Activities and Postural Transitions Data Set

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions

The HTML-formatted report can be found on GitHub.