Multi-Class Model for Human Activity Recognition Using Scikit-Learn Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Human Activity Recognition Using Smartphones dataset is a multi-class classification problem where we try to predict one of several (more than two) possible outcomes.
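Because the model predicts one of several classes, the activity labels need a consistent integer encoding. Recent versions of XGBoost's scikit-learn wrapper (XGBClassifier) expect class labels in the range 0 to n_classes - 1. The following is a minimal sketch of mapping the six activity names to contiguous ids and back; the class ordering and the helper names (build_label_maps, encode, decode) are hypothetical and not taken from the project's script.

```python
# The six activity classes in this dataset; this ordering is an assumption
# made for illustration only.
ACTIVITIES = [
    "WALKING",
    "WALKING_UPSTAIRS",
    "WALKING_DOWNSTAIRS",
    "SITTING",
    "STANDING",
    "LAYING",
]


def build_label_maps(class_names):
    """Return (name -> index, index -> name) maps with contiguous 0-based ids."""
    to_index = {name: i for i, name in enumerate(class_names)}
    to_name = {i: name for name, i in to_index.items()}
    return to_index, to_name


def encode(labels, to_index):
    """Convert activity names to integer class ids (unknown names raise KeyError)."""
    return [to_index[label] for label in labels]


def decode(ids, to_name):
    """Convert integer class ids back to activity names."""
    return [to_name[i] for i in ids]
```

With this ordering, encoding ["SITTING", "WALKING", "LAYING"] yields [3, 0, 5], and decoding those ids restores the original names. Keeping both maps makes it easy to turn model predictions back into readable activity names for reports.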

INTRODUCTION: Researchers collected the datasets from experiments with a group of 30 volunteers, each of whom performed six activities while wearing a smartphone on the waist. Using the phone's embedded accelerometer and gyroscope, the researchers captured measurements for the activities WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, and LAYING. The dataset was randomly partitioned into two sets by volunteer: 70% of the volunteers were selected to generate the training data and 30% to generate the test data.
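The partition is made by volunteer rather than by individual sample, so no person contributes recordings to both sets and the test accuracy reflects performance on unseen people. The following is a minimal sketch of such a subject-level 70/30 split, assuming each sample can be mapped to its volunteer id; the helper names and the fixed seed are hypothetical, and this is not necessarily how the original researchers drew their split.

```python
import random


def split_subjects(subject_ids, train_fraction=0.7, seed=42):
    """Randomly assign whole subjects (volunteers) to the train or test side.

    Every recording from a given volunteer ends up on the same side, so the
    test set measures generalization to people the model has never seen.
    """
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(subjects)
    n_train = round(len(subjects) * train_fraction)
    return set(subjects[:n_train]), set(subjects[n_train:])


def partition_samples(samples, subject_of, train_subjects):
    """Route each sample to train or test according to its subject id."""
    train, test = [], []
    for sample in samples:
        (train if subject_of(sample) in train_subjects else test).append(sample)
    return train, test
```

For 30 volunteers (ids 1 to 30), split_subjects assigns 21 subjects to training and 9 to testing with no overlap. A sample-level random split would instead let the same person's movements appear on both sides, which tends to overstate test accuracy.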

In previous iterations, the script focused on evaluating various classic machine learning algorithms and identifying the algorithm that produced the best accuracy metric. Those iterations established a baseline performance in terms of accuracy and processing time.

In this Take1 iteration, we will construct and tune an XGBoost model for this dataset and observe the best accuracy we can obtain with the training and test datasets.
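Tuning a model usually means searching over hyperparameter combinations and keeping the one with the best validation score. The following is a minimal sketch of exhaustive grid search, assuming a caller-supplied evaluate function; in practice that function would train an XGBoost model with the given parameters and return its cross-validated accuracy on the training data. The parameter names n_estimators, max_depth, and learning_rate are real XGBClassifier hyperparameters, but the candidate values in PARAM_GRID are hypothetical and not the ones used in this project.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Score every combination in param_grid and keep the best one.

    param_grid maps a parameter name to a list of candidate values.
    evaluate(params) returns a score where higher is better (for example,
    mean cross-validated accuracy). Returns (best_params, best_score).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid over real XGBClassifier hyperparameter names;
# the values are illustrative only.
PARAM_GRID = {
    "n_estimators": [100, 300],
    "max_depth": [3, 6],
    "learning_rate": [0.05, 0.1],
}
```

The evaluate function should score candidates on training folds only; using the test set during tuning would leak information and inflate the final test accuracy. scikit-learn's GridSearchCV implements the same exhaustive search with cross-validation built in.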

ANALYSIS: For this Take1 iteration, the XGBoost model achieved an accuracy of 99.45% in training. When configured with the optimized parameters, the XGBoost model processed the test dataset with an accuracy of 94.94%. The gap of roughly 4.5 percentage points between training and test accuracy indicates a high variance (overfitting) issue. We will need to explore regularization techniques or other modeling approaches before deploying the model for production use.
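The high variance diagnosis rests on comparing training and test accuracy: 99.45% minus 94.94% is a gap of 4.51 percentage points. The following is a minimal sketch of that comparison; the function names are hypothetical. If regularization is explored next, XGBClassifier exposes parameters such as reg_alpha, reg_lambda, gamma, min_child_weight, subsample, and colsample_bytree, along with a smaller max_depth; which of these would help here is untested.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def generalization_gap(train_acc, test_acc):
    """Train accuracy minus test accuracy.

    A large positive gap suggests the model fits the training data much
    better than unseen data, i.e. high variance (overfitting).
    """
    return train_acc - test_acc
```

Applied to the reported figures, generalization_gap(0.9945, 0.9494) is about 0.0451, the 4.51-point gap noted above. Tracking this number across tuning runs shows whether a regularization change narrows the gap without lowering test accuracy.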

CONCLUSION: For this iteration, the XGBoost algorithm achieved the best overall results using the training and test datasets. For this dataset, Random Forest should be considered for further modeling.

Dataset Used: Human Activity Recognition Using Smartphones

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference:

The HTML-formatted report can be found on GitHub.