Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Human Activity Recognition Using Smartphones dataset is a multi-class classification problem where we are trying to predict one of several (more than two) possible outcomes.
INTRODUCTION: Researchers collected the data from experiments with a group of 30 volunteers, each of whom performed six activities while wearing a smartphone on the waist. Using the phone's embedded accelerometer and gyroscope, the researchers captured measurements for the activities WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, and LAYING. The dataset was randomly partitioned into two sets: 70% of the volunteers were selected to generate the training data and 30% the test data.
In previous iterations, the script focused on evaluating various classic machine learning algorithms and identifying the algorithm that produces the best accuracy metric. The previous iterations established a baseline performance in terms of accuracy and processing time.
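The volunteer-level partition described above can be sketched as follows. This is a minimal illustration only: it assumes volunteers are identified by the integers 1 to 30, and the function name `split_subjects`, the seed, and the rounding of 70% to 21 volunteers are hypothetical choices for the example, not the procedure the dataset authors used.

```python
import random

def split_subjects(subject_ids, train_fraction=0.7, seed=42):
    """Randomly partition volunteer IDs so no volunteer appears in both sets."""
    ids = sorted(subject_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) for a repeatable split
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return sorted(ids[:n_train]), sorted(ids[n_train:])

# Volunteer IDs 1..30 are assumed for illustration.
train_ids, test_ids = split_subjects(range(1, 31))
print(len(train_ids), len(test_ids))  # 21 9
```

Splitting by volunteer rather than by individual recording keeps every person's measurements on one side of the split, so the test accuracy reflects performance on people the model has not seen.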
In iteration Take1, we constructed and tuned an XGBoost machine learning model for this dataset. We also observed the best accuracy result that we could obtain using the XGBoost model with the training and test datasets.
In iteration Take2, we constructed several Multilayer Perceptron (MLP) models with one hidden layer. These simple MLP models would serve as a benchmark as we build more complex MLP models in future iterations.
In iteration Take3, we constructed several Multilayer Perceptron (MLP) models with two hidden layers. We also experimented with dropout layers as a regularization technique for improving our models.
In this Take4 iteration, we will construct several Multilayer Perceptron (MLP) models with three hidden layers. We will also experiment with dropout layers as a regularization technique for improving our models.
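To make the architecture concrete, here is a minimal pure-Python sketch of a forward pass through an MLP with three hidden layers and dropout. It is not the code used in this project, which would more likely rely on a deep learning library. The input width of 561 features, the ReLU and softmax activations, the weight initialization, and the dropout rate of 0.25 are all assumptions made for illustration.

```python
import math
import random

N_FEATURES = 561  # assumed input width (feature count of the public dataset release)
N_CLASSES = 6     # the six activities listed in the introduction

def init_layer(n_in, n_out, rng):
    """Random weights scaled by fan-in, plus zero biases, for one dense layer."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer):
    """Affine transform: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, a) for a in v]

def dropout(v, rate, rng, training):
    """Inverted dropout: randomly zero units in training, pass through otherwise."""
    if not training or rate == 0.0:
        return list(v)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in v]

def softmax(v):
    peak = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - peak) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def build_mlp(hidden_sizes, rng):
    """Stack dense layers: input -> hidden layers -> class scores."""
    sizes = [N_FEATURES, *hidden_sizes, N_CLASSES]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def forward(x, layers, rate, rng, training=False):
    """Apply ReLU and dropout after each hidden layer, softmax at the output."""
    h = x
    for layer in layers[:-1]:
        h = dropout(relu(dense(h, layer)), rate, rng, training)
    return softmax(dense(h, layers[-1]))

rng = random.Random(0)
model = build_mlp([32, 16, 8], rng)  # smallest configuration evaluated in Take4
sample = [rng.uniform(-1.0, 1.0) for _ in range(N_FEATURES)]  # synthetic input
probs = forward(sample, model, rate=0.25, rng=rng, training=True)
print(len(probs), round(sum(probs), 6))
```

Dropout is active only when `training=True`. At evaluation time the activations pass through unchanged, which is why the surviving units are rescaled by `1 / keep` during training.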
ANALYSIS: From iteration Take1, the XGBoost model achieved an accuracy metric of 99.45% in training. When configured with the optimized parameters, the XGBoost model processed the test dataset with an accuracy of 94.94%, which indicated a high variance issue. We will need to explore regularization techniques or other modeling approaches before deploying the model for production use.
From iteration Take2, the one-layer MLP models achieved an accuracy metric of between 98.8% and 99.3% after 50 epochs in training. Those same models processed the test datasets with an accuracy metric of between 93.0% and 95.9%.
From iteration Take3, the two-layer MLP models achieved an accuracy metric of between 96.2% and 98.5% after 50 epochs in training. Those same models processed the test datasets with an accuracy metric of between 93.6% and 96.2%.
For this Take4 iteration, the three-layer MLP models achieved an accuracy metric of between 90.0% and 98.6% after 50 epochs in training. Those same models processed the test datasets with an accuracy metric of between 92.1% and 95.9%.
- Three-layer 32/16/8 nodes: Training 90.07%, Testing 92.97%
- Three-layer 64/32/16 nodes: Training 96.12%, Testing 92.12%
- Three-layer 96/48/24 nodes: Training 96.19%, Testing 95.55%
- Three-layer 128/64/32 nodes: Training 97.83%, Testing 95.07%
- Three-layer 192/96/48 nodes: Training 98.04%, Testing 95.48%
- Three-layer 256/128/64 nodes: Training 98.37%, Testing 95.86%
- Three-layer 384/192/96 nodes: Training 98.35%, Testing 95.07%
- Three-layer 512/256/128 nodes: Training 98.20%, Testing 95.07%
- Three-layer 768/384/196 nodes: Training 98.61%, Testing 95.28%
- Three-layer 1024/512/256 nodes: Training 98.19%, Testing 95.31%
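One way to read the results above is to compute the gap between training and test accuracy for each configuration, since a large positive gap is the usual sign of variance (overfitting). The short sketch below does that arithmetic for the configurations whose training accuracy lies in the 90% to 98% range most relevant to the comparison. The dictionary layout and the helper name `train_test_gap` are hypothetical, and the figures are copied from the reported results rather than recomputed.

```python
# (training accuracy %, test accuracy %) per hidden-layer configuration,
# copied from the reported Take4 results.
results = {
    "32/16/8": (90.07, 92.97),
    "64/32/16": (96.12, 92.12),
    "96/48/24": (96.19, 95.55),
    "128/64/32": (97.83, 95.07),
    "192/96/48": (98.04, 95.48),
    "384/192/96": (98.35, 95.07),
    "512/256/128": (98.20, 95.07),
    "768/384/196": (98.61, 95.28),
    "1024/512/256": (98.19, 95.31),
}

def train_test_gap(results):
    """Return training minus test accuracy, in percentage points, per model."""
    return {name: round(train - test, 2) for name, (train, test) in results.items()}

gaps = train_test_gap(results)
best_test = max(results, key=lambda name: results[name][1])
widest_gap = max(gaps, key=gaps.get)

for name, gap in gaps.items():
    print(f"{name:>13}: gap {gap:+.2f} points")
print("Highest test accuracy:", best_test)
print("Widest train/test gap:", widest_gap)
```

A negative gap, as in the smallest network, means the model scored higher on the test set than on the training set, which suggests underfitting rather than overfitting at that size.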
CONCLUSION: For this iteration, the three-layer MLP models produced mixed results with smaller but still noticeable variance. For this dataset, we will need to explore regularization techniques or other modeling approaches to reduce variance before deploying the model for production use.
Dataset Used: Human Activity Recognition Using Smartphones
Dataset ML Model: Multi-class classification with numerical attributes
The HTML formatted report can be found here on GitHub.