Multi-Class Classification Model for Sensorless Drive Diagnosis Using Python Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Sensorless Drive Diagnosis dataset presents a multi-class classification problem, where we try to predict one of several possible outcomes.

INTRODUCTION: The dataset contains features extracted from electric current drive signals. The drive has both intact and defective components, which results in 11 different classes, each representing a different condition. Each condition was measured several times under 12 different operating conditions, that is, at different speeds, load moments, and load forces.

In iteration Take1, we established the baseline accuracy measurement for comparison with future rounds of modeling.

In this iteration, we will standardize the numeric attributes and observe the impact of scaling on modeling accuracy.
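The standardization step can be sketched as follows. This is a minimal illustration rather than the project's actual script: it assumes scikit-learn's StandardScaler, uses synthetic data in place of the real attributes, and picks a 70/30 split and a seed of 7 purely for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric attributes (NOT the real dataset):
# three columns that sit on very different scales.
rng = np.random.default_rng(seed=7)
X = rng.normal(loc=[0.0, 50.0, -3.0], scale=[0.01, 10.0, 200.0], size=(1000, 3))
y = rng.integers(low=1, high=12, size=1000)  # 11 hypothetical class labels, 1..11

# Assumed 70/30 split; the split used in the project is not stated here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7
)

# Fit the scaler on the training split only, then reuse it on the test split,
# so that no test-set statistics leak into training.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Each training column now has (approximately) zero mean and unit variance.
print(X_train_std.mean(axis=0).round(3))
print(X_train_std.std(axis=0).round(3))
```

Fitting the scaler on the training split alone keeps the test data unseen until evaluation; the test columns will be close to, but not exactly, zero mean and unit variance.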

ANALYSIS: In iteration Take1, the machine learning algorithms achieved a baseline average accuracy of 84.65%. Two algorithms (Random Forest and Extra Trees) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Extra Trees turned in the top overall result with an accuracy of 99.95%. With the optimized parameters applied, the Extra Trees algorithm processed the testing dataset with an accuracy of 99.97%, slightly higher than the accuracy obtained on the training data.

In this iteration, the machine learning algorithms achieved an average accuracy of 94.74%. Two algorithms (Random Forest and Extra Trees) again achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Extra Trees turned in the top overall result with an accuracy of 99.94%. With the optimized parameters applied, the Extra Trees algorithm processed the testing dataset with an accuracy of 99.97%, slightly higher than the accuracy obtained on the training data.
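The tune-then-test sequence described above might look roughly like the sketch below. It assumes scikit-learn's GridSearchCV and ExtraTreesClassifier; the parameter grid, the 5-fold setting, the split, and the synthetic data are all hypothetical, since the values used in the tuning trials are not given here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic multi-class data standing in for the real dataset.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=8,
    n_classes=4, random_state=7,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7
)

# Hypothetical grid; the values actually tried in the project are not stated.
param_grid = {"n_estimators": [50, 100], "max_features": ["sqrt", None]}
search = GridSearchCV(
    ExtraTreesClassifier(random_state=7),
    param_grid,
    scoring="accuracy",
    cv=5,
)
search.fit(X_train, y_train)

# Compare the cross-validated accuracy on the training split
# with the accuracy of the refitted best model on the held-out split.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 4))
print("held-out accuracy:", round(test_accuracy, 4))
```

The cross-validated score comes from models trained on only part of the training split, while the final model is refitted on all of it, which is one reason a held-out accuracy can come out slightly higher than the tuning score.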

After the dataset features were standardized, the ensemble algorithms continued to perform well. Moreover, the non-ensemble algorithms performed significantly better than in iteration Take1, raising the average accuracy across all algorithms from 84.65% to 94.74%. Standardizing the features appeared to have a positive impact on the overall modeling process.
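One way to observe this kind of effect is to cross-validate a distance-based algorithm and a tree ensemble with and without scaling. The sketch below is an assumption-laden illustration, not the project's code: it uses scikit-learn, synthetic data with one column deliberately stretched, and k-nearest neighbors as an example of a non-ensemble algorithm (the algorithms compared in the project are not listed here).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic multi-class data standing in for the real dataset.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=8,
    n_classes=4, random_state=7,
)
# Stretch one column so that the features sit on very different scales.
X[:, 0] *= 1000.0

models = {
    "KNN": KNeighborsClassifier(),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=100, random_state=7),
}

results = {}
for name, model in models.items():
    # Mean 5-fold accuracy on the raw features.
    raw = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    # Same model behind a scaler; the pipeline refits the scaler in every fold.
    scaled_model = make_pipeline(StandardScaler(), model)
    scaled = cross_val_score(scaled_model, X, y, cv=5, scoring="accuracy").mean()
    results[name] = (raw, scaled)
    print(f"{name}: raw={raw:.4f} standardized={scaled:.4f}")
```

Distance-based methods are dominated by whichever feature has the largest numeric range, so they would typically gain the most from standardization, whereas tree-based ensembles split on one feature at a time and are largely insensitive to rescaling.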

CONCLUSION: For this iteration, the Extra Trees algorithm achieved the best overall training and validation results. For this dataset, Extra Trees could be considered for further modeling.

Dataset Used: Sensorless Drive Diagnosis Data Set

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Dataset+for+Sensorless+Drive+Diagnosis

The HTML formatted report can be found here on GitHub.