Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
Dataset Used: Faulty Steel Plates
Dataset ML Model: Multi-Class classification with numerical attributes
Dataset Reference: http://archive.ics.uci.edu/ml/datasets/steel+plates+faults
One potential source of performance benchmarks: https://www.kaggle.com/uciml/faulty-steel-plates
INTRODUCTION: This dataset comes from research by Semeion, Research Center of Sciences of Communication. The original aim of the research was to correctly classify the type of surface defects in stainless steel plates, with six types of possible defects (plus “other”). The input vector was made up of 27 indicators that approximately describe the geometric shape of the defect and its outline. According to the research paper, Semeion was commissioned by the Centro Sviluppo Materiali (Italy) for this task, and therefore it is not possible to provide details on the nature of the 27 indicators used as input vectors or on the types of the six classes of defects.
CONCLUSION: The seven baseline algorithms achieved an average accuracy of 69.69%. Three algorithms (Bagged CART, Random Forest, and Stochastic Gradient Boosting) achieved the top three accuracy scores after the first round of modeling. After a series of tuning trials, Stochastic Gradient Boosting turned in the top result on the training data, with an average accuracy of 77.78%. Using the best tuning parameters found, the Stochastic Gradient Boosting algorithm processed the validation dataset with an accuracy of 77.20%, slightly below its accuracy on the training data. For this project, the Stochastic Gradient Boosting ensemble algorithm consistently produced the top training and validation results, which warrants the additional processing the algorithm requires.
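The compare-then-validate workflow summarized above can be sketched roughly as follows. This is a minimal illustration in Python with scikit-learn, not the code used for this project. The data is a synthetic stand-in that only mirrors the dataset's shape (27 numeric inputs, seven classes), so the accuracies it prints will not match the figures reported here. The seed, the 70/30 split, the fold count, and all model settings are assumptions chosen for illustration, and the tuning round is omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

SEED = 7  # assumed seed, only for reproducibility of this sketch

# Synthetic stand-in for the real data: 27 numeric features, 7 classes.
X, y = make_classification(
    n_samples=700,
    n_features=27,
    n_informative=12,
    n_redundant=0,
    n_classes=7,
    n_clusters_per_class=1,
    random_state=SEED,
)

# Hold out a validation set (assumed 70/30 split) before any modeling.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=SEED
)

# The three ensembles named in the conclusion, with assumed settings.
models = {
    "Bagged CART": BaggingClassifier(
        DecisionTreeClassifier(random_state=SEED), n_estimators=25, random_state=SEED
    ),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=SEED),
    "Stochastic Gradient Boosting": GradientBoostingClassifier(
        n_estimators=50, subsample=0.8, random_state=SEED
    ),
}

# Mean cross-validated accuracy of each model on the training portion only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
cv_accuracy = {
    name: cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

# Refit the best model on all training data, then score the held-out set once.
best_name = max(cv_accuracy, key=cv_accuracy.get)
best_model = models[best_name].fit(X_train, y_train)
val_accuracy = best_model.score(X_val, y_val)

for name, acc in cv_accuracy.items():
    print(f"{name}: mean CV accuracy = {acc:.4f}")
print(f"Best on training data: {best_name}; validation accuracy = {val_accuracy:.4f}")
```

Comparing the cross-validated training accuracy with the single validation score, as the conclusion does with 77.78% and 77.20%, is a quick check that the chosen model has not overfit the training data.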
The HTML-formatted report is available on GitHub.