Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Steel Plates Faults dataset presents a multi-class classification problem, where we attempt to predict one of several (more than two) possible outcomes.
INTRODUCTION: This dataset comes from research by Semeion, Research Center of Sciences of Communication. The original aim of the research was to correctly classify the type of surface defects in stainless steel plates, with six types of possible defects (plus “other”). The input vector was made up of 27 indicators that approximately describe the geometric shape of the defect and its outline. According to the research paper, Semeion was commissioned by the Centro Sviluppo Materiali (Italy) for this task, and therefore it is not possible to provide details on the nature of the 27 indicators used as input vectors or the types of the six classes of defects.
ANALYSIS: The Random Forest model performed the best on the training dataset, achieving an average accuracy of 78.52% using the 10-fold cross-validation method.
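To make the evaluation method concrete, here is a minimal sketch of how an average accuracy over 10-fold cross-validation can be computed. It is not the project's actual code: it uses only the Python standard library, a simple nearest-centroid classifier stands in for Random Forest, and the data is synthetic rather than the Steel Plates Faults dataset. All function names (k_fold_indices, fit_centroids, predict, cross_val_accuracy) are hypothetical and exist only for this illustration.

```python
import random
from collections import defaultdict


def k_fold_indices(n_samples, k=10, seed=7):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in classifier (NOT Random Forest): mean feature vector per class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, row):
    """Return the class whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], row)),
    )


def cross_val_accuracy(X, y, k=10, seed=7):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores), scores


# Synthetic three-class data (an assumption for illustration only).
rng = random.Random(0)
centers = {0: (0.0, 0.0), 1: (5.0, 5.0), 2: (0.0, 5.0)}
X, y = [], []
for label, (cx, cy) in centers.items():
    for _ in range(50):
        X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
        y.append(label)

mean_accuracy, fold_scores = cross_val_accuracy(X, y)
print(f"Mean accuracy over {len(fold_scores)} folds: {mean_accuracy:.2%}")
```

Each sample is held out exactly once, so the reported figure is the mean of ten per-fold accuracies rather than a single train/test split. In practice a library routine would likely replace the hand-written loop; the sketch only shows what the averaged number represents.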
CONCLUSION: In this iteration, the Random Forest model appeared to be a suitable algorithm for modeling this dataset.
Dataset Used: Steel Plates Faults
Dataset ML Model: Multi-Class classification with numerical features
Dataset Reference: https://archive-beta.ics.uci.edu/ml/datasets/steel+plates+faults
One source of potential performance benchmarks: https://www.kaggle.com/uciml/faulty-steel-plates
The HTML-formatted report can be found here on GitHub.