Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
For more information on this case study project, please consult Dr. Brownlee’s blog post at https://machinelearningmastery.com/standard-machine-learning-datasets/.
Dataset Used: Connectionist Bench (Sonar, Mines vs. Rocks) Data Set
ML Model: Classification, numeric inputs
Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+%28Sonar%2C+Mines+vs.+Rocks%29
The Sonar Dataset involves predicting whether an object is a mine or a rock, given the strength of sonar returns at different angles. It is a binary (2-class) classification problem.
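Below is a minimal sketch of reading the data. It assumes the comma-separated layout of the UCI `sonar.all-data` file, where each row holds 60 sonar-return energies in the range 0.0 to 1.0 followed by the label `M` (mine) or `R` (rock). The function name `load_sonar` is hypothetical, and the two rows parsed below are synthetic placeholders rather than real observations.

```python
import csv
import io

N_FEATURES = 60  # each Sonar pattern has 60 numeric inputs in [0.0, 1.0]

def load_sonar(text_stream):
    """Parse rows of 60 floats followed by an 'M' (mine) or 'R' (rock) label.

    Hypothetical helper; assumes the UCI sonar.all-data comma-separated layout.
    """
    features, labels = [], []
    for row in csv.reader(text_stream):
        if not row:
            continue  # skip blank lines
        if len(row) != N_FEATURES + 1:
            raise ValueError(f"expected {N_FEATURES + 1} fields, got {len(row)}")
        features.append([float(v) for v in row[:N_FEATURES]])
        label = row[N_FEATURES].strip()
        if label not in ("M", "R"):
            raise ValueError(f"unexpected label {label!r}")
        labels.append(label)
    return features, labels

# Two synthetic rows in the assumed layout (placeholder values, not real data).
sample = "\n".join([
    ",".join(["0.1"] * N_FEATURES + ["R"]),
    ",".join(["0.9"] * N_FEATURES + ["M"]),
])
X, y = load_sonar(io.StringIO(sample))
print(len(X), len(X[0]), y)  # 2 60 ['R', 'M']
```

To read the real file, the same function could be called on an open file handle, e.g. `with open("sonar.all-data", newline="") as f: X, y = load_sonar(f)`, assuming the file has been downloaded locally.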
CONCLUSION: The baseline performance of predicting the most prevalent class achieved an accuracy of approximately 76.0%. After a series of tuning steps, SVM achieved an accuracy of approximately 85.06%. The RandomForest ensemble algorithm, also after tuning, yielded an accuracy of 85.09%. This slight improvement of RandomForest over SVM (0.03 percentage points) was too small to justify the additional processing and tuning required by the ensemble algorithm.
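The most-prevalent-class baseline mentioned above can be sketched in a few lines of standard-library Python. The helper name `majority_class_accuracy` is hypothetical, and the labels are a toy example rather than the Sonar data. On a real project the same calculation would be applied to the dataset's actual class labels.

```python
from collections import Counter

def majority_class_accuracy(labels):
    """Return (label, accuracy) for always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)

# Toy labels for illustration only (not the Sonar data): 6 mines, 4 rocks.
toy_labels = ["M"] * 6 + ["R"] * 4
label, acc = majority_class_accuracy(toy_labels)
print(label, acc)  # M 0.6
```

Any trained model is only worth keeping if it clearly beats this number, which is why the tuned SVM and RandomForest accuracies above are reported against it.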
The purpose of this project is to analyze a dataset using various machine learning algorithms and to document the steps using a template. The project aims to touch on the following areas:
- Document a classification predictive modeling problem end-to-end.
- Explore data transformation options for improving model performance.
- Explore algorithm tuning techniques for improving model performance.
- Explore using and tuning ensemble methods for improving model performance.
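As a rough illustration of the transformation and tuning steps listed above, the sketch below standardizes the features (z-scores fitted on each training fold only) and grid-searches one hyperparameter with 5-fold cross-validation. It is an assumption-laden stand-in, not the project's code: it uses a hand-written k-nearest-neighbors classifier (tuning `k`) in place of the SVM and RandomForest models from the conclusion, relies only on the Python standard library, and runs on synthetic data with 60 features. All function names are hypothetical; in practice a library such as scikit-learn (for example `StandardScaler` with `GridSearchCV`) would normally handle these steps.

```python
import math
import random
from collections import Counter

def standardize(train_rows, test_rows):
    """Z-score each column using statistics from the training rows only."""
    n_cols, n = len(train_rows[0]), len(train_rows)
    means = [sum(r[j] for r in train_rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in train_rows) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns

    def scale(rows):
        return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]

    return scale(train_rows), scale(test_rows)

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    return Counter(y for _, y in nearest[:k]).most_common(1)[0][0]

def cv_accuracy(x, y, k, n_folds=5, seed=0):
    """Mean k-NN accuracy over n_folds folds; scaling is refit inside each fold."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        train_x, test_x = standardize([x[i] for i in train_idx], [x[i] for i in fold])
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_x, train_y, q, k) == y[i]
                      for q, i in zip(test_x, fold))
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)

def grid_search(x, y, candidate_ks):
    """Evaluate every candidate k and return (best_k, best_cv_accuracy)."""
    results = {k: cv_accuracy(x, y, k) for k in candidate_ks}
    best_k = max(results, key=results.get)
    return best_k, results[best_k]

# Synthetic two-class data (NOT the Sonar data): 80 rows x 60 features.
rng = random.Random(42)
labels = ["M"] * 40 + ["R"] * 40
features = [[rng.gauss(0.5 if lab == "M" else 0.0, 1.0) for _ in range(60)]
            for lab in labels]

best_k, best_score = grid_search(features, labels, [1, 3, 5, 7])
print(best_k, round(best_score, 3))
```

Fitting the scaler inside each fold, rather than once on the full dataset, keeps the held-out rows from leaking into the preprocessing statistics, so the cross-validated accuracy used to pick `k` stays an honest estimate.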
The HTML-formatted report is available on GitHub.