Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Allstate Claims Severity dataset presents a regression problem, where we are trying to predict the value of a continuous variable.
INTRODUCTION: Allstate is interested in developing automated methods of predicting the cost, and hence severity, of claims. In this Kaggle challenge, the contestants were asked to create an algorithm that could accurately predict claims severity. Each row in this dataset represents an insurance claim, and the task is to predict the value of the 'loss' column. Variables prefixed with 'cat' are categorical, while those prefixed with 'cont' are continuous.
In this iteration, we will construct machine learning models using the original dataset with minimal data preparation and no feature engineering. These models will serve as the baseline for future iterations of modeling.
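The minimal preparation described above can be sketched as follows. This is an illustrative sketch, not the project's actual script: the small DataFrame below is synthetic and merely mimics the column layout of the Kaggle train.csv (an 'id' column, 'cat' and 'cont' predictors, and the 'loss' target), and integer label-encoding is one simple way to make the categorical columns consumable by tree-based models.

```python
import pandas as pd

# Synthetic stand-in for the Allstate train.csv; the real file has
# many more 'cat' and 'cont' columns, but the layout is the same.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "cat1": ["A", "B", "A"],
    "cat2": ["B", "B", "C"],
    "cont1": [0.72, 0.33, 0.55],
    "cont2": [0.12, 0.88, 0.41],
    "loss": [2213.18, 1283.60, 3005.09],
})

# Split predictors by prefix, as described in the introduction
cat_cols = [c for c in df.columns if c.startswith("cat")]
cont_cols = [c for c in df.columns if c.startswith("cont")]

# Minimal preparation: integer-encode the categoricals so tree-based
# regressors can consume them directly (no feature engineering)
for c in cat_cols:
    df[c] = df[c].astype("category").cat.codes

X = df[cat_cols + cont_cols]
y = df["loss"]
```

With `X` and `y` in this shape, any scikit-learn-style regressor can be fitted directly as a baseline.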
ANALYSIS: The baseline performance of the machine learning algorithms achieved an average MAE of 1301. eXtreme Gradient Boosting (XGBoost) achieved the best MAE after the first round of modeling. After a series of tuning trials, XGBoost achieved an MAE of 1199. Using the optimized parameters, the XGBoost algorithm processed the test dataset with an MAE of 1204, which was in line with the MAE estimated from the training data.
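The evaluation loop behind those MAE figures can be sketched as below. This is a hedged illustration, not the report's actual code: the data is synthetic, scikit-learn's GradientBoostingRegressor stands in for XGBoost so the sketch runs with scikit-learn alone, and the hyperparameter values shown are placeholders rather than the tuned settings from the report.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the prepared Allstate training matrix
rng = np.random.default_rng(42)
X = rng.random((200, 5))
y = 1000 + 2000 * X[:, 0] + 500 * rng.random(200)

# GradientBoostingRegressor stands in for XGBoost here; the
# hyperparameters are illustrative, not the report's tuned values
model = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)

# Score with MAE; scikit-learn returns it negated, so flip the sign
scores = -cross_val_score(
    model, X, y, cv=5, scoring="neg_mean_absolute_error"
)
print(f"Cross-validated MAE: {scores.mean():.1f}")
```

Tuning then amounts to repeating this scoring step over a grid of candidate parameter values and keeping the combination with the lowest cross-validated MAE.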
CONCLUSION: For this iteration, the XGBoost algorithm achieved the best overall results using the training and testing datasets. For this dataset, XGBoost should be considered for further modeling.
Dataset Used: Allstate Claims Severity Data Set
Dataset ML Model: Regression with numerical and categorical attributes
Dataset Reference: https://www.kaggle.com/c/allstate-claims-severity/data
One potential source of performance benchmarks: https://www.kaggle.com/c/allstate-claims-severity/leaderboard
The HTML-formatted report can be found on GitHub.