Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Springleaf Marketing Response dataset presents a binary classification problem, where we are trying to predict which of two possible outcomes applies to each customer.
INTRODUCTION: Springleaf leverages the direct mail method for connecting with customers who may need a loan. To improve their targeting efforts, Springleaf must be sure they are focusing on the customers who are likely to respond and be good candidates for their services. Using a dataset with a broad set of anonymized features, Springleaf is looking to predict which customers will respond to a direct mail offer.
In iteration Take1, we constructed several traditional machine learning models using linear, non-linear, and ensemble techniques. We also recorded the best ROC-AUC result that we could obtain with each of these models.
In iteration Take2, we constructed and tuned an XGBoost machine learning model for this dataset. We also observed the best ROC-AUC result that we could obtain with the XGBoost model.
In iteration Take3, we constructed several Multilayer Perceptron (MLP) models with one hidden layer of 64, 128, 256, 512, 1024, and 2048 nodes. These single-layer MLP models serve as the baseline models as we build more complex MLP models in future iterations.
In iteration Take4, we constructed several Multilayer Perceptron (MLP) models with two hidden layers. We also observed whether these two-layer MLP models could improve the ROC-AUC performance of the single-layer models.
In this Take5 iteration, we will construct several Multilayer Perceptron (MLP) models with three hidden layers. We will observe whether these three-layer MLP models can improve the ROC-AUC performance of the single-layer models.
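To give a rough sense of how model size grows as hidden layers are added across Take3 to Take5, the sketch below counts the trainable weights and biases of a fully connected MLP. The input feature count of 100 and the choice of equal-width hidden layers are assumptions made purely for illustration, and `count_mlp_parameters` is a hypothetical helper rather than part of the original project.

```python
def count_mlp_parameters(n_inputs, hidden_layers, n_outputs=1):
    """Count weights and biases in a fully connected MLP.

    Each dense layer with n_in inputs and n_out units holds
    n_in * n_out weights plus n_out biases.
    """
    total = 0
    previous = n_inputs
    for units in list(hidden_layers) + [n_outputs]:
        total += previous * units + units
        previous = units
    return total


# Hypothetical input width; the real feature count depends on preprocessing.
N_FEATURES = 100

# Hidden layer widths listed for the Take3 single-layer models.
WIDTHS = [64, 128, 256, 512, 1024, 2048]

for width in WIDTHS:
    one_layer = count_mlp_parameters(N_FEATURES, [width])
    # Assumption: three hidden layers of equal width.
    three_layer = count_mlp_parameters(N_FEATURES, [width] * 3)
    print(f"width={width}: one layer={one_layer}, three layers={three_layer}")
```

A larger parameter count increases capacity, but it does not by itself guarantee a better ROC-AUC score.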
ANALYSIS: In iteration Take1, the baseline performance of the machine learning algorithms achieved an average ROC-AUC of 70.42%. The Random Forest and Gradient Boosting Machine (GBM) algorithms produced the top ROC-AUC scores after the first round of modeling. After a series of tuning trials, GBM turned in an overall ROC-AUC result of 77.96%. When we applied the tuned GBM algorithm to the test dataset, we obtained a ROC-AUC score of only 62.58%, which was much lower than the score from model training.
In iteration Take2, the XGBoost algorithm achieved a baseline ROC-AUC performance of 77.08%. After a series of tuning trials, XGBoost turned in an overall best ROC-AUC result of 78.23%. When we applied the tuned XGBoost algorithm to the test dataset, we obtained a ROC-AUC score of only 62.86%, which was much lower than the score from model training.
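The gap between training and test scores for Take1 and Take2 can be worked out directly from the figures reported above. The short sketch below does only that arithmetic. The variable names are illustrative, and no new results are introduced.

```python
# Tuned training and test ROC-AUC scores (percent), as reported above.
reported = {
    "GBM": {"train": 77.96, "test": 62.58},
    "XGBoost": {"train": 78.23, "test": 62.86},
}

# Gap in percentage points between the training and test scores.
gaps = {
    name: round(scores["train"] - scores["test"], 2)
    for name, scores in reported.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap} percentage points lower on the test dataset")
```

Both tuned models lose roughly 15 percentage points on the test dataset.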
In iteration Take3, all one-layer models achieved a ROC-AUC performance of around 50%.
In iteration Take4, all two-layer models again achieved a ROC-AUC performance of around 50%.
In this Take5 iteration, all three-layer models once again achieved a ROC-AUC performance of around 50%.
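A ROC-AUC of around 50% means the model ranks responders and non-responders no better than chance. The sketch below illustrates this with a small pure-Python ROC-AUC calculation based on pairwise comparisons. The `roc_auc` function and the toy labels and scores are made up for illustration, and they are not the code or data used in the project.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only.
labels = [0, 0, 1, 1]

perfect = roc_auc(labels, [0.1, 0.2, 0.8, 0.9])   # positives always ranked higher
constant = roc_auc(labels, [0.5, 0.5, 0.5, 0.5])  # same score for every example
inverted = roc_auc(labels, [0.9, 0.8, 0.2, 0.1])  # positives always ranked lower

print(perfect, constant, inverted)
```

A model that outputs the same score for every example lands at exactly 0.5 under this definition. Constant predictions are therefore one possible explanation for scores near 50%, although the source does not confirm the cause.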
CONCLUSION: For this iteration, all three-layer models scored poorly on the ROC-AUC metric. For this dataset, we should consider MLP modeling with more hidden layers or with a more complex architecture.
Dataset Used: Springleaf Marketing Response Data Set
Dataset ML Model: Binary classification with numerical and categorical attributes
Dataset Reference: https://www.kaggle.com/c/springleaf-marketing-response/data
One potential source of performance benchmark: https://www.kaggle.com/c/springleaf-marketing-response/leaderboard
The HTML formatted report can be found here on GitHub.