Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Santander Bank Customer Transaction Prediction competition is a binary classification problem in which we try to predict one of two possible outcomes.
INTRODUCTION: Santander Bank’s data science team wants to identify which customers will make a specific transaction in the future, irrespective of the amount of money transacted. The bank continually challenges its machine learning algorithms to find more accurate ways of solving its most common problems, such as: Will a customer buy this product? Can a customer pay this loan?
For this iteration, we will examine the effectiveness of the eXtreme Gradient Boosting (XGBoost) algorithm combined with the Synthetic Minority Over-sampling Technique (SMOTE) to mitigate the effect of imbalanced data for this problem. Submissions are evaluated on the area under the ROC curve between the predicted probability and the observed target.
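To make the over-sampling and scoring described above more concrete, here is a minimal pure-Python sketch. It is not the project's actual code: the helper names smote_sketch and roc_auc, the toy rows, and the choice of k are all assumptions made for illustration. The first function shows the core idea of SMOTE (interpolating between a minority-class row and one of its nearest minority-class neighbours), and the second computes the area under the ROC curve as the fraction of positive/negative pairs that the scores rank correctly.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core idea of SMOTE).
    Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive is scored above a
    random negative, counting ties as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy minority-class rows (assumed data, not from the Santander dataset).
minority_rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote_sketch(minority_rows, n_new=6, k=2)

# Three of the four positive/negative pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(len(new_rows), auc)  # prints: 6 0.75
```

In practice a library implementation such as the SMOTE class in imbalanced-learn would normally be used instead of a hand-written helper, and over-sampling is usually applied to the training split only, so that the evaluation data keeps its original class balance.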
ANALYSIS: We tried different values for the max_depth, learning_rate, and n_estimators parameters. The max_depth values ranged from 3 to 6, the learning_rate values from 0.1 to 0.5, and the n_estimators values from 1000 to 4000. The following output files are available for comparison.
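As a rough illustration of the parameter ranges above, the sketch below enumerates one possible grid of settings. Only the end points of each range come from this write-up; the step sizes (1 for max_depth, 0.1 for learning_rate, 1000 for n_estimators) and the variable names are assumptions, so the actual runs may have used fewer or different combinations.

```python
from itertools import product

# End points are from the analysis; the intermediate steps are assumed.
max_depth_values = [3, 4, 5, 6]
learning_rate_values = [0.1, 0.2, 0.3, 0.4, 0.5]
n_estimators_values = [1000, 2000, 3000, 4000]

# One dictionary per combination, ready to pass to a model constructor.
param_grid = [
    {"max_depth": depth, "learning_rate": rate, "n_estimators": trees}
    for depth, rate, trees in product(
        max_depth_values, learning_rate_values, n_estimators_values
    )
]

print(len(param_grid))  # prints: 80
```

Each dictionary uses the same parameter names as the analysis, so under these assumptions a full sweep would cover 4 x 5 x 4 = 80 candidate configurations.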
CONCLUSION: To be determined after comparing the results from other machine learning algorithms.
Dataset Used: Santander Customer Transaction Prediction
Dataset ML Model: Binary classification with numerical attributes
Dataset Reference: https://www.kaggle.com/c/santander-customer-transaction-prediction/data
One potential source of performance benchmark: https://www.kaggle.com/c/santander-customer-transaction-prediction/overview
The HTML formatted report can be found here on GitHub.