Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Santander Bank Customer Transaction Prediction competition is a binary classification problem in which we try to predict one of two possible outcomes.
INTRODUCTION: Santander Bank’s data science team wants to identify which customers will make a specific transaction in the future, irrespective of the amount of money transacted. The bank is continually challenging its machine learning algorithms to make sure they can more accurately identify new ways to solve its most common challenges such as: Will a customer buy this product? Can a customer pay this loan?
For this iteration, we will examine the effectiveness of the Logistic Regression algorithm combined with the Synthetic Minority Over-sampling Technique (SMOTE), which is used to mitigate the effect of imbalanced data for this problem. Submissions are evaluated on the area under the ROC curve between the predicted probability and the observed target.
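To make the two ideas in this paragraph concrete, the following is a minimal pure-Python sketch, not the code used in this project. The function names `roc_auc` and `smote_like`, the parameter `k`, and the toy data are all hypothetical and chosen only for illustration. The first function computes ROC-AUC as the fraction of (positive, negative) pairs in which the positive case receives the higher score. The second mimics the core SMOTE step of creating a synthetic minority row by interpolating between a minority row and one of its nearest minority neighbours.

```python
import random


def roc_auc(y_true, y_score):
    """ROC-AUC as the probability that a random positive outranks a random
    negative; tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def smote_like(minority, n_new, k=2, seed=0):
    """Illustrative SMOTE-style over-sampling (assumes at least two rows).

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy example: three of the four positive/negative pairs are ordered correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    # Toy minority class with three rows; generate five synthetic rows.
    print(smote_like([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], n_new=5))
```

In practice a library implementation would normally be used for both steps. The pairwise loop above is quadratic in the number of rows and is only meant to show what the metric measures.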
ANALYSIS: The baseline performance achieved an average ROC-AUC score of 0.8765. After a series of tuning trials, the top result from the training data was a ROC-AUC score of 0.8788. Using the optimized parameters, the model achieved a ROC-AUC score of 0.7776 on the test dataset, noticeably lower than the training result.
CONCLUSION: To be determined after comparing the results from other machine learning algorithms.
Dataset Used: Santander Customer Transaction Prediction
Dataset ML Model: Binary classification with numerical attributes
Dataset Reference: https://www.kaggle.com/c/santander-customer-transaction-prediction/data
One potential source of performance benchmarks: https://www.kaggle.com/c/santander-customer-transaction-prediction/overview
The HTML formatted report can be found here on GitHub.