Kaggle Competition: Banco Santander Customer Transaction Prediction

If you are new to machine learning in Python, like me, you might find the current Kaggle competition “Santander Customer Transaction Prediction” interesting.

The competition is essentially a binary classification problem with a reasonably large dataset (200 attributes and 200,000 rows of training data). I have not participated in a Kaggle competition before and will use this one to get some learning under my belt.

I plan to run the training data through a list of machine learning algorithms (see below) and take each of them through three stages. This blog post will serve as the meta post that summarizes the progress.

The current plan, with its milestones, is as follows:

Stage 1: Gather the Baseline Performance.

  • LogisticRegression: targeted Monday 25 February 2019
  • DecisionTreeClassifier: targeted Wednesday 27 February 2019
  • KNeighborsClassifier: targeted Friday 1 March 2019
  • BaggingClassifier: targeted Monday 4 March 2019
  • RandomForestClassifier: targeted Wednesday 6 March 2019
  • ExtraTreesClassifier: targeted Friday 8 March 2019
  • GradientBoostingClassifier: TBD
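
To give a feel for what a baseline run could look like, here is a minimal sketch that scores all seven algorithms with 5-fold cross-validation. It is not the actual competition script: the data is a small synthetic stand-in generated by scikit-learn's make_classification, and the fold count, the ROC AUC scoring, and the reduced n_estimators values are my own assumptions chosen to keep the run short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: a small synthetic dataset stands in for the real training data
# (which has 200 attributes and 200,000 rows).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=7
)

# The seven algorithms from the plan, mostly with default settings.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=7),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "BaggingClassifier": BaggingClassifier(random_state=7),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=7),
    "ExtraTreesClassifier": ExtraTreesClassifier(n_estimators=50, random_state=7),
    "GradientBoostingClassifier": GradientBoostingClassifier(n_estimators=50, random_state=7),
}

# Collect the mean cross-validated ROC AUC for each algorithm.
baseline_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    baseline_scores[name] = scores.mean()
    print(f"{name}: mean ROC AUC = {scores.mean():.4f}")
```

On the real data, the same loop would take much longer, especially for KNeighborsClassifier and the ensemble methods, so it may be worth trying it on a sample of the rows first.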

Stage 2: Feature Selection using the Attribute Importance Ranking technique

  • LogisticRegression: TBD
  • DecisionTreeClassifier: TBD
  • KNeighborsClassifier: TBD
  • BaggingClassifier: TBD
  • RandomForestClassifier: TBD
  • ExtraTreesClassifier: TBD
  • GradientBoostingClassifier: TBD
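
As a rough illustration of importance-based feature selection, the sketch below ranks attributes by the feature_importances_ of a fitted ExtraTreesClassifier and keeps the top few. The choice of ExtraTreesClassifier as the ranking model, the cutoff top_k = 10, and the synthetic data are all assumptions for the example, not settled decisions for this project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Assumption: synthetic stand-in data instead of the competition's training set.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=7
)

# Fit a tree ensemble and read off its impurity-based importance scores.
ranker = ExtraTreesClassifier(n_estimators=100, random_state=7)
ranker.fit(X, y)
importances = ranker.feature_importances_

# Sort column indices from most to least important.
ranking = np.argsort(importances)[::-1]

# Hypothetical cutoff: keep only the 10 highest-ranked attributes.
top_k = 10
selected = ranking[:top_k]
X_reduced = X[:, selected]

print("Selected column indices:", selected.tolist())
print("Reduced shape:", X_reduced.shape)
```

The reduced matrix X_reduced can then be fed to each of the algorithms above and compared against the baseline numbers.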

Stage 3: Feature Selection using the Recursive Feature Elimination technique

  • LogisticRegression: TBD
  • DecisionTreeClassifier: TBD
  • KNeighborsClassifier: TBD
  • BaggingClassifier: TBD
  • RandomForestClassifier: TBD
  • ExtraTreesClassifier: TBD
  • GradientBoostingClassifier: TBD
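
For recursive feature elimination, scikit-learn provides the RFE class, which repeatedly fits an estimator and drops the weakest attributes until the requested number remains. The sketch below is a minimal example under my own assumptions: LogisticRegression as the ranking estimator, 10 attributes to keep, and synthetic data in place of the real training set.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Assumption: synthetic stand-in data instead of the competition's training set.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=7
)

# Drop one attribute per iteration until 10 remain (both values are assumptions).
selector = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=10,
    step=1,
)
selector.fit(X, y)

# support_ is a boolean mask of kept columns; ranking_ is 1 for kept columns.
X_rfe = selector.transform(X)
print("Kept columns:", selector.support_.nonzero()[0].tolist())
print("Ranking:", selector.ranking_.tolist())
print("Reduced shape:", X_rfe.shape)
```

As far as I can tell, RFE needs an estimator that exposes coef_ or feature_importances_ after fitting, so KNeighborsClassifier cannot serve as the ranking estimator itself. One option is to select the attributes with another model and then train KNeighborsClassifier on the reduced set.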

I will post all the Python scripts in a folder on GitHub. The final submission deadline is 10 April 2019.

Feel free to take a look at the scripts and experiment. Who knows, you might have something you can turn in by the time April comes around. Happy learning and good luck!