Regression Model for Kaggle Tabular Playground Series 2021 August Using AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground Series August 2021 dataset presents a regression problem in which we try to predict the value of a continuous variable.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions with fun but less complex tabular datasets. The dataset used for this competition is synthetic but based on a real dataset and generated using a CTGAN. The original dataset deals with calculating the loss associated with loan defaults. Although the features are anonymized, they have properties relating to real-world features.

ANALYSIS: After a series of tuning trials, the best AutoKeras model achieved a root mean squared error (RMSE) of 7.9117 on the training dataset. When we applied the final model to the test dataset, it achieved an RMSE of 7.9304.

CONCLUSION: In this iteration, the AutoKeras model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground Series August 2021 Dataset

Dataset ML Model: Regression with numerical features

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-aug-2021

One potential source of performance benchmarks: https://www.kaggle.com/c/tabular-playground-series-aug-2021/leaderboard

The HTML-formatted report can be found on GitHub.

Tabular Data Analytics Project Templates Using Python and AutoKeras Version 3

As I practice solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that I use to experiment with modeling ML problems using Python and the AutoKeras library.

Version 3 of the AutoKeras templates contains updated structures and code similar to those in the previous templates. I designed the templates to address regression, binary classification, and multi-class classification modeling exercises from beginning to end.

You will find the Python templates on the Analytics Project Templates page.

Tabular Data Analytics Project Templates Using Python and TensorFlow Version 9

As I practice solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that I use to experiment with modeling ML problems using Python and the TensorFlow library.

Version 9 of the TensorFlow templates contains updated structures and code similar to those in the previous TensorFlow templates. I designed the templates to address regression, binary classification, and multi-class classification modeling exercises from beginning to end.

You will find the Python templates on the Analytics Project Templates page.

Tabular Data Analytics Project Templates Using Python and XGBoost Version 3

As I practice solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that I use to experiment with modeling ML problems using Python and the XGBoost library.

Version 3 of the XGBoost templates contains updated structures and code similar to those in the previous XGBoost templates. I designed the templates to address regression, binary classification, and multi-class classification modeling exercises from beginning to end.

You will find the Python templates on the Analytics Project Templates page.

Tabular Data Analytics Project Template Using Python and Scikit-learn Version 16

As I practice solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that I use to experiment with modeling ML problems using Python and the Scikit-learn library.

Version 16 of the Scikit-learn templates contains updated structures and code similar to those in the previous Scikit-learn templates. I designed the templates to address regression, binary classification, and multi-class classification modeling exercises from beginning to end.

You will find the Python templates on the Analytics Project Templates page.

Regression Analytics Project Template Using Python and TensorFlow Decision Forests Version 1

As I practice solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that I use to experiment with modeling ML problems using Python and TensorFlow Decision Forests.

Version 1 of the Decision Forests template contains structures and features similar to those in the Scikit-learn templates. I designed the Decision Forests template to address regression modeling exercises from beginning to end.

You will find the Python templates on the Analytics Project Templates page.

Regression Model for Kaggle Tabular Playground Series 2021 January Using TensorFlow Decision Forests

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground Series January 2021 dataset presents a regression problem in which we try to predict the value of a continuous variable.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have been hosting playground-style competitions with fun but less complex tabular datasets. These competitions are a good fit for people looking for something between the Titanic Getting Started competition and a Featured competition.

ANALYSIS: The preliminary Gradient Boosted Trees model achieved a benchmark RMSE of 0.7013 on the validation dataset. The final model achieved an RMSE of 0.7006 on the validation dataset. When we applied the finalized model to Kaggle’s test dataset, the model achieved an RMSE of 0.7031.
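RMSE, the metric quoted in the analysis, is the square root of the mean squared difference between the actual and predicted values, so lower is better. The following is only a minimal sketch of that calculation in plain Python. The function name `rmse` and the sample values are illustrative assumptions and do not come from the project notebook or the Kaggle data.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Return the root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values for illustration only (not competition data).
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]
score = rmse(actual, predicted)
print(f"RMSE: {score:.4f}")
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones.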

CONCLUSION: In this iteration, the TensorFlow Decision Forests model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground Series January 2021 Dataset

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-jan-2021

One potential source of performance benchmarks: https://www.kaggle.com/c/tabular-playground-series-jan-2021/leaderboard

The HTML-formatted report can be found on GitHub.

Regression Model for Kaggle Tabular Playground Series 2021 February Using TensorFlow Decision Forests

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground Series February 2021 dataset presents a regression problem in which we try to predict the value of a continuous variable.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions with fun but less complex tabular datasets. The February dataset is synthetic but based on a real dataset and generated using a CTGAN. The original dataset deals with predicting the amount of an insurance claim. Although the features are anonymized, they have properties relating to real-world features.

ANALYSIS: The preliminary Random Forest model achieved a benchmark RMSE of 0.8422 on the validation dataset. The final model achieved an RMSE of 0.8549 on the validation dataset. When we applied the finalized model to Kaggle’s test dataset, the model achieved an RMSE of 0.8534.
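One way to read the three scores side by side is to express each step as a relative change in RMSE. The sketch below does that arithmetic with the figures quoted in the analysis. The helper `relative_change` and the variable names are hypothetical and are not part of the project template.

```python
def relative_change(baseline: float, new: float) -> float:
    """Return (new - baseline) / baseline, the change as a fraction of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# RMSE scores quoted in the analysis paragraph (lower is better).
preliminary_val = 0.8422  # preliminary Random Forest, validation dataset
final_val = 0.8549        # final model, validation dataset
kaggle_test = 0.8534      # final model, Kaggle test dataset

val_change = relative_change(preliminary_val, final_val)
test_gap = relative_change(final_val, kaggle_test)

# A positive value means the RMSE went up (worse); a negative value means it went down.
print(f"Preliminary -> final (validation): {val_change:+.2%}")
print(f"Final validation -> Kaggle test:   {test_gap:+.2%}")
```

On these figures the final model's validation RMSE is slightly higher than the preliminary benchmark, while the Kaggle test score sits just below the final validation score.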

CONCLUSION: In this iteration, the TensorFlow Decision Forests model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground Series February 2021 Dataset

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-feb-2021

One potential source of performance benchmarks: https://www.kaggle.com/c/tabular-playground-series-feb-2021/leaderboard

The HTML-formatted report can be found on GitHub.