Binary Classification Model for Kaggle Tabular Playground Series 2021 Apr Using Python and TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground Apr 2021 dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for relatively new people in their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex, tabular datasets. The dataset used for this competition is synthetic but based on the real Titanic dataset and generated using a CTGAN. The statistical properties of this dataset are very similar to the original Titanic dataset, but there is no shortcut to cheat by using public labels for predictions.

ANALYSIS: The cross-validated TensorFlow models achieved an average accuracy of 0.7689 after running for 15 epochs. When we applied the final model to Kaggle’s test dataset, the model achieved an accuracy score of 0.7831.
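
As a rough illustration of how an average cross-validated accuracy is obtained, the sketch below splits the row indices into folds, scores a model on each held-out fold, and averages the fold scores. The majority-class stand-in model, the fold count of five, and the toy labels are all assumptions made for illustration; the project itself trained TensorFlow models.

```python
from statistics import mean


def k_fold_indices(n_rows, n_folds):
    """Yield (train_idx, valid_idx) pairs so every row is held out exactly once."""
    fold_sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, valid_idx
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validated_accuracy(labels, n_folds=5):
    """Average held-out accuracy of a majority-class stand-in model."""
    scores = []
    for train_idx, valid_idx in k_fold_indices(len(labels), n_folds):
        train_labels = [labels[i] for i in train_idx]
        # Stand-in "model": always predict the most common training label.
        majority = max(set(train_labels), key=train_labels.count)
        valid_labels = [labels[i] for i in valid_idx]
        scores.append(accuracy(valid_labels, [majority] * len(valid_labels)))
    return mean(scores), scores


# Hypothetical binary labels, not taken from the competition data.
toy_labels = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
avg_score, fold_scores = cross_validated_accuracy(toy_labels, n_folds=5)
print(fold_scores, avg_score)
```

Swapping the stand-in for a real model only changes the two lines that fit and predict; the fold bookkeeping and the final averaging stay the same.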

CONCLUSION: In this iteration, the TensorFlow model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground Series 2021 Apr Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-apr-2021

One potential source of performance benchmarks: https://www.kaggle.com/c/tabular-playground-series-apr-2021/leaderboard

The HTML formatted report can be found here on GitHub.

Binary Classification Model for Kaggle Tabular Playground Series 2021 Mar Using Python and TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground Mar 2021 dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for relatively new people in their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex, tabular datasets. The dataset may be synthetic but is based on a real dataset and generated using a CTGAN. The original dataset tries to predict the amount of an insurance claim. Although the features are anonymized, they have properties relating to real-world features.

ANALYSIS: The cross-validated TensorFlow models achieved an average ROC AUC score of 0.8842 after running for 15 epochs. When we applied the final model to Kaggle’s test dataset, the model achieved a ROC AUC score of 0.8861.
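
For readers unfamiliar with the metric, the area under the ROC curve can be read as the probability that a randomly chosen positive row receives a higher predicted score than a randomly chosen negative row, with ties counting as half. The sketch below computes it that way in plain Python. The labels and scores are hypothetical, and the pairwise loop is chosen for clarity rather than speed; a real project would normally rely on the metric built into its modeling library.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)  # 0.75
```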

CONCLUSION: In this iteration, the TensorFlow model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground Series 2021 Mar Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-mar-2021

One potential source of performance benchmarks: https://www.kaggle.com/c/tabular-playground-series-mar-2021/leaderboard

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Model for Deepmind 3D Shapes Using TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Deepmind 3D Shapes dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: 3dshapes is a dataset of 3D shapes procedurally generated from six independent latent factors. These factors are floor color, wall color, object color, scale, shape, and orientation. Every combination of these factor values appears exactly once, which gives the dataset its 480000 images.
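
The 480000 figure is the product of the number of values each latent factor can take. The per-factor counts in the sketch below are recalled from the dataset's published description rather than stated in this post, so treat them as assumptions and check them against the dataset reference.

```python
from math import prod

# Number of distinct values per latent factor (assumed from the dataset README).
factor_sizes = {
    "floor_hue": 10,
    "wall_hue": 10,
    "object_hue": 10,
    "scale": 8,
    "shape": 4,
    "orientation": 15,
}

total_images = prod(factor_sizes.values())
print(total_images)  # 480000
```

Under the same assumption, the shape factor has four values, so a model that predicts the shape is a four-class classifier.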

In this Take1 iteration, we will construct a simple three-layer CNN model to predict the shape in each image.

ANALYSIS: The performance of the baseline model achieved an accuracy score of 99.91% after five epochs using the training dataset. The same model processed the validation dataset with an accuracy score of 99.95%.

CONCLUSION: In this iteration, the TensorFlow CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Deepmind 3D Shapes dataset

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://github.com/deepmind/3d-shapes

One potential source of performance benchmarks: https://proceedings.mlr.press/v80/kim18b.html

The HTML formatted report can be found here on GitHub.

Regression Model for Kaggle Tabular Playground Series 2021 Feb Using Python and AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground Series 2021 Feb dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: Kaggle wants to provide an approachable environment for relatively new people in their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex, tabular datasets. The February dataset may be synthetic but is based on a real dataset and generated using a CTGAN. The original dataset tries to predict the amount of an insurance claim. Although the features are anonymized, they have properties relating to real-world features.

ANALYSIS: The performance of the best, preliminary AutoKeras model achieved an RMSE benchmark of 0.8625. When we applied the final model to Kaggle’s test dataset, the model achieved an RMSE score of 0.8648.
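
RMSE, the metric quoted for the regression projects, is the square root of the mean squared difference between the true and predicted values, so lower is better. A minimal sketch in plain Python follows; the target and prediction values are made up for illustration.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical targets and model predictions.
toy_targets = [7.0, 8.0, 9.0]
toy_predictions = [7.5, 8.0, 8.5]
toy_rmse = rmse(toy_targets, toy_predictions)
print(round(toy_rmse, 4))
```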

CONCLUSION: In this iteration, the AutoKeras model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground Series 2021 Feb Data Set

Dataset ML Model: Regression with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-feb-2021

One potential source of performance benchmarks: https://www.kaggle.com/c/tabular-playground-series-feb-2021/leaderboard

The HTML formatted report can be found here on GitHub.

Regression Model for Kaggle Tabular Playground Series 2021 Feb Using Python and TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground Series 2021 Feb dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: Kaggle wants to provide an approachable environment for relatively new people in their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex, tabular datasets. The February dataset may be synthetic but is based on a real dataset and generated using a CTGAN. The original dataset tries to predict the amount of an insurance claim. Although the features are anonymized, they have properties relating to real-world features.

ANALYSIS: The performance of the cross-validated TensorFlow models achieved an average RMSE benchmark of 0.8642. When we applied the final model to Kaggle’s test dataset, the model achieved an RMSE score of 0.8642.

CONCLUSION: In this iteration, the TensorFlow model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground Series 2021 Feb Data Set

Dataset ML Model: Regression with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-feb-2021

One potential source of performance benchmarks: https://www.kaggle.com/c/tabular-playground-series-feb-2021/leaderboard

The HTML formatted report can be found here on GitHub.

Regression Model for Kaggle Tabular Playground Series 2021 Jan Using Python and AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground Series 2021 Jan dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: Kaggle wants to provide an approachable environment for relatively new people in their data science journey. Since January 2021, they have been hosting playground-style competitions on Kaggle with fun but less complex, tabular datasets. These competitions will be great for people looking for something between the Titanic Getting Started competition and a Featured competition.

ANALYSIS: The performance of the best, preliminary AutoKeras model achieved an RMSE benchmark of 0.7084. When we applied the final model to Kaggle’s test dataset, the model achieved an RMSE score of 0.7092.

CONCLUSION: In this iteration, the TensorFlow model from AutoKeras appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground Series 2021 Jan Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-jan-2021

One potential source of performance benchmarks: https://www.kaggle.com/c/tabular-playground-series-jan-2021/leaderboard

The HTML formatted report can be found here on GitHub.

Regression Model for Kaggle Tabular Playground Series 2021 Jan Using Python and TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground Series 2021 Jan dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: Kaggle wants to provide an approachable environment for relatively new people in their data science journey. Since January 2021, they have been hosting playground-style competitions on Kaggle with fun but less complex, tabular datasets. These competitions will be great for people looking for something between the Titanic Getting Started competition and a Featured competition.

ANALYSIS: The performance of the cross-validated TensorFlow models achieved an average RMSE benchmark of 0.7171. When we applied the final model to Kaggle’s test dataset, the model achieved an RMSE score of 0.7159.

CONCLUSION: In this iteration, the TensorFlow model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground Series 2021 Jan Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-jan-2021

One potential source of performance benchmarks: https://www.kaggle.com/c/tabular-playground-series-jan-2021/leaderboard

The HTML formatted report can be found here on GitHub.

Binary-Class Image Classification Deep Learning Model for PatchCamelyon Grand Challenge Using TensorFlow Take 5

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The PatchCamelyon Grand Challenge dataset is a binary-class classification situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: The PatchCamelyon benchmark is a new and challenging image classification dataset. It consists of 327680 color images (96 x 96px) extracted from histopathologic scans of lymph node sections. Each image is annotated with a binary label indicating the presence of metastatic tissue. This dataset provides a useful benchmark for machine learning models that are bigger than CIFAR10 but smaller than ImageNet.

In iteration Take1, we constructed a CNN model using a simple three-block VGG architecture and tested the model’s performance using a held-out test dataset.

In iteration Take2, we constructed a CNN model using the InceptionV3 architecture and tested the model’s performance using a held-out test dataset.

In iteration Take3, we constructed a CNN model using the ResNet50 architecture and tested the model’s performance using a held-out test dataset.

In iteration Take4, we constructed a CNN model using the DenseNet121 architecture and tested the model’s performance using a held-out test dataset.

In this Take5 iteration, we will construct a CNN model using the MobileNetV3Small architecture and test the model’s performance using a held-out test dataset.

ANALYSIS: In iteration Take1, the baseline model achieved an accuracy score of 79.83% on the validation dataset after ten epochs. When we applied the final model to the test dataset, it achieved an accuracy score of 79.00%.

In iteration Take2, the InceptionV3 model achieved an accuracy score of 83.74% on the validation dataset after ten epochs. When we applied the final model to the test dataset, it achieved an accuracy score of 79.00%.

In iteration Take3, the ResNet50 model achieved an accuracy score of 85.09% on the validation dataset after ten epochs. When we applied the final model to the test dataset, it achieved an accuracy score of 78.05%.

In iteration Take4, the DenseNet121 model achieved an accuracy score of 85.62% on the validation dataset after ten epochs. When we applied the final model to the test dataset, it achieved an accuracy score of 80.01%.

In this Take5 iteration, the MobileNetV3Small model achieved an accuracy score of 82.63% on the validation dataset after ten epochs. When we applied the final model to the test dataset, it achieved an accuracy score of 78.34%.
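
To compare the five iterations side by side, the short script below tabulates the validation and test accuracies reported above and computes the gap between them for each architecture. The numbers are copied from this post; the dictionary layout and variable names are only an illustrative choice.

```python
# (validation accuracy %, test accuracy %) as reported for each iteration.
results = {
    "Take1 three-block VGG": (79.83, 79.00),
    "Take2 InceptionV3": (83.74, 79.00),
    "Take3 ResNet50": (85.09, 78.05),
    "Take4 DenseNet121": (85.62, 80.01),
    "Take5 MobileNetV3Small": (82.63, 78.34),
}

# Gap between validation and test accuracy, in percentage points.
gaps = {name: round(val - test, 2) for name, (val, test) in results.items()}
best_on_test = max(results, key=lambda name: results[name][1])

for name, (val, test) in results.items():
    print(f"{name:<24} val={val:.2f}  test={test:.2f}  gap={gaps[name]:.2f}")
print("Highest test accuracy:", best_on_test)
```

On these figures the pretrained architectures score higher on validation than the baseline, but their test accuracies stay within about two points of it.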

CONCLUSION: In this iteration, the MobileNetV3Small CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: PatchCamelyon Grand Challenge

Dataset ML Model: Binary-class image classification with numerical attributes

Dataset Reference: https://patchcamelyon.grand-challenge.org/

A potential source of performance benchmarks: https://patchcamelyon.grand-challenge.org/evaluation/challenge/leaderboard/

The HTML formatted report can be found here on GitHub.