Image Regression Model for MNIST Handwritten Digits Using Python and AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The MNIST Handwritten Digits dataset is an image classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: MNIST is a dataset developed by Yann LeCun, Corinna Cortes, and Christopher Burges for evaluating machine learning models on the handwritten digit classification problem. The dataset was constructed from several scanned-document datasets available from the National Institute of Standards and Technology (NIST). Each image is a 28-by-28-pixel square (784 pixels total). Models are evaluated and compared using a standard split of the dataset: 60,000 images are used to train a model, and a separate set of 10,000 images is used to test it. Because it is a digit recognition task, there are ten classes (0 to 9) to predict.
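To make the 28 x 28 = 784 arithmetic concrete, here is a minimal sketch that assumes each image arrives as a nested Python list of 0-255 grayscale values and flattens it into a 784-element feature vector scaled to [0, 1]. The helper name `flatten_and_scale` and the scaling step are illustrative assumptions. They show the pixel count and are not necessarily how the original notebook fed images to AutoKeras.

```python
IMAGE_SIZE = 28  # MNIST images are 28 x 28 pixels


def flatten_and_scale(image):
    """Flatten a 28x28 nested list of 0-255 pixel values into 784 floats in [0, 1]."""
    if len(image) != IMAGE_SIZE or any(len(row) != IMAGE_SIZE for row in image):
        raise ValueError("expected a 28x28 image")
    # Row-major order: all of row 0, then all of row 1, and so on.
    return [pixel / 255.0 for row in image for pixel in row]


# Hypothetical test image: all black except one white pixel at row 1, column 2.
image = [[0] * IMAGE_SIZE for _ in range(IMAGE_SIZE)]
image[1][2] = 255

features = flatten_and_scale(image)
print(len(features))        # 784 = 28 * 28
print(features.index(1.0))  # 30 = 1 * 28 + 2
```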

ANALYSIS: Previously, we modeled the dataset using AutoKeras’ image classifier, and the system processed the validation dataset with an accuracy score of 94.84%. When we applied the best AutoKeras model to the previously unseen test dataset, we obtained an accuracy score of 98.4%.

After a series of modeling trials in this iteration, the best model from the AutoKeras image regressor processed the test dataset with an RMSE of 0.454 and an R2 score of 97.53%. When we converted the same continuous predictions into digit labels and evaluated them with classification metrics, we obtained an accuracy score of 96.47%.
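The post does not say exactly how the continuous regressor outputs were turned into digit labels. One plausible approach, shown here as a hedged sketch, is to round each prediction to the nearest integer and clip it to the 0-9 range, then score the raw predictions with RMSE and R2 and the rounded labels with accuracy. The helper names and the toy values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def to_digit_labels(y_pred):
    """Assumed conversion: round to the nearest integer, then clip to 0-9.

    Note that Python's round() sends exact .5 values to the nearest even integer.
    """
    return [min(9, max(0, round(p))) for p in y_pred]


def accuracy(y_true, labels):
    """Fraction of labels that match the true digits."""
    return sum(t == lab for t, lab in zip(y_true, labels)) / len(y_true)


# Hypothetical true digits and regressor outputs.
y_true = [7, 2, 1, 0]
y_pred = [6.8, 2.4, 1.6, -0.3]
labels = to_digit_labels(y_pred)

print(labels)                    # [7, 2, 2, 0]
print(accuracy(y_true, labels))  # 0.75
print(rmse(y_true, y_pred), r2_score(y_true, y_pred))
```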

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: MNIST Handwritten Digits Dataset

Dataset ML Model: Image regression modeling with numerical attributes

Dataset Reference: https://www.tensorflow.org/datasets/catalog/mnist

One potential source of performance benchmark: https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification/

The HTML formatted report can be found here on GitHub.

Image Classification Model for MNIST Handwritten Digits Using Python and AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The MNIST Handwritten Digits dataset is an image classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: MNIST is a dataset developed by Yann LeCun, Corinna Cortes, and Christopher Burges for evaluating machine learning models on the handwritten digit classification problem. The dataset was constructed from several scanned-document datasets available from the National Institute of Standards and Technology (NIST). Each image is a 28-by-28-pixel square (784 pixels total). Models are evaluated and compared using a standard split of the dataset: 60,000 images are used to train a model, and a separate set of 10,000 images is used to test it. Because it is a digit recognition task, there are ten classes (0 to 9) to predict.

ANALYSIS: After a series of modeling trials, the AutoKeras system processed the validation dataset with an accuracy score of 94.84%. When we applied the best AutoKeras model to the previously unseen test dataset, we obtained an accuracy score of 98.4%.

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: MNIST Handwritten Digits Dataset

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://www.tensorflow.org/datasets/catalog/mnist

One potential source of performance benchmark: https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification/

The HTML formatted report can be found here on GitHub.

Binary Classification Model for MiniBooNE Particle Identification Using AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The MiniBooNE Particle Identification dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos (signal) from muon neutrinos (background). The researchers set up the data file as follows: the first line contains the number of signal events followed by the number of background events, the signal event records come next, followed by the background event records, and each line after the first holds the 50 particle ID variables for one event.
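The file layout described above can be turned into features and labels with a short reader. This is a sketch that assumes whitespace-separated values; `load_miniboone` is a hypothetical helper, and the three-column sample stands in for the real 50-column rows.

```python
import io


def load_miniboone(lines):
    """Parse the MiniBooNE layout into (features, labels).

    Label 1 marks signal (electron neutrino) events; label 0 marks
    background (muon neutrino) events.
    """
    it = iter(lines)
    n_signal, n_background = (int(v) for v in next(it).split())
    features, labels = [], []
    for line in it:
        if not line.strip():
            continue  # tolerate blank lines
        # Rows before the n_signal-th data row are signal; the rest are background.
        labels.append(1 if len(features) < n_signal else 0)
        features.append([float(v) for v in line.split()])
    if len(features) != n_signal + n_background:
        raise ValueError("row count does not match the counts in the first line")
    return features, labels


# Hypothetical miniature file: 2 signal events, 1 background event, 3 variables each.
sample = io.StringIO(
    "2 1\n"
    "0.1 0.2 0.3\n"
    "0.4 0.5 0.6\n"
    "-0.7 0.8 0.9\n"
)
X, y = load_miniboone(sample)
print(y)  # [1, 1, 0]
```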

ANALYSIS: In another TensorFlow modeling exercise, the baseline model (2 layers with 32 nodes each) achieved an accuracy score of 95.17% on the training dataset after 20 epochs. After tuning the hyperparameters, the best model (2 layers with 512 nodes each) processed the validation dataset with an accuracy score of 97.88%. Furthermore, the final model processed the previously unseen test dataset with an accuracy score of 94.40%.

After a series of modeling trials, the best AutoKeras model (2 layers with 256 and 32 nodes) processed the validation dataset with a maximum accuracy score of 94.64%. When we applied the AutoKeras model to the previously unseen test dataset, we obtained an accuracy score of 94.54%.
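One way to compare the sizes of these three architectures is to count their trainable parameters. The sketch below assumes plain fully connected layers with biases, 50 input features, and a single sigmoid output unit. It ignores any preprocessing or normalization layers AutoKeras may add, so the totals are approximate.

```python
def dense_param_count(n_inputs, hidden_units, n_outputs=1):
    """Count weights plus biases in a stack of fully connected (dense) layers."""
    total = 0
    fan_in = n_inputs
    for units in list(hidden_units) + [n_outputs]:
        total += fan_in * units + units  # weight matrix plus one bias per unit
        fan_in = units
    return total


N_FEATURES = 50  # particle ID variables per event

architectures = {
    "baseline TensorFlow (32, 32)": (32, 32),
    "tuned TensorFlow (512, 512)": (512, 512),
    "best AutoKeras (256, 32)": (256, 32),
}
for name, layers in architectures.items():
    print(f"{name}: {dense_param_count(N_FEATURES, layers):,} parameters")
# baseline TensorFlow (32, 32): 2,721 parameters
# tuned TensorFlow (512, 512): 289,281 parameters
# best AutoKeras (256, 32): 21,313 parameters
```

Under these assumptions, the AutoKeras architecture is more than ten times smaller than the tuned 512-node model.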

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: MiniBooNE Particle Identification Dataset

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/MiniBooNE+particle+identification

The HTML formatted report can be found here on GitHub.

Regression Model for White Wine Quality Using Python and AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Wine Quality dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: The dataset is related to the white variants of the Portuguese “Vinho Verde” wine. The problem is to predict the wine quality using the chemical characteristics of the wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g., there is no data about grape types, wine brand, wine selling price).
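The UCI wine quality files are semicolon-delimited CSV files with a quoted header row and `quality` as the last column. A minimal sketch of separating the physicochemical inputs from the sensory target with the standard `csv` module follows. `load_wine_quality` is a hypothetical helper, and the two sample rows are illustrative only, with most of the real columns omitted.

```python
import csv
import io


def load_wine_quality(handle, target="quality"):
    """Return (feature_names, feature_rows, targets) from a semicolon-delimited file."""
    reader = csv.DictReader(handle, delimiter=";")
    # Reading fieldnames consumes the (quoted) header row.
    feature_names = [name for name in reader.fieldnames if name != target]
    features, targets = [], []
    for row in reader:
        targets.append(float(row[target]))
        features.append([float(row[name]) for name in feature_names])
    return feature_names, features, targets


# Illustrative excerpt: the real files have 11 input columns before "quality".
sample = io.StringIO(
    '"fixed acidity";"volatile acidity";"alcohol";"quality"\n'
    "7.0;0.27;8.8;6\n"
    "6.3;0.30;9.5;5\n"
)
names, X, y = load_wine_quality(sample)
print(names)  # ['fixed acidity', 'volatile acidity', 'alcohol']
print(y)      # [6.0, 5.0]
```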

ANALYSIS: In another iteration of modeling with TensorFlow, the preliminary model achieved an RMSE of 0.726. After tuning the hyperparameters, the best model processed the training dataset with an RMSE of 0.714. Furthermore, the final model processed the test dataset with an RMSE of 0.693.

After a series of modeling trials, the AutoKeras system processed the validation dataset with a minimum RMSE score of 0.562. When we applied the best AutoKeras model to the previously unseen test dataset, we obtained an RMSE score of 0.623.
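The analysis reports scores on separate validation and test sets. The post does not say how the data was partitioned, so the following is only a sketch of a reproducible three-way holdout split; the 70/15/15 proportions, the seed, and the helper name are assumptions.

```python
import random


def train_val_test_split(rows, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle rows reproducibly and cut them into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local RNG, so global random state is untouched
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Because the test rows play no part in choosing the best trial, the test score serves as the "previously unseen" estimate the analysis refers to.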

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: Wine Quality Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/wine+quality

The HTML formatted report can be found here on GitHub.

Regression Model for Red Wine Quality Using Python and AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Wine Quality dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: The dataset is related to the red variants of the Portuguese “Vinho Verde” wine. The problem is to predict the wine quality using the chemical characteristics of the wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g., there is no data about grape types, wine brand, wine selling price).

ANALYSIS: In another iteration of modeling with TensorFlow, the preliminary model achieved an RMSE of 0.663. After tuning the hyperparameters, the best model processed the training dataset with an RMSE of 0.643. Furthermore, the final model processed the test dataset with an RMSE of 0.679.

After a series of modeling trials, the AutoKeras system processed the validation dataset with a minimum RMSE score of 0.386. When we applied the best AutoKeras model to the previously unseen test dataset, we obtained an RMSE score of 0.602.

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: Wine Quality Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/wine+quality

The HTML formatted report can be found here on GitHub.

Multi-Class Model for Faulty Steel Plates Using Python and AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Faulty Steel Plates dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset comes from research by Semeion, Research Center of Sciences of Communication. The original aim of the study was to correctly classify the type of surface defect in stainless steel plates, with six kinds of possible defects (plus “other”). The input vector was made up of 27 indicators that approximately describe the geometric shape of the fault and its outline.
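In the commonly distributed version of this dataset, the seven outcome classes are stored as one-hot indicator columns rather than a single label column. The sketch below collapses them into one class name and integer index. The column names are assumed to match that distribution (including its original `K_Scatch` spelling) and should be checked against the actual file header; `fault_label` is a hypothetical helper.

```python
# Assumed fault indicator columns; verify against the actual file header.
FAULT_COLUMNS = [
    "Pastry", "Z_Scratch", "K_Scatch", "Stains", "Dirtiness", "Bumps", "Other_Faults",
]


def fault_label(row):
    """Return (class_name, class_index) for a record whose fault flags are one-hot."""
    flagged = [name for name in FAULT_COLUMNS if int(row[name]) == 1]
    if len(flagged) != 1:
        raise ValueError(f"expected exactly one fault flag, found {flagged}")
    name = flagged[0]
    return name, FAULT_COLUMNS.index(name)


# Hypothetical record showing only the fault flags (the 27 indicators are omitted).
record = {name: "0" for name in FAULT_COLUMNS}
record["Bumps"] = "1"
print(fault_label(record))  # ('Bumps', 5)
```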

ANALYSIS: In a previous modeling iteration, a three-layer TensorFlow model achieved an average accuracy score of 72.51%. After tuning the hyperparameters, the best model processed the training dataset with an accuracy of 74.77%. Furthermore, the final three-layer model processed the test dataset with an accuracy of 74.28%.

After a series of modeling trials, the AutoKeras system processed the validation dataset with a maximum accuracy score of 78.64%. When we applied the best AutoKeras model to the previously unseen test dataset, we obtained an accuracy score of 76.60%.

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: Faulty Steel Plates Data Set

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference: http://archive.ics.uci.edu/ml/datasets/steel+plates+faults

One potential source of performance benchmark: https://www.kaggle.com/uciml/faulty-steel-plates

The HTML formatted report can be found here on GitHub.

Binary Classification Model for Credit Card Default Using Python and AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Credit Card Default dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset contains information on default payments, demographic factors, credit data, payment history, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.
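This dataset mixes numerical and categorical attributes, and some demographic fields are stored as small integer codes. When such columns go into a plain dense network, they are often one-hot encoded so the model does not treat the codes as ordered quantities; AutoKeras may perform its own categorical encoding, so this sketch only illustrates the idea. `one_hot` is a hypothetical helper, the sample codes are illustrative, and the post does not state which columns were treated as categorical.

```python
def one_hot(values, categories=None):
    """One-hot encode a list of category codes.

    If categories is None, the distinct codes are collected and sorted. Passing
    the training-set categories explicitly keeps the columns stable across
    splits; a code not in categories encodes as an all-zero row.
    """
    if categories is None:
        categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, encoded


# Illustrative integer codes (for example, a marital-status style column) for four clients.
categories, encoded = one_hot([1, 2, 3, 2])
print(categories)  # [1, 2, 3]
print(encoded)     # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```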

ANALYSIS: After a series of modeling trials, the AutoKeras system processed the validation dataset with a maximum accuracy score of 81.98%. When we applied the best AutoKeras model to the previously unseen test dataset, we obtained an accuracy score of 81.25%.

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: Default of Credit Card Clients Dataset

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

The HTML formatted report can be found here on GitHub.

Regression Modeling Template Using Python and AutoKeras Version 1

As I work on practicing and solving machine learning (ML) problems, I find myself performing the same set of steps and activities repeatedly.

The purpose of this modeling exercise is to construct an end-to-end template for solving machine learning problems. This Python script will adapt Dr. Jason Brownlee’s blog post on this topic and build a robust template for solving similar problems.

Version 1 of the AutoKeras regression template contains structures and features similar to those in the Scikit-Learn templates. I put together this template to take a machine learning exercise from beginning to end.
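The idea of a reusable end-to-end template can be sketched as an ordered list of named steps that share one state dictionary. The step names below loosely follow the outline of Dr. Brownlee's project template (prepare the problem, summarize the data, prepare the data, evaluate models, finalize the model), but they are assumptions, not this template's exact sections. The placeholder bodies, including the mean-predictor baseline, are hypothetical stand-ins for the real data loading and AutoKeras calls.

```python
import math
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Step = Callable[[State], None]


def run_template(steps: List[Tuple[str, Step]]) -> State:
    """Run each named step in order; every step reads from and writes to shared state."""
    state: State = {"completed": []}
    for name, step in steps:
        step(state)
        state["completed"].append(name)
    return state


def prepare_problem(state: State) -> None:
    # Placeholder for loading libraries and the dataset.
    state["y"] = [1.0, 2.0, 3.0, 4.0]


def summarize_data(state: State) -> None:
    state["mean"] = sum(state["y"]) / len(state["y"])


def prepare_data(state: State) -> None:
    # Placeholder for cleaning, encoding, and splitting (no real split here).
    state["y_test"] = state["y"]


def evaluate_models(state: State) -> None:
    # Baseline: predict the mean for every row and report its RMSE.
    errors = [y - state["mean"] for y in state["y_test"]]
    state["baseline_rmse"] = math.sqrt(sum(e * e for e in errors) / len(errors))


def finalize_model(state: State) -> None:
    # Placeholder for refitting the chosen model and saving it.
    state["final_model"] = "mean predictor"


state = run_template([
    ("Prepare Problem", prepare_problem),
    ("Summarize Data", summarize_data),
    ("Prepare Data", prepare_data),
    ("Evaluate Models", evaluate_models),
    ("Finalize Model", finalize_model),
])
print(state["completed"])
print(round(state["baseline_rmse"], 3))  # 1.118
```

Keeping each section in its own function makes it easy to swap, for example, an AutoKeras regressor for a classifier in one step without touching the others.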

You will find the Python templates on the Machine Learning Project Templates page.