Image Regression Model for MNIST Handwritten Digits Using Python and AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The MNIST Handwritten Digits dataset is an image classification situation where we attempt to predict one of several (more than two) possible outcomes. In this iteration, we treat the digit label as a numeric target and model it with an image regressor.

INTRODUCTION: The MNIST dataset was developed by Yann LeCun, Corinna Cortes, and Christopher Burges for evaluating machine learning models on the handwritten digit classification problem. The dataset was constructed from many scanned document datasets available from the National Institute of Standards and Technology (NIST). Each image is a 28-by-28-pixel square (784 pixels total). A standard split of the dataset is used to evaluate and compare models: 60,000 images are used to train a model, and a separate set of 10,000 images is used to test it. It is a digit recognition task, so there are ten classes (0 to 9) to predict.

ANALYSIS: Previously, we modeled the dataset using AutoKeras’ image classifier, and the system processed the validation dataset with an accuracy score of 94.84%. When we applied the best AutoKeras model to the previously unseen test dataset, we obtained an accuracy score of 98.4%.

After a series of modeling trials in this iteration, the AutoKeras image regressor processed the test dataset with an RMSE score of 0.454 and an R² score of 97.53%. When we scored the same predictions with classification metrics, we obtained an accuracy score of 96.47%.
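One way to score real-valued digit predictions with both regression and classification metrics is sketched below. This is an illustration only: the function name, the toy values, and the choice to round each prediction to the nearest digit and clip it to the 0 to 9 range are assumptions, not details taken from the project notebook.

```python
import math


def regression_and_class_metrics(y_true, y_pred):
    """Return (rmse, r2, accuracy) for integer labels and real-valued predictions."""
    n = len(y_true)

    # RMSE: square root of the mean squared error.
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    rmse = math.sqrt(sse / n)

    # R^2: one minus the ratio of residual to total sum of squares.
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sse / sst

    # Assumption: map each prediction to a class by rounding to the
    # nearest integer and clipping to the valid digit range 0..9.
    y_class = [min(9, max(0, int(round(p)))) for p in y_pred]
    accuracy = sum(1 for t, c in zip(y_true, y_class) if t == c) / n

    return rmse, r2, accuracy


# Toy values for illustration only; these are not MNIST results.
y_true = [0, 3, 7, 9]
y_pred = [0.2, 2.4, 7.4, 10.1]
rmse, r2, accuracy = regression_and_class_metrics(y_true, y_pred)
print(f"RMSE={rmse:.3f}  R2={r2:.3f}  accuracy={accuracy:.2%}")
```

Under this mapping, a prediction can be numerically close to the label and still count as a classification error, which is one reason the accuracy and R² figures need not move together.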

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: MNIST Handwritten Digits Dataset

Dataset ML Model: Image regression modeling with numerical attributes

Dataset Reference: https://www.tensorflow.org/datasets/catalog/mnist

One potential source of performance benchmark: https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification/

The HTML formatted report can be found here on GitHub.

Regression Model for White Wine Quality Using Python and AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Wine Quality dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: The dataset is related to the white variants of the Portuguese “Vinho Verde” wine. The problem is to predict the wine quality using the chemical characteristics of the wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g., there is no data about grape types, wine brand, wine selling price).

ANALYSIS: In another iteration of modeling with TensorFlow, the performance of the preliminary model achieved an RMSE of 0.726. After tuning the hyperparameters, the best model processed the training dataset with an RMSE of 0.714. Furthermore, the final model processed the test dataset with an RMSE of 0.693.

After a series of modeling trials, the AutoKeras system processed the validation dataset with a minimum RMSE score of 0.562. When we applied the best AutoKeras model to the previously unseen test dataset, we obtained an RMSE score of 0.623.
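The validation and test figures above come from separate holdout sets. A minimal sketch of such a three-way split follows; the 60/20/20 proportions, the seed, and the helper name are hypothetical, and the original project may have split the data differently or relied on AutoKeras to carve out its own validation data.

```python
import random


def three_way_split(rows, val_fraction=0.2, test_fraction=0.2, seed=42):
    """Shuffle rows and split them into (train, validation, test) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


# Stand-in row identifiers; a real run would use the wine records instead.
rows = list(range(100))
train, validation, test = three_way_split(rows)
print(len(train), len(validation), len(test))
```

The test portion is set aside before any tuning so that its RMSE reflects data the model search never saw.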

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: Wine Quality Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/wine+quality

The HTML formatted report can be found here on GitHub.

Regression Model for Red Wine Quality Using Python and AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Wine Quality dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: The dataset is related to the red variants of the Portuguese “Vinho Verde” wine. The problem is to predict the wine quality using the chemical characteristics of the wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g., there is no data about grape types, wine brand, wine selling price).

ANALYSIS: In another iteration of modeling with TensorFlow, the performance of the preliminary model achieved an RMSE of 0.663. After tuning the hyperparameters, the best model processed the training dataset with an RMSE of 0.643. Furthermore, the final model processed the test dataset with an RMSE of 0.679.

After a series of modeling trials, the AutoKeras system processed the validation dataset with a minimum RMSE score of 0.386. When we applied the best AutoKeras model to the previously unseen test dataset, we obtained an RMSE score of 0.602.

CONCLUSION: In this iteration, the best TensorFlow model generated by AutoKeras appeared to be suitable for modeling this dataset. We should consider experimenting with AutoKeras for further modeling.

Dataset Used: Wine Quality Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/wine+quality

The HTML formatted report can be found here on GitHub.

Regression Model for Superconductor Critical Temperature Using Python and TensorFlow Take 4

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Superconductor Critical Temperature dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: The research team wishes to create a statistical model for predicting the superconducting critical temperature based on the features extracted from the superconductor’s chemical formula. The model seeks to examine the features that can contribute the most to the model’s predictive accuracy.

From iteration Take1, we constructed and tuned machine learning models for this dataset using TensorFlow with five layers. We also observed the best result that we could obtain using the tuned models with the validation and test datasets.

From iteration Take2, we constructed and tuned machine learning models for this dataset using TensorFlow with dropout layers. We also observed the best result that we could obtain using the tuned models with the validation and test datasets.

From iteration Take3, we constructed and tuned a TensorFlow model with five layers using the additional material attributes available for modeling. Furthermore, we also applied the tuned model to a test dataset and observed the best result that we could obtain from the model.

In this Take4 iteration, we will construct and tune a TensorFlow model with dropout layers using the additional material attributes available for modeling. Furthermore, we will apply the tuned model to a test dataset and observe the best result that we can obtain from the model.
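For readers unfamiliar with dropout, the following plain-Python sketch shows the commonly described "inverted dropout" behavior: during training each activation is zeroed with probability `rate`, and the survivors are scaled up so the expected sum is unchanged. It is not the TensorFlow layer itself; the function name, the rate of 0.25, and the toy activations are assumptions chosen for illustration.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    # At inference time (or with rate 0) the values pass through unchanged.
    if not training or rate == 0.0:
        return list(activations)

    keep_prob = 1.0 - rate
    # Kept values are divided by keep_prob so their expected sum is preserved.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)
hidden = [1.0] * 1000  # toy activations, not real model outputs
dropped = dropout(hidden, rate=0.25, rng=rng)
kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
print(f"fraction of units kept: {kept_fraction:.2f}")
```

Because units are dropped at random on each training step, the network cannot rely on any single unit, which is the regularizing effect this iteration sets out to test.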

ANALYSIS: From iteration Take1, the baseline performance of the TensorFlow algorithm achieved an RMSE benchmark of 11.109. After a series of tuning trials, the TensorFlow model processed the validation dataset with an RMSE score of 10.564. When we applied the TensorFlow model to the previously unseen test dataset, we obtained an RMSE score of 10.540.

From iteration Take2, the baseline performance of the TensorFlow algorithm achieved an RMSE benchmark of 10.580. After a series of tuning trials, the TensorFlow model processed the validation dataset with an RMSE score of 10.905. When we applied the TensorFlow model to the previously unseen test dataset, we obtained an RMSE score of 10.885.

From iteration Take3, the baseline performance of the TensorFlow algorithm achieved an RMSE benchmark of 12.298. After a series of tuning trials, the TensorFlow model processed the validation dataset with an RMSE score of 10.299. When we applied the TensorFlow model to the previously unseen test dataset, we obtained an RMSE score of 10.144.

In this Take4 iteration, the baseline performance of the TensorFlow algorithm achieved an RMSE benchmark of 10.304. After a series of tuning trials, the TensorFlow model processed the validation dataset with an RMSE score of 11.048. When we applied the TensorFlow model to the previously unseen test dataset, we obtained an RMSE score of 10.476.
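The four iterations are easier to compare side by side. The short script below simply tabulates the RMSE figures quoted above and ranks the iterations by test RMSE; the dictionary layout and variable names are illustrative choices.

```python
# RMSE figures as reported in the ANALYSIS paragraphs above.
results = {
    "Take1": {"baseline": 11.109, "validation": 10.564, "test": 10.540},
    "Take2": {"baseline": 10.580, "validation": 10.905, "test": 10.885},
    "Take3": {"baseline": 12.298, "validation": 10.299, "test": 10.144},
    "Take4": {"baseline": 10.304, "validation": 11.048, "test": 10.476},
}

# Lower RMSE is better, so rank iterations by ascending test RMSE.
ranking = sorted(results, key=lambda take: results[take]["test"])
best_take = ranking[0]

print(f"{'Iteration':<10}{'Baseline':>10}{'Validation':>12}{'Test':>10}")
for take, scores in results.items():
    print(f"{take:<10}{scores['baseline']:>10.3f}"
          f"{scores['validation']:>12.3f}{scores['test']:>10.3f}")
print("Lowest test RMSE:", best_take)
```

Ranked this way, the two iterations without dropout (Take1 and Take3) each score a lower test RMSE than their dropout counterparts (Take2 and Take4, respectively).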

CONCLUSION: In this iteration, adding the dropout layers to the TensorFlow model did not appear to have a noticeable effect on the modeling of this dataset. However, we should still consider using the algorithm for further modeling.

Dataset Used: Superconductivity Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Superconductivty+Data

One potential source of performance benchmarks: https://doi.org/10.1016/j.commatsci.2018.07.052

The HTML formatted report can be found here on GitHub.

Regression Model for Superconductor Critical Temperature Using Python and TensorFlow Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Superconductor Critical Temperature dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: The research team wishes to create a statistical model for predicting the superconducting critical temperature based on the features extracted from the superconductor’s chemical formula. The model seeks to examine the features that can contribute the most to the model’s predictive accuracy.

From iteration Take1, we constructed and tuned machine learning models for this dataset using TensorFlow with five layers. We also observed the best result that we could obtain using the tuned models with the validation and test datasets.

From iteration Take2, we constructed and tuned machine learning models for this dataset using TensorFlow with dropout layers. We also observed the best result that we could obtain using the tuned models with the validation and test datasets.

In this Take3 iteration, we will construct and tune a TensorFlow model with five layers using the additional material attributes available for modeling. Furthermore, we will apply the tuned model to a test dataset and observe the best result that we can obtain from the model.

ANALYSIS: From iteration Take1, the baseline performance of the TensorFlow algorithm achieved an RMSE benchmark of 11.109. After a series of tuning trials, the TensorFlow model processed the validation dataset with an RMSE score of 10.564. When we applied the TensorFlow model to the previously unseen test dataset, we obtained an RMSE score of 10.540.

From iteration Take2, the baseline performance of the TensorFlow algorithm achieved an RMSE benchmark of 10.580. After a series of tuning trials, the TensorFlow model processed the validation dataset with an RMSE score of 10.905. When we applied the TensorFlow model to the previously unseen test dataset, we obtained an RMSE score of 10.885.

In this Take3 iteration, the baseline performance of the TensorFlow algorithm achieved an RMSE benchmark of 12.298. After a series of tuning trials, the TensorFlow model processed the validation dataset with an RMSE score of 10.299. When we applied the TensorFlow model to the previously unseen test dataset, we obtained an RMSE score of 10.144.

CONCLUSION: In this iteration, the TensorFlow model appeared to be a suitable algorithm for modeling this dataset. We should consider using the algorithm for further modeling.

Dataset Used: Superconductivity Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Superconductivty+Data

One potential source of performance benchmarks: https://doi.org/10.1016/j.commatsci.2018.07.052

The HTML formatted report can be found here on GitHub.

Regression Model for Superconductor Critical Temperature Using Python and TensorFlow Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Superconductor Critical Temperature dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: The research team wishes to create a statistical model for predicting the superconducting critical temperature based on the features extracted from the superconductor’s chemical formula. The model seeks to examine the features that can contribute the most to the model’s predictive accuracy.

From iteration Take1, we constructed and tuned machine learning models for this dataset using TensorFlow with five layers. We also observed the best result that we could obtain using the tuned models with the validation and test datasets.

In this Take2 iteration, we will construct and tune machine learning models for this dataset using TensorFlow with dropout layers. We will observe the best result that we can obtain using the tuned models with the validation and test datasets.

ANALYSIS: From iteration Take1, the baseline performance of the TensorFlow algorithm achieved an RMSE benchmark of 11.109. After a series of tuning trials, the TensorFlow model processed the validation dataset with an RMSE score of 10.564. When we applied the TensorFlow model to the previously unseen test dataset, we obtained an RMSE score of 10.540.

In this Take2 iteration, the baseline performance of the TensorFlow algorithm achieved an RMSE benchmark of 10.580. After a series of tuning trials, the TensorFlow model processed the validation dataset with an RMSE score of 10.905. When we applied the TensorFlow model to the previously unseen test dataset, we obtained an RMSE score of 10.885.

CONCLUSION: In this iteration, adding the dropout layers to the TensorFlow model did not appear to have a noticeable effect on the modeling of this dataset. However, we should still consider using the algorithm for further modeling.

Dataset Used: Superconductivity Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Superconductivty+Data

One potential source of performance benchmarks: https://doi.org/10.1016/j.commatsci.2018.07.052

The HTML formatted report can be found here on GitHub.

Regression Model for Superconductor Critical Temperature Using Python and TensorFlow Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Superconductor Critical Temperature dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: The research team wishes to create a statistical model for predicting the superconducting critical temperature based on the features extracted from the superconductor’s chemical formula. The model seeks to examine the features that can contribute the most to the model’s predictive accuracy.

In this Take1 iteration, we will construct and tune machine learning models for this dataset using TensorFlow with five layers. We will observe the best result that we can obtain using the tuned models with the validation and test datasets.

ANALYSIS: The baseline performance of the TensorFlow algorithm achieved an RMSE benchmark of 11.109. After a series of tuning trials, the TensorFlow model processed the validation dataset with an RMSE score of 10.564. When we applied the TensorFlow model to the previously unseen test dataset, we obtained an RMSE score of 10.540.

CONCLUSION: In this iteration, the TensorFlow model appeared to be a suitable algorithm for modeling this dataset. We should consider using the algorithm for further modeling.

Dataset Used: Superconductivity Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Superconductivty+Data

One potential source of performance benchmarks: https://doi.org/10.1016/j.commatsci.2018.07.052

The HTML formatted report can be found here on GitHub.

Regression Model for Superconductor Critical Temperature Using XGBoost Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Superconductor Critical Temperature dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: The research team wishes to create a statistical model for predicting the superconducting critical temperature based on the features extracted from the superconductor’s chemical formula. The model seeks to examine the features that can contribute the most to the model’s predictive accuracy.

From previous iterations, we constructed and tuned several classic machine learning models using the Scikit-Learn library. We also observed the best results that we could obtain from the models.

From iteration Take1, we constructed and tuned an XGBoost model. Furthermore, we applied the XGBoost model to a test dataset and observed the best result that we could obtain from the model.

In this Take2 iteration, we will construct and tune an XGBoost model using the additional material attributes available for modeling. Furthermore, we will apply the XGBoost model to a test dataset and observe the best result that we can obtain from the model.

ANALYSIS: From previous iterations, the Extra Trees model turned in the best overall result and achieved an RMSE metric of 9.56. By using the optimized parameters, the Extra Trees algorithm processed the test dataset with an RMSE of 9.32.

From iteration Take1, the baseline performance of the XGBoost algorithm achieved an RMSE benchmark of 12.88. After a series of tuning trials, the XGBoost model processed the validation dataset with an RMSE score of 9.88. When we applied the XGBoost model to the previously unseen test dataset, we obtained an RMSE score of 9.06.

In this Take2 iteration, the baseline performance of the XGBoost algorithm achieved an RMSE benchmark of 12.54. After a series of tuning trials, the XGBoost model processed the validation dataset with an RMSE score of 9.58. When we applied the XGBoost model to the previously unseen test dataset, we obtained an RMSE score of 8.94.
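To put the test-set figures above on a common scale, the sketch below computes the relative reduction in RMSE between models. The helper name is hypothetical, and the calculation is plain arithmetic on the reported numbers rather than anything taken from the project notebook.

```python
def relative_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


# Test-set RMSE values as reported above.
extra_trees_test = 9.32
xgboost_take1_test = 9.06
xgboost_take2_test = 8.94

vs_take1 = relative_reduction(xgboost_take1_test, xgboost_take2_test)
vs_extra_trees = relative_reduction(extra_trees_test, xgboost_take2_test)

print(f"Take2 vs. Take1 XGBoost: {vs_take1:.2f}% lower test RMSE")
print(f"Take2 vs. Extra Trees:   {vs_extra_trees:.2f}% lower test RMSE")
```

On these figures, the additional material attributes lower the XGBoost test RMSE by a little over 1% relative to Take1, and by roughly 4% relative to the tuned Extra Trees model.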

CONCLUSION: In this iteration, the additional material attributes improved the XGBoost model further for modeling this dataset. We should consider using the algorithm for further modeling.

Dataset Used: Superconductivity Data Set

Dataset ML Model: Regression with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Superconductivty+Data

One potential source of performance benchmarks: https://doi.org/10.1016/j.commatsci.2018.07.052

The HTML formatted report can be found here on GitHub.