Binary Classification Model for Company Bankruptcy Prediction Using TensorFlow Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Company Bankruptcy Prediction dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: The research team collected the data from the Taiwan Economic Journal from 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange. Because failing to catch a company in a shaky financial situation is costly, we will use the F1 score to balance the precision and recall ratios.
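As a minimal sketch of how the F1 score combines precision and recall, the snippet below computes it from confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the project's notebook.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 30 bankruptcies caught, 20 false alarms, 10 missed.
example = f1_score(tp=30, fp=20, fn=10)
print(f"F1 = {example:.4f}")  # precision 0.60, recall 0.75 -> F1 = 0.6667
```

Because the harmonic mean is pulled toward the lower of the two ratios, a model cannot score well on F1 by flagging every company (high recall, low precision) or almost none (high precision, low recall).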

The data analysis first appeared in the research paper: Liang, D., Lu, C.-C., Tsai, C.-F., and Shih, G.-A. (2016). Financial Ratios and Corporate Governance Indicators in Bankruptcy Prediction: A Comprehensive Study. European Journal of Operational Research, vol. 252, no. 2, pp. 561-572.

In iteration Take1, we constructed and tuned several classic machine learning models using the Scikit-Learn library. We also observed the best results that we could obtain from the models.

In iteration Take2, we constructed and tuned an XGBoost model. We also observed the best results that we could obtain from the model.

This Take3 iteration will construct and tune a three-layer TensorFlow model. We also will observe the best results that we can obtain from the model.

ANALYSIS: In iteration Take1, the machine learning algorithms’ average performance achieved an F1 score of 94.37%. Two algorithms (Extra Trees and Random Forest) produced the top F1 metrics after the first round of modeling. After a series of tuning trials, the Extra Trees model turned in an F1 score of 97.39% using the training dataset. When we applied the Extra Trees model to the previously unseen test dataset, we obtained an F1 score of 55.55%.

In iteration Take2, the XGBoost algorithm achieved an F1 score of 96.48% using the training dataset. After a series of tuning trials, the XGBoost model turned in an F1 score of 98.38%. When we applied the XGBoost model to the previously unseen test dataset, we obtained an F1 score of 58.18%.

In this Take3 iteration, the TensorFlow model achieved an average F1 score of 67.03% after 20 epochs using the training dataset. When we applied the TensorFlow model to the previously unseen test dataset, we obtained an F1 score of 41.55%.

CONCLUSION: In this iteration, the TensorFlow model did not appear to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Company Bankruptcy Prediction Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Taiwanese+Bankruptcy+Prediction

One potential source of performance benchmarks: https://www.kaggle.com/fedesoriano/company-bankruptcy-prediction

The HTML formatted report can be found here on GitHub.

NLP Model for IMDB Movie Sentiment Using TensorFlow Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The IMDB Movie Sentiment dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset contains 50,000 movie reviews extracted from IMDB. The researchers have annotated the reviews with labels (0 = negative, 1 = positive) to detect the reviews’ sentiment.

From iteration Take1, we created a bag-of-words model to perform binary classification (positive or negative) for the reviews. Due to memory capacity constraints, we split the work into two scripts: Part A focused on building the model with the training and validation datasets, and Part B focused on testing the model with the training and test datasets.

In this Take2 iteration, we will create a word-embedding model to perform binary classification for the reviews.
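To illustrate what a word-embedding model consumes, the sketch below turns a review into a fixed-length sequence of integer word IDs and looks up one dense vector per ID. The tiny vocabulary, the reserved IDs (0 for padding, 1 for unknown words), the embedding size, and the randomly filled table are all assumptions made for illustration. In a trained model the table values are learned, and the project's actual preprocessing may differ.

```python
import random


def encode(document, word_index, max_length):
    """Map words to integer IDs, then truncate or pad to max_length.

    ID 0 is reserved for padding and ID 1 for out-of-vocabulary words
    (an assumed convention for this sketch).
    """
    ids = [word_index.get(word, 1) for word in document.lower().split()]
    ids = ids[:max_length]
    return ids + [0] * (max_length - len(ids))


def embed(ids, table):
    """Look up one dense vector per word ID."""
    return [table[i] for i in ids]


# Hypothetical three-word vocabulary; IDs 0 and 1 are reserved.
word_index = {"great": 2, "dull": 3, "movie": 4}

# Randomly initialized embedding table: 5 IDs x 4 dimensions.
rng = random.Random(0)
embedding_dim = 4
table = [[rng.uniform(-1, 1) for _ in range(embedding_dim)] for _ in range(5)]

sequence = encode("A great movie", word_index, max_length=5)
vectors = embed(sequence, table)
print(sequence)  # [1, 2, 4, 0, 0]
```

Unlike a bag-of-words vector, the encoded sequence preserves word order, which is what lets the layers after the embedding lookup use context.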

ANALYSIS: From iteration Take1, the preliminary model’s performance achieved an accuracy score of 88.80% on the validation dataset after ten epochs. Furthermore, the final model processed the test dataset with an accuracy measurement of 89.48%.

In this Take2 iteration, the preliminary model’s performance achieved an average accuracy score of 88.40% on the validation dataset after ten epochs. Furthermore, the final model processed the test dataset with an accuracy measurement of 89.66%.

CONCLUSION: In this iteration, the word-embedding TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: IMDB Movie Sentiment

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference: https://www.kaggle.com/columbine/imdb-dataset-sentiment-analysis-in-csv-format

One potential source of performance benchmarks: https://www.kaggle.com/columbine/imdb-dataset-sentiment-analysis-in-csv-format

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Deep Learning Model for Textile Defect Detection Using TensorFlow Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Textile Defect Detection dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset from Kaggle contains 96,000 patches of textile images with different quality problems. The goal of the exercise is to detect the quality issue for a patch of textile during production. The greyscale photos are part of the public dataset made available by the MVTec Company and referenced in the research paper by Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger: MVTec AD – A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.

From iteration Take1, we constructed a CNN model using a simple three-block VGG architecture and tested the model’s performance using a separate test dataset.

From iteration Take2, we constructed a CNN model using the DenseNet201 architecture and tested the model’s performance using a separate test dataset.

In this Take3 iteration, we will construct a CNN model using the ResNet50V2 architecture and test the model’s performance using a separate test dataset.

ANALYSIS: From iteration Take1, the baseline model’s performance achieved an accuracy score of 97.07% on the validation dataset after 15 epochs. Furthermore, the final model’s performance achieved an accuracy score of 68.03% on the test dataset after 15 epochs.

From iteration Take2, the DenseNet201 model’s performance achieved an accuracy score of 98.93% on the validation dataset after 15 epochs. Furthermore, the final model’s performance achieved an accuracy score of 53.69% on the test dataset after 15 epochs.

In this Take3 iteration, the ResNet50V2 model’s performance achieved an accuracy score of 94.88% on the validation dataset after 15 epochs. Furthermore, the final model’s performance achieved an accuracy score of 59.12% on the test dataset after 15 epochs.

CONCLUSION: In this iteration, the ResNet50V2 CNN model did not appear suitable for modeling this dataset due to a high-variance problem. We should consider experimenting with more or different data for further modeling.
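The high-variance diagnosis rests on the distance between validation and test accuracy. As a small worked example, the sketch below computes that gap from the figures reported in the ANALYSIS above. The dictionary keys and the helper function name are our own labels for illustration.

```python
# Validation and test accuracy (percent) as reported in the ANALYSIS above.
reported = {
    "Take1 three-block VGG": (97.07, 68.03),
    "Take2 DenseNet201": (98.93, 53.69),
    "Take3 ResNet50V2": (94.88, 59.12),
}


def generalization_gap(validation_acc, test_acc):
    """Difference in percentage points, rounded to two decimals."""
    return round(validation_acc - test_acc, 2)


gaps = {name: generalization_gap(v, t) for name, (v, t) in reported.items()}
for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.2f} percentage points")
```

All three gaps come out at roughly 29 percentage points or more, which is consistent with the conclusion that swapping architectures alone did not address the problem.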

Dataset Used: Textile Defect Detection

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://www.kaggle.com/belkhirnacim/textiledefectdetection

A potential source of performance benchmarks: https://www.kaggle.com/belkhirnacim/textiledefectdetection

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Deep Learning Model for Textile Defect Detection Using TensorFlow Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Textile Defect Detection dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset from Kaggle contains 96,000 patches of textile images with different quality problems. The goal of the exercise is to detect the quality issue for a patch of textile during production. The greyscale photos are part of the public dataset made available by the MVTec Company and referenced in the research paper by Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger: MVTec AD – A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.

From iteration Take1, we constructed a CNN model using a simple three-block VGG architecture and tested the model’s performance using a separate test dataset.

In this Take2 iteration, we will construct a CNN model using the DenseNet201 architecture and test the model’s performance using a separate test dataset.

ANALYSIS: From iteration Take1, the baseline model’s performance achieved an accuracy score of 97.07% on the validation dataset after 15 epochs. Furthermore, the final model’s performance achieved an accuracy score of 68.03% on the test dataset after 15 epochs.

In this Take2 iteration, the DenseNet201 model’s performance achieved an accuracy score of 98.93% on the validation dataset after 15 epochs. Furthermore, the final model’s performance achieved an accuracy score of 53.69% on the test dataset after 15 epochs.

CONCLUSION: In this iteration, the DenseNet201 CNN model did not appear suitable for modeling this dataset due to a high-variance problem. We should consider experimenting with more or different data for further modeling.

Dataset Used: Textile Defect Detection

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://www.kaggle.com/belkhirnacim/textiledefectdetection

A potential source of performance benchmarks: https://www.kaggle.com/belkhirnacim/textiledefectdetection

The HTML formatted report can be found here on GitHub.

NLP Model for IMDB Movie Sentiment Using TensorFlow Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The IMDB Movie Sentiment dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset contains 50,000 movie reviews extracted from IMDB. The researchers have annotated the reviews with labels (0 = negative, 1 = positive) to detect the reviews’ sentiment.

In this Take1 iteration, we will create a bag-of-words model to perform binary classification (positive or negative) for the reviews. Due to memory capacity constraints, we will split the work into two scripts: Part A will focus on building the model with the training and validation datasets, and Part B will focus on testing the model with the training and test datasets.
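A bag-of-words representation can be sketched in a few lines: build a vocabulary from the training reviews, then describe each review by how often each vocabulary word occurs in it. The whitespace tokenizer, the function names, and the two sample reviews below are assumptions made for illustration. The project itself may use a library tokenizer and a different weighting scheme.

```python
from collections import Counter


def build_vocabulary(documents, max_words=None):
    """Assign an index to each word, most frequent words first."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    most_common = [word for word, _ in counts.most_common(max_words)]
    return {word: index for index, word in enumerate(most_common)}


def bag_of_words(document, vocabulary):
    """Count occurrences of each vocabulary word; unknown words are ignored."""
    vector = [0] * len(vocabulary)
    for word in document.lower().split():
        index = vocabulary.get(word)
        if index is not None:
            vector[index] += 1
    return vector


# Hypothetical sample reviews.
reviews = ["a great great movie", "a dull movie"]
vocab = build_vocabulary(reviews)
vectors = [bag_of_words(review, vocab) for review in reviews]
print(vocab)
print(vectors)
```

Each vector has one entry per vocabulary word, so its length grows with the vocabulary. That growth is one plausible reason a large review corpus can run into the memory constraints mentioned above.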

ANALYSIS: In this Take1 iteration, the preliminary model’s performance achieved an average accuracy score of 88.80% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 89.48%.
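For readers unfamiliar with how a cross-validated average is produced, the sketch below shows the index bookkeeping for ten folds and how per-fold scores are averaged. The sample count, the function names, and the per-fold scores are hypothetical. This is not the project's actual code, which would typically rely on a library splitter.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous (training, validation) folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        folds.append((training, validation))
        start += size
    return folds


def average(scores):
    """Arithmetic mean of the per-fold scores."""
    return sum(scores) / len(scores)


folds = k_fold_indices(n_samples=25, k=10)

# Hypothetical per-fold accuracy scores, one per fold.
fold_scores = [0.88, 0.89, 0.90, 0.88, 0.89, 0.87, 0.90, 0.89, 0.88, 0.90]
print(f"{len(folds)} folds, mean accuracy = {average(fold_scores):.4f}")
```

Each sample lands in exactly one validation fold, so the averaged score reflects every training example being held out once.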

CONCLUSION: In this iteration, the bag-of-words TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: IMDB Movie Sentiment

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference: https://www.kaggle.com/columbine/imdb-dataset-sentiment-analysis-in-csv-format

One potential source of performance benchmarks: https://www.kaggle.com/columbine/imdb-dataset-sentiment-analysis-in-csv-format

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Deep Learning Model for Textile Defect Detection Using TensorFlow Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Textile Defect Detection dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset from Kaggle contains 96,000 patches of textile images with different quality problems. The goal of the exercise is to detect the quality issue for a patch of textile during production. The greyscale photos are part of the public dataset made available by the MVTec Company and referenced in the research paper by Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger: MVTec AD – A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.

In this Take1 iteration, we will construct a CNN model using a simple three-block VGG architecture and test the model’s performance using a separate test dataset.

ANALYSIS: In this Take1 iteration, the baseline model’s performance achieved an accuracy score of 97.07% on the validation dataset after 15 epochs. Furthermore, the final model’s performance achieved an accuracy score of 68.03% on the test dataset after 15 epochs.

CONCLUSION: In this iteration, the simple three-block VGG CNN model did not appear suitable for modeling this dataset due to a high-variance problem. We should consider experimenting with more or different data for further modeling.

Dataset Used: Textile Defect Detection

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://www.kaggle.com/belkhirnacim/textiledefectdetection

A potential source of performance benchmarks: https://www.kaggle.com/belkhirnacim/textiledefectdetection

The HTML formatted report can be found here on GitHub.

Binary-Class Image Classification Deep Learning Model for Malaria Parasite Detection Using TensorFlow Take 4

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Malaria Parasite Detection dataset is a binary-class classification situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Biomedical researchers have developed a mobile application that runs on a standard Android smartphone attached to a conventional light microscope for detecting malaria disease. The smartphone’s built-in camera acquired thin blood smear images of slides for each microscopic field of view. An expert manually annotated the slides afterward. The dataset contains a total of 27,558 cell images with equal instances of parasitized and uninfected cells.

From iteration Take1, we constructed a CNN model using a simple three-block VGG architecture and tested the model’s performance using a held-out validation dataset.

From iteration Take2, we constructed a CNN model using the InceptionV3 architecture and tested the model’s performance using a held-out validation dataset.

From iteration Take3, we constructed a CNN model using the ResNet50V2 architecture and tested the model’s performance using a held-out validation dataset.

In this Take4 iteration, we will construct a CNN model using the DenseNet201 architecture and test the model’s performance using a held-out validation dataset.
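Since the dataset has equal numbers of parasitized and uninfected cells, a held-out validation set is easiest to interpret when it keeps that balance. The sketch below shows one way to carve out such a split by class. The 20% validation fraction, the random seed, and the function name are assumptions, because the summary does not state how the split was made.

```python
import random


def stratified_holdout(labels, validation_fraction=0.2, seed=42):
    """Return (train_indices, validation_indices), split separately per class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, validation = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(len(indices) * validation_fraction)
        validation.extend(indices[:cut])
        train.extend(indices[cut:])
    return train, validation


# 27,558 images with equal class counts means 13,779 per class.
labels = ["parasitized"] * 13779 + ["uninfected"] * 13779
train_idx, val_idx = stratified_holdout(labels)
print(len(train_idx), len(val_idx))  # 22048 5510
```

Splitting per class keeps the validation set at exactly half parasitized and half uninfected, so an accuracy score near 95% cannot be explained by class imbalance.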

ANALYSIS: From iteration Take1, the model’s performance achieved an average accuracy score of 94.08% on the validation dataset after 20 epochs.

From iteration Take2, the model’s performance achieved an average accuracy score of 95.12% on the validation dataset after 20 epochs.

From iteration Take3, the model’s performance achieved an average accuracy score of 95.19% on the validation dataset after 20 epochs.

In this Take4 iteration, the model’s performance achieved an average accuracy score of 95.41% on the validation dataset after 20 epochs.

CONCLUSION: In this iteration, the DenseNet201 CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Malaria Parasite Detection

Dataset ML Model: Binary-class image classification with numerical attributes

Dataset Reference: https://lhncbc.nlm.nih.gov/LHC-publications/pubs/MalariaDatasets.html

A potential source of performance benchmarks: https://doi.org/10.7717/peerj.4568 or https://doi.org/10.7717/peerj.6977

The HTML formatted report can be found here on GitHub.

Binary-Class Image Classification Deep Learning Model for Malaria Parasite Detection Using TensorFlow Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Malaria Parasite Detection dataset is a binary-class classification situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Biomedical researchers have developed a mobile application that runs on a standard Android smartphone attached to a conventional light microscope for detecting malaria disease. The smartphone’s built-in camera acquired thin blood smear images of slides for each microscopic field of view. An expert manually annotated the slides afterward. The dataset contains a total of 27,558 cell images with equal instances of parasitized and uninfected cells.

From iteration Take1, we constructed a CNN model using a simple three-block VGG architecture and tested the model’s performance using a held-out validation dataset.

From iteration Take2, we constructed a CNN model using the InceptionV3 architecture and tested the model’s performance using a held-out validation dataset.

In this Take3 iteration, we will construct a CNN model using the ResNet50V2 architecture and test the model’s performance using a held-out validation dataset.

ANALYSIS: From iteration Take1, the model’s performance achieved an average accuracy score of 94.08% on the validation dataset after 20 epochs.

From iteration Take2, the model’s performance achieved an average accuracy score of 95.12% on the validation dataset after 20 epochs.

In this Take3 iteration, the model’s performance achieved an average accuracy score of 95.19% on the validation dataset after 20 epochs.

CONCLUSION: In this iteration, the ResNet50V2 CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Malaria Parasite Detection

Dataset ML Model: Binary-class image classification with numerical attributes

Dataset Reference: https://lhncbc.nlm.nih.gov/LHC-publications/pubs/MalariaDatasets.html

A potential source of performance benchmarks: https://doi.org/10.7717/peerj.4568 or https://doi.org/10.7717/peerj.6977

The HTML formatted report can be found here on GitHub.