Multi-Class Image Classification Deep Learning Model for Cassava Leaf Disease Using TensorFlow Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Cassava Leaf Disease dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: As the second-largest provider of carbohydrates in Africa, cassava is an essential food security crop grown by smallholder farmers because it can withstand harsh conditions. Existing disease detection methods require farmers to solicit the help of government-funded agricultural experts to inspect and diagnose the plants visually. This approach is labor-intensive, costly, and limited by the short supply of experts.

The research team compiled a dataset of 21,367 labeled images collected during a regular survey in Uganda to address the problem. Most pictures were crowdsourced from farmers taking photos of their gardens and annotated by experts at the National Crops Resources Research Institute (NaCRRI) in collaboration with the AI lab at Makerere University, Kampala. Our task is to classify each cassava image into four disease categories or a fifth category indicating a healthy leaf.

In this Take1 iteration, we will construct a CNN model using the InceptionV3 architecture and test the model’s performance using cross-validation. Also, we will submit our model to Kaggle and test the model’s performance using Kaggle’s test images.
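
Below is a minimal sketch of how such a transfer-learning model could be assembled with tf.keras. The directory name, image size, and head layers are illustrative assumptions rather than the exact configuration used in this project:

    import tensorflow as tf

    NUM_CLASSES = 5          # four disease categories plus "healthy"
    IMAGE_SIZE = (299, 299)  # assumption: InceptionV3's default input size

    # Assumption: the training images have been arranged into one sub-directory per
    # class (the Kaggle download provides a flat image folder plus a CSV of labels).
    train_ds = tf.keras.preprocessing.image_dataset_from_directory(
        "train_images_by_class/", image_size=IMAGE_SIZE, batch_size=32)
    train_ds = train_ds.map(
        lambda images, labels: (tf.keras.applications.inception_v3.preprocess_input(images), labels))

    # InceptionV3 pre-trained on ImageNet, without its classification head.
    base_model = tf.keras.applications.InceptionV3(
        include_top=False, weights="imagenet", input_shape=IMAGE_SIZE + (3,), pooling="avg")
    base_model.trainable = False  # freeze the backbone for the first training pass

    model = tf.keras.Sequential([
        base_model,
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, epochs=30)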

ANALYSIS: In this Take1 iteration, the model’s performance achieved an average accuracy score of 67.17% on the validation dataset after 30 epochs. Furthermore, the final model processed Kaggle’s test dataset with an accuracy measurement of 61.25%.

CONCLUSION: In this iteration, the InceptionV3 TensorFlow CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Cassava Leaf Disease Classification

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://www.kaggle.com/c/cassava-leaf-disease-classification/

One potential source of performance benchmarks: https://www.kaggle.com/c/cassava-leaf-disease-classification/leaderboard

The HTML formatted report can be found here on GitHub.

Annie Duke on How to Decide, Part 10

In her book, How to Decide: Simple Tools for Making Better Choices, Annie Duke discusses how to train our brains to combat our own bias and help ourselves make more confident and better decisions.

These are some of my favorite concepts and takeaways from reading the book.

Chapter 8, “The Power of Negative Thinking”

In this chapter, Annie Duke discusses how we can improve our decision-making skills by applying various techniques such as premortem, backcasting, precommitment contracts, and category decisions. She offers the following recommendations:

  • Precommitting to Your Good Intentions:
    • A precommitment contract is an agreement that commits us in advance to take or refrain from specific actions. The contract also spells out how we will raise or lower the barriers to those actions.
  • The Dr. Evil Game:
    • Imagine a positive goal, then imagine the decisions that would all but guarantee turning that positive outcome into a failure. Also, imagine how any single instance of that type of poor decision could be cloaked in a good-enough rationale.
    • When we identify a category of poor decisions that is hard to spot except in aggregate, we can decide in advance which options we will or will not allow ourselves to choose.
  • The Surprise Party No One Wants:
    • People can compound adverse outcomes by making poor decisions after a bad result. We need to be prepared for our reactions to setbacks along the way.
    • “Tilt” describes a common reaction in which a bad outcome puts us in an emotionally unstable state that compromises the quality of our decision-making.
    • There are three ways to prepare for setbacks. First, do the hard work of identifying bad outcomes and potential mitigation tactics in advance. Second, learn to recognize the signs that we are on tilt so we can avoid compounding bad decisions. Third, establish a precommitment contract that spells out the actions we will take in the wake of a bad outcome.
  • Deflecting the Slings and Arrows of Outrageous Fortune:
    • Although we cannot control luck, we can do things in advance to soften the impact of bad luck. These mitigation steps are called hedges.
    • A hedge has three key features. First, a hedge reduces the impact of bad luck when it occurs. Second, a hedge has a cost. Finally, a hedge is something we hope never to need to use.

I'm Just Doing My Job

(From a writer I respect, Seth Godin)

But what if it weren't that way?

What if you could replace “doing the work” with “improving,” “reshaping,” or “transforming”?

What usually happens when we finish our work? Does the work simply disappear, only to be replaced tomorrow by an endless list of tasks?

What if we could reconsider, with enough confidence and trust, what it means to do our work?

NLP Model for Large Movie Review Using TensorFlow Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Large Movie Review dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: The Large Movie Review Dataset is a collection of movie reviews used in the research paper “Learning Word Vectors for Sentiment Analysis” by Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts, presented at the 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011). The dataset comprises 25,000 highly polar movie reviews for training and 25,000 for testing.

This Take1 iteration will construct a bag-of-words model and analyze it with a simple TensorFlow deep learning network. Due to the system’s memory limitation, we had to break up the script processing into two parts. Part A will test the model with the training dataset using a five-fold validation. Part B will train the model with the entire training dataset and make predictions on a previously unseen test dataset.
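
As a rough sketch of the Part A workflow, the snippet below encodes reviews as fixed-length bag-of-words vectors and scores a small dense network with five-fold cross-validation. The variable names, vocabulary size, and network shape are assumptions for illustration, not the project's exact script:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    # Assumption: `texts` is a list of review strings, `labels` a 0/1 NumPy array.
    def build_model(input_dim):
        model = Sequential([
            Dense(64, activation="relu", input_shape=(input_dim,)),
            Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        return model

    def cross_validate(texts, labels, num_words=10000, folds=5):
        scores = []
        for train_idx, val_idx in StratifiedKFold(n_splits=folds, shuffle=True).split(texts, labels):
            # Fit the vocabulary on the training fold only, then encode both folds
            # as binary bag-of-words matrices.
            tokenizer = Tokenizer(num_words=num_words)
            tokenizer.fit_on_texts([texts[i] for i in train_idx])
            x_train = tokenizer.texts_to_matrix([texts[i] for i in train_idx], mode="binary")
            x_val = tokenizer.texts_to_matrix([texts[i] for i in val_idx], mode="binary")

            model = build_model(x_train.shape[1])
            model.fit(x_train, labels[train_idx], epochs=20, batch_size=128, verbose=0)
            _, accuracy = model.evaluate(x_val, labels[val_idx], verbose=0)
            scores.append(accuracy)
        return np.mean(scores)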

ANALYSIS: In this Take1 iteration, the baseline model’s performance achieved an average accuracy score of 87.18% after 20 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 85.24%.

CONCLUSION: In this modeling iteration, the bag-of-words TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Large Movie Review Dataset

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference: https://ai.stanford.edu/~amaas/papers/wvSent_acl2011.bib

One potential source of performance benchmarks: https://ai.stanford.edu/~amaas/data/sentiment/ and https://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Deep Learning Model for Weed Species Image Using TensorFlow Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The DeepWeeds dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: The DeepWeeds dataset consists of 17,509 images capturing eight different weed species native to Australia in situ with neighboring flora. The selected weed species are local to pastoral grasslands across the state of Queensland. They include: “Chinee apple”, “Snake weed”, “Lantana”, “Prickly acacia”, “Siam weed”, “Parthenium”, “Rubber vine” and “Parkinsonia”.

The research team built and tested their models using a five-fold cross-validation approach. Each fold of the dataset contains subsets for training (60%), validation (20%), and testing (20%). The research team set up the Python script for multi-label classification. To keep our experiments straightforward for now, this series of exercises will focus on predicting a single class for each image.

From iteration Take1, we constructed a CNN model using the ResNet50 architecture and tested the model’s performance using the dataset’s five subsets.

From iteration Take2, we constructed a CNN model using the InceptionV3 architecture and tested the model’s performance using the dataset’s five subsets.

In this Take3 iteration, we will construct a CNN model using the DenseNet201 architecture and test the model’s performance using the dataset’s five subsets.
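
A minimal sketch of how a DenseNet201 backbone might be adapted for this task appears below; the input size, class count, and head layers are assumptions for illustration rather than the exact configuration used here:

    import tensorflow as tf

    NUM_CLASSES = 8          # the eight weed species (add one if a separate "negative" class is used)
    IMAGE_SIZE = (224, 224)  # assumption: DenseNet201's default input size

    # DenseNet201 pre-trained on ImageNet, with global average pooling in place of its head.
    base_model = tf.keras.applications.DenseNet201(
        include_top=False, weights="imagenet", input_shape=IMAGE_SIZE + (3,), pooling="avg")
    base_model.trainable = True  # fine-tune the whole backbone

    inputs = tf.keras.Input(shape=IMAGE_SIZE + (3,))
    x = tf.keras.applications.densenet.preprocess_input(inputs)
    x = base_model(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)

    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])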

ANALYSIS: In this Take3 iteration and using the subset0 portion of the dataset, the model’s performance achieved an accuracy score of 80.41% on the validation dataset after 50 epochs. Furthermore, the final model processed the test dataset with an accuracy measurement of 83.15%.

  • Data subset1: Validation – 80.24%, Test – 82.87%
  • Data subset2: Validation – 77.58%, Test – 81.46%
  • Data subset3: Validation – 82.22%, Test – 84.43%
  • Data subset4: Validation – 74.96%, Test – 80.56%

CONCLUSION: In this iteration, the DenseNet201 TensorFlow CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Weed Species Image Dataset

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://github.com/AlexOlsen/DeepWeeds

One potential source of performance benchmarks: https://github.com/AlexOlsen/DeepWeeds

The HTML formatted report can be found here on GitHub.

Algorithmic Trading Model for Trend-Following with Moving Averages Crossover Strategy Using Python Take 3

NOTE: This script is for learning purposes only and does not constitute a recommendation for buying or selling any stock mentioned in this script.

SUMMARY: This project aims to construct and test an algorithmic trading model and document the end-to-end steps using a template.

INTRODUCTION: This algorithmic trading model examines a simple trend-following strategy for a stock. The model enters a position when the price reaches either the highest or the lowest point of the last X trading days. The model exits the trade when the stock’s fast and slow moving-average lines cross each other.

In addition to the stock price, the models will also use a trading volume indicator to further confirm the buy/sell signal. Finally, the strategy will incorporate a fixed holding window: the system will exit the position when the holding period reaches the maximum window size.
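
To make the mechanics concrete, here is a simplified pandas sketch of the long-side logic: enter on a new N-day high confirmed by above-average volume, and exit on a bearish moving-average crossover or when the maximum holding period is reached. The column names and default window sizes are assumptions; the short side would mirror this with N-day lows and the crossover reversed.

    import pandas as pd

    def long_positions(df, trend_window=20, fast=20, slow=50, vol_window=10, max_hold=999):
        """Flag long positions from a DataFrame assumed to have 'Close' and 'Volume' columns."""
        close, volume = df["Close"], df["Volume"]

        breakout = close >= close.rolling(trend_window).max()    # today's close is a new N-day high
        volume_ok = volume > volume.rolling(vol_window).mean()   # confirmed by above-average volume
        entry = breakout & volume_ok

        fast_ma = close.rolling(fast).mean()
        slow_ma = close.rolling(slow).mean()
        # Exit when the fast moving average crosses below the slow moving average.
        crossover_exit = (fast_ma < slow_ma) & (fast_ma.shift(1) >= slow_ma.shift(1))

        positions, holding, days_held = [], False, 0
        for i in range(len(df)):
            if holding:
                days_held += 1
                if crossover_exit.iloc[i] or days_held >= max_hold:
                    holding, days_held = False, 0
            elif entry.iloc[i]:
                holding, days_held = True, 0
            positions.append(int(holding))
        return pd.Series(positions, index=df.index, name="long_position")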

From iteration Take1, we set up the models using a trend window size for long trades only. The window size varied from 10 to 50 trading days at a 5-day increment. We used 20 to 40 days for the fast-moving average and 50 to 80 days for the slow-moving average. The models also incorporated a volume indicator with a fixed window size of 10 days to confirm the buy/sell signal. Furthermore, we did not limit the holding period by setting the maximum holding period to 999 days for this iteration.

From iteration Take2, we set up the models using a trend window size for short trades only. The window size varied from 10 to 50 trading days at a 5-day increment. We used 20 to 40 days for the fast-moving average and 50 to 80 days for the slow-moving average. The models also incorporated a volume indicator with a fixed window size of 10 days to confirm the buy/sell signal. Furthermore, we did not limit the holding period by setting the maximum holding period to 999 days for this iteration.

In this Take3 iteration, we will set up the models using a trend window size for both long and short trades. The window size will vary from 10 to 50 trading days at a 5-day increment. We will use 20 to 40 days for the fast-moving average and 50 to 80 days for the slow-moving average. The models will also incorporate a volume indicator with a fixed window size of 10 days to confirm the buy/sell signal. Furthermore, we will not limit the holding period by setting the maximum holding period to 999 days for this iteration.

ANALYSIS: From iteration Take1, we analyzed the stock prices for Apple Inc. (AAPL) between January 1, 2018, and February 19, 2021. The top trading model produced a profit of 92.77 dollars per share. The buy-and-hold approach yielded a gain of 87.70 dollars per share.

From iteration Take2, we analyzed the stock prices for Apple Inc. (AAPL) between January 1, 2018, and February 19, 2021. The top trading model produced a loss of 3.57 dollars per share. The buy-and-hold approach yielded a gain of 87.70 dollars per share.

In this Take3 iteration, we analyzed the stock prices for Apple Inc. (AAPL) between January 1, 2018, and February 19, 2021. The top trading model produced a profit of 67.40 dollars per share. The buy-and-hold approach yielded a gain of 87.70 dollars per share.

CONCLUSION: For the stock of AAPL during the modeling time frame, the long-and-short trading strategy did not produce a better return than the buy-and-hold approach. However, we should consider modeling this stock further by experimenting with more variations of the strategy.

Dataset ML Model: Time series analysis with numerical attributes

Dataset Used: Quandl

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Deep Learning Model for Weed Species Image Using TensorFlow Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The DeepWeeds dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: The DeepWeeds dataset consists of 17,509 images capturing eight different weed species native to Australia in situ with neighboring flora. The selected weed species are local to pastoral grasslands across the state of Queensland. They include: “Chinee apple”, “Snake weed”, “Lantana”, “Prickly acacia”, “Siam weed”, “Parthenium”, “Rubber vine” and “Parkinsonia”.

The research team built and tested their models using a five-fold cross-validation approach. Each fold of the dataset contains subsets for training (60%), validation (20%), and testing (20%). The research team set up the Python script for multi-label classification. To keep our experiments straightforward for now, this series of exercises will focus on predicting a single class for each image.

From iteration Take1, we constructed a CNN model using the ResNet50 architecture and tested the model’s performance using the dataset’s five subsets.

In this Take2 iteration, we will construct a CNN model using the InceptionV3 architecture and test the model’s performance using the dataset’s five subsets.
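
One way to script the five-subset evaluation is sketched below. The per-subset CSV file names, their column names, and the build_inception_model helper are hypothetical placeholders, not the project's actual artifacts:

    import pandas as pd
    import tensorflow as tf

    IMAGE_SIZE = (299, 299)  # assumption: InceptionV3's default input size

    def evaluate_subset(subset_id, image_dir="images/"):
        # Hypothetical label files such as "labels/train_subset0.csv" with
        # "Filename" and "Label" columns are assumed here.
        gen = tf.keras.preprocessing.image.ImageDataGenerator(
            preprocessing_function=tf.keras.applications.inception_v3.preprocess_input)

        flows = {}
        for split in ("train", "validation", "test"):
            frame = pd.read_csv(f"labels/{split}_subset{subset_id}.csv")
            flows[split] = gen.flow_from_dataframe(
                frame, directory=image_dir, x_col="Filename", y_col="Label",
                target_size=IMAGE_SIZE, class_mode="categorical", batch_size=32,
                shuffle=(split == "train"))

        # build_inception_model is a hypothetical helper that returns a compiled
        # InceptionV3-based model reporting an accuracy metric.
        model = build_inception_model(num_classes=len(flows["train"].class_indices))
        model.fit(flows["train"], validation_data=flows["validation"], epochs=50)
        _, test_accuracy = model.evaluate(flows["test"])
        return test_accuracy

    # Score all five subsets:
    # accuracies = [evaluate_subset(i) for i in range(5)]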

ANALYSIS: In this Take2 iteration and using the subset0 portion of the dataset, the model’s performance achieved an accuracy score of 80.18% on the validation dataset after 50 epochs. Furthermore, the final model processed the test dataset with an accuracy measurement of 84.63%.

  • Data subset1: Validation – 82.01%, Test – 86.30%
  • Data subset2: Validation – 82.18%, Test – 86.12%
  • Data subset3: Validation – 82.04%, Test – 86.54%
  • Data subset4: Validation – 80.90%, Test – 83.91%

CONCLUSION: In this iteration, the InceptionV3 TensorFlow CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Weed Species Image Dataset

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://github.com/AlexOlsen/DeepWeeds

One potential source of performance benchmarks: https://github.com/AlexOlsen/DeepWeeds

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Deep Learning Model for Weed Species Image Using TensorFlow Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The DeepWeeds dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: The DeepWeeds dataset consists of 17,509 images capturing eight different weed species native to Australia in situ with neighboring flora. The selected weed species are local to pastoral grasslands across the state of Queensland. They include: “Chinee apple”, “Snake weed”, “Lantana”, “Prickly acacia”, “Siam weed”, “Parthenium”, “Rubber vine” and “Parkinsonia”.

The research team built and tested their models using a five-fold cross-validation approach. Each fold of the dataset contains subsets for training (60%), validation (20%), and testing (20%). The research team set up the Python script for multi-label classification. To keep our experiments straightforward for now, this series of exercises will focus on predicting a single class for each image.

In this Take1 iteration, we will construct a CNN model using the ResNet50 architecture and test the model’s performance using the dataset’s five subsets.
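
For reference, a minimal sketch of the corresponding ResNet50 backbone follows; the input size and head layers are illustrative assumptions rather than the exact configuration used in this exercise:

    import tensorflow as tf

    NUM_CLASSES = 8          # the eight weed species (add one if a separate "negative" class is used)
    IMAGE_SIZE = (224, 224)  # assumption: ResNet50's default input size

    inputs = tf.keras.Input(shape=IMAGE_SIZE + (3,))
    x = tf.keras.applications.resnet50.preprocess_input(inputs)
    x = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet", pooling="avg")(x)
    x = tf.keras.layers.Dropout(0.25)(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])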

ANALYSIS: In this Take1 iteration and using the subset0 portion of the dataset, the model’s performance achieved an accuracy score of 78.61% on the validation dataset after 50 epochs. Furthermore, the final model processed the test dataset with an accuracy measurement of 82.81%.

  • Data subset1: Validation – 73.70%, Test – 80.73%
  • Data subset2: Validation – 78.04%, Test – 81.43%
  • Data subset3: Validation – 79.59%, Test – 84.54%
  • Data subset4: Validation – 79.76%, Test – 84.99%

CONCLUSION: In this iteration, the ResNet50 TensorFlow CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Weed Species Image Dataset

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://github.com/AlexOlsen/DeepWeeds

One potential source of performance benchmarks: https://github.com/AlexOlsen/DeepWeeds

The HTML formatted report can be found here on GitHub.