Multi-Class Image Classification Model for Land Use and Land Cover with Sentinel-2 Using TensorFlow Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Land Use and Land Cover with Sentinel-2 dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset is part of a research study that addresses the challenge of land use and land cover classification using Sentinel-2 satellite images. The research project presented a novel dataset based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes within a total of 27,000 labeled and geo-referenced images. The study also provided benchmarks for this novel dataset with its spectral bands using deep convolutional neural networks (CNNs).

In iteration Take1, we constructed a CNN model using a simple three-block VGG architecture and tested the model’s performance using a validation dataset (20%) set aside from the training images.

In iteration Take2, we constructed a CNN model using the DenseNet121 architecture and tested the model’s performance using a validation dataset (20%) set aside from the training images.

In this Take3 iteration, we will construct a CNN model using the EfficientNet architecture and test the model’s performance using a validation dataset (20%) set aside from the training images.
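As a rough sketch of this approach (not the report’s exact code; the image size, pooling head, and optimizer choices are assumptions on my part), an EfficientNetB0 backbone with a fresh softmax head for the 10 EuroSAT classes could be assembled in Keras like this:

```python
import tensorflow as tf

NUM_CLASSES = 10   # EuroSAT land-cover classes
IMG_SIZE = 64      # EuroSAT patches are 64x64 pixels (RGB bands assumed here)

# EfficientNetB0 backbone without its ImageNet classification head
base = tf.keras.applications.EfficientNetB0(
    include_top=False,
    weights=None,  # set to "imagenet" for transfer learning instead
    input_shape=(IMG_SIZE, IMG_SIZE, 3),
)

# Attach a pooling layer and a new softmax head for our classes
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```

The same pattern works for the earlier DenseNet121 iteration by swapping the `applications` constructor.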

ANALYSIS: In iteration Take1, the baseline model’s performance achieved an accuracy score of 98.83% on the training dataset after 20 epochs. Furthermore, the final model achieved an accuracy score of 86.06% on the validation dataset.

In iteration Take2, the DenseNet121 model’s performance achieved an accuracy score of 99.92% on the training dataset after 20 epochs. Furthermore, the final model achieved an accuracy score of 97.59% on the validation dataset.

In this Take3 iteration, the EfficientNet model’s performance achieved an accuracy score of 99.89% on the training dataset after 20 epochs. Furthermore, the final model achieved an accuracy score of 96.83% on the validation dataset.

CONCLUSION: In this iteration, the EfficientNet CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with more CNN architectures for further modeling.

Dataset Used: Land Use and Land Cover with Sentinel-2

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://github.com/phelber/eurosat

A potential source of performance benchmarks: https://github.com/phelber/eurosat

The HTML formatted report can be found here on GitHub.

Algorithmic Trading Model for Mean-Reversion with Bollinger Bands Strategy Using Python Take 1

NOTE: This script is for learning purposes only and does not constitute a recommendation for buying or selling any stock mentioned in this script.

SUMMARY: This project aims to construct and test an algorithmic trading model and document the end-to-end steps using a template.

INTRODUCTION: This algorithmic trading model examines a simple mean-reversion strategy for a stock. The model enters a position when the price reaches either the upper or lower Bollinger Band computed over the last X trading days. The model exits the trade when the stock price crosses the middle Bollinger Band of the same window size.
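The band computation and the entry/exit rules described above can be sketched as follows (a minimal illustration, assuming daily closing prices in a pandas Series and the conventional 2-standard-deviation band width; the column names are my own):

```python
import pandas as pd

def bollinger_signals(close: pd.Series, window: int = 20,
                      num_std: float = 2.0) -> pd.DataFrame:
    """Compute Bollinger Bands and flag the strategy's entry/exit points."""
    mid = close.rolling(window).mean()   # middle band: simple moving average
    std = close.rolling(window).std()
    upper = mid + num_std * std
    lower = mid - num_std * std
    return pd.DataFrame({
        "close": close,
        "middle": mid,
        "upper": upper,
        "lower": lower,
        # mean-reversion entry: price touches the lower band
        "enter_long": close <= lower,
        # exit: price crosses back up over the middle band
        "exit_long": (close >= mid) & (close.shift(1) < mid.shift(1)),
    })
```

The window sweep described below maps to `for window in range(10, 55, 5): ...`, one signal table per window size.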

In this Take1 iteration, we will set up the models using a trend window size for long trades only. The window size will vary from 10 to 50 trading days at a 5-day increment.

ANALYSIS: In this Take1 iteration, we analyzed the stock prices for Costco Wholesale (COST) between January 1, 2016, and April 9, 2021. The top trading model produced a profit of 105.59 dollars per share. The buy-and-hold approach yielded a gain of 201.10 dollars per share.

CONCLUSION: For the stock of COST during the modeling time frame, the simple long-only trading strategy did not produce a better return than the buy-and-hold approach. We should consider modeling this stock further by experimenting with more variations of the strategy.

Dataset ML Model: Time series analysis with numerical attributes

Dataset Used: Quandl

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Model for Land Use and Land Cover with Sentinel-2 Using TensorFlow Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Land Use and Land Cover with Sentinel-2 dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset is part of a research study that addresses the challenge of land use and land cover classification using Sentinel-2 satellite images. The research project presented a novel dataset based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes within a total of 27,000 labeled and geo-referenced images. The study also provided benchmarks for this novel dataset with its spectral bands using deep convolutional neural networks (CNNs).

In iteration Take1, we constructed a CNN model using a simple three-block VGG architecture and tested the model’s performance using a validation dataset (20%) set aside from the training images.

In this Take2 iteration, we will construct a CNN model using the DenseNet121 architecture and test the model’s performance using a validation dataset (20%) set aside from the training images.

ANALYSIS: In iteration Take1, the baseline model’s performance achieved an accuracy score of 98.83% on the training dataset after 20 epochs. Furthermore, the final model achieved an accuracy score of 86.06% on the validation dataset.

In this Take2 iteration, the DenseNet121 model’s performance achieved an accuracy score of 99.92% on the training dataset after 20 epochs. Furthermore, the final model achieved an accuracy score of 97.59% on the validation dataset.

CONCLUSION: In this iteration, the DenseNet121 CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with more CNN architectures for further modeling.

Dataset Used: Land Use and Land Cover with Sentinel-2

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://github.com/phelber/eurosat

A potential source of performance benchmarks: https://github.com/phelber/eurosat

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Model for Land Use and Land Cover with Sentinel-2 Using TensorFlow Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Land Use and Land Cover with Sentinel-2 dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset is part of a research study that addresses the challenge of land use and land cover classification using Sentinel-2 satellite images. The research project presented a novel dataset based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes within a total of 27,000 labeled and geo-referenced images. The study also provided benchmarks for this novel dataset with its spectral bands using deep convolutional neural networks (CNNs).

In this Take1 iteration, we will construct a CNN model using a simple three-block VGG architecture and test the model’s performance using a validation dataset (20%) set aside from the training images.
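A minimal sketch of a three-block VGG-style network of this kind (the filter counts, dense-layer width, and input size are assumptions, not the report’s exact configuration):

```python
import tensorflow as tf

# Three VGG-style blocks: each a 3x3 convolution followed by 2x2 max pooling,
# doubling the filter count per block, then a small dense classifier head.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),  # EuroSAT patches are 64x64
    # Block 1
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Block 2
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Block 3
    tf.keras.layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Classifier head
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 EuroSAT classes
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```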

ANALYSIS: In this Take1 iteration, the baseline model’s performance achieved an accuracy score of 98.83% on the training dataset after 20 epochs. Furthermore, the final model achieved an accuracy score of 86.06% on the validation dataset.

CONCLUSION: In this iteration, the simple three-block VGG CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with more CNN architectures for further modeling.

Dataset Used: Land Use and Land Cover with Sentinel-2

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://github.com/phelber/eurosat

A potential source of performance benchmarks: https://github.com/phelber/eurosat

The HTML formatted report can be found here on GitHub.

Seth Godin’s Akimbo: Helmet Insights

In his Akimbo podcast, Seth Godin teaches us how to adopt a posture of possibility, change the culture, and choose to make a difference. Here are my takeaways from the episode.

In this podcast, Seth discusses his observations on helmet-wearing in sports and the behavior’s implications on shaping our culture.

We play sports for many reasons, but one reason that frequently jumps out is that we use a sport to make a statement. We often use a sport to highlight who we are and how we want to be seen by others.

While self-image is a big reason for playing a sport, we also want our peers to accept that self-image in the same sport. The sport of hockey initially rejected the notion of helmet-wearing because it did not fit with the brave-and-tough image portrayed by the sport. Even with a league mandate, helmet-wearing did not become the norm for a long time.

Life is also a team sport, and each of us is on a team. In any area of life where we choose to play a part, we care about how we come across and whose acceptance we seek. Social pressure has a lot to do with how we make choices, and the choices we make often lead to culture-building in some fashion.

While our individual choices and behaviors can impact our culture, some government mandates can also significantly impact the overall societal culture.

Some government mandates go a long way towards normalizing certain behaviors. Those behaviors probably would not become the norm if we left it to individual adult citizens to make their own choices. Wearing seatbelts, not driving drunk, and not texting while driving are a few examples of behaviors normalized for the greater good.

Why would we need such normalized behavior mandates from the government? It turns out that, while people want to fit in, we are also lazy. Many of those normalized behaviors can lead to a greater good and do not come easily from individual efforts.

But once the pattern is in place, compliance can often increase many times over. When we establish a standard, and when the easiest path is to follow the norm, more people will follow the standard. These normalized, rational behaviors are critically important, and they rarely come from the grassroots efforts of individuals.

The insight from helmet-wearing is that we can change the system if we can find ways to influence the culture. While each of us has free will, we still like to conform to and be accepted by the group we affiliate with. When we can instill the presumption that “people like us do things like this,” we have an opportunity to change the culture and normalize behaviors that will benefit us.

“Well, It Seems Great to Me”

(From a writer I respect, Seth Godin)

That is certainly fine. You made it.

If you ship it to the world (or even show it to a colleague), it may be because you like it. Because you made it yourself.

But if your music, graphic design, or website (whatever work you do) fails to resonate in the market, it may be because you forgot to make it for them.

Thinking of others is the core of design.

Do the people in this group think it looks great? What do they need?

Go do that.

Algorithmic Trading Model for Simple Mean-Reversion Strategy Using Python Take 2

NOTE: This script is for learning purposes only and does not constitute a recommendation for buying or selling any stock mentioned in this script.

SUMMARY: This project aims to construct and test an algorithmic trading model and document the end-to-end steps using a template.

INTRODUCTION: This algorithmic trading model examines a simple mean-reversion strategy for a stock. The model enters a position when the price reaches either the highest or lowest point of the last X trading days. The model exits the trade when the stock price crosses the mean of the same window size.
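The entry and exit rules described above can be sketched as follows (a minimal illustration assuming daily closing prices in a pandas Series; the column names and the choice to exclude the current day from the reference window are my own):

```python
import pandas as pd

def mean_reversion_signals(close: pd.Series, window: int) -> pd.DataFrame:
    """Flag entries at the rolling extremes and exits at the rolling mean."""
    # Reference range: the prior `window` days, excluding today
    high = close.rolling(window).max().shift(1)
    low = close.rolling(window).min().shift(1)
    mean = close.rolling(window).mean().shift(1)
    return pd.DataFrame({
        "close": close,
        "enter_long": close <= low,    # at the window low: bet on a bounce
        "enter_short": close >= high,  # at the window high: bet on a pullback
        "exit_long": close >= mean,    # long exit when price reverts to the mean
        "exit_short": close <= mean,   # short exit when price reverts to the mean
    })
```

The window sweep in the iterations below maps to `for window in range(10, 55, 5): ...`; the Take1 long-only variant simply ignores the short-side columns.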

In iteration Take1, we set up the models using a trend window size for long trades only. The window size varied from 10 to 50 trading days at a 5-day increment.

In this Take2 iteration, we will set up the models using a trend window size for long and short trades. The window size will vary from 10 to 50 trading days at a 5-day increment.

ANALYSIS: In iteration Take1, we analyzed the stock prices for Costco Wholesale (COST) between January 1, 2016, and April 1, 2021. The top trading model produced a profit of 133.80 dollars per share. The buy-and-hold approach yielded a gain of 192.73 dollars per share.

In this Take2 iteration, we analyzed the stock prices for Costco Wholesale (COST) between January 1, 2016, and April 1, 2021. The top trading model produced a profit of 113.21 dollars per share. The buy-and-hold approach yielded a gain of 192.73 dollars per share.

CONCLUSION: For the stock of COST during the modeling time frame, the simple long-and-short trading strategy did not produce a better return than the buy-and-hold approach. We should consider modeling this stock further by experimenting with more variations of the strategy.

Dataset ML Model: Time series analysis with numerical attributes

Dataset Used: Quandl

The HTML formatted report can be found here on GitHub.

Binary Classification Model for Company Bankruptcy Prediction Using TensorFlow Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Company Bankruptcy Prediction dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: The research team collected the data from the Taiwan Economic Journal from 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange. Because failing to catch companies in a shaky financial situation is a costly business proposition, we will balance precision and recall by maximizing the F1 score.
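As a reminder of the metric, the F1 score is the harmonic mean of precision and recall, so it rewards models that are strong on both. A small worked example (the counts are made up for illustration, not the study’s data):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp)  # of flagged companies, how many went bankrupt
    recall = tp / (tp + fn)     # of bankruptcies, how many were flagged
    return 2 * precision * recall / (precision + recall)

# Example: 40 bankruptcies caught, 10 false alarms, 20 missed
# precision = 0.8, recall = 2/3
print(round(f1_score(40, 10, 20), 3))  # → 0.727
```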

The data analysis first appeared on the research paper, Liang, D., Lu, C.-C., Tsai, C.-F., and Shih, G.-A. (2016) Financial Ratios and Corporate Governance Indicators in Bankruptcy Prediction: A Comprehensive Study. European Journal of Operational Research, vol. 252, no. 2, pp. 561-572.

In iteration Take1, we constructed and tuned several classic machine learning models using the Scikit-Learn library. We also observed the best results that we could obtain from the models.

In iteration Take2, we constructed and tuned an XGBoost model. We also observed the best results that we could obtain from the model.

In this Take3 iteration, we will construct and tune a three-layer TensorFlow model. We will also observe the best results that we can obtain from the model.
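A rough sketch of a three-layer network of the kind described (the layer widths, activations, and input feature count are assumptions, not the report’s exact configuration):

```python
import tensorflow as tf

NUM_FEATURES = 95  # assumed feature count for the bankruptcy dataset

# Three dense layers narrowing toward a single sigmoid bankruptcy probability
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Track precision and recall during training, since F1 is the target metric
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)
```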

ANALYSIS: In iteration Take1, the machine learning algorithms’ average performance achieved an F1 score of 94.37%. Two algorithms (Extra Trees and Random Forest) produced the top F1 metrics after the first round of modeling. After a series of tuning trials, the Extra Trees model turned in an F1 score of 97.39% using the training dataset. When we applied the Extra Trees model to the previously unseen test dataset, we obtained an F1 score of 55.55%.

In iteration Take2, the XGBoost algorithm achieved an F1 score of 96.48% using the training dataset. After a series of tuning trials, the XGBoost model turned in an F1 score of 98.38%. When we applied the XGBoost model to the previously unseen test dataset, we obtained an F1 score of 58.18%.

In this Take3 iteration, the TensorFlow model’s performance achieved an average F1 score of 67.03% after 20 epochs using the training dataset. When we applied the TensorFlow model to the previously unseen test dataset, we obtained an F1 score of 41.55%.

CONCLUSION: In this iteration, the TensorFlow model did not appear to be suitable for modeling this dataset. We should consider experimenting with other TensorFlow architectures for further modeling.

Dataset Used: Company Bankruptcy Prediction Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Taiwanese+Bankruptcy+Prediction

One potential source of performance benchmark: https://www.kaggle.com/fedesoriano/company-bankruptcy-prediction

The HTML formatted report can be found here on GitHub.