Algorithmic Trading Model for Simple Mean-Reversion Strategy Using Python Take 2

NOTE: This script is for learning purposes only and does not constitute a recommendation for buying or selling any stock mentioned in this script.

SUMMARY: This project aims to construct and test an algorithmic trading model and document the end-to-end steps using a template.

INTRODUCTION: This algorithmic trading model examines a simple mean-reversion strategy for a stock. The model enters a position when the price reaches either the highest or lowest point of the last X trading days. The model exits the trade when the stock price crosses the rolling mean of the same window size.

In iteration Take1, we set up the models using a trend window size for long trades only. The window size will vary from 10 to 50 trading days at a 5-day increment.

In this Take2 iteration, we will set up the models using a trend window size for long and short trades. The window size will vary from 10 to 50 trading days at a 5-day increment.
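The entry and exit rules above can be sketched as a small Python function. This is a minimal illustration, assuming a pandas Series of daily closing prices; the function name and the rule of shorting new highs and buying new lows (expecting reversion to the rolling mean) reflect one plausible reading of the strategy, not the report's exact implementation.

```python
import numpy as np
import pandas as pd

def mean_reversion_signals(close: pd.Series, window: int) -> pd.Series:
    """Return a position series: +1 (long), -1 (short), or 0 (flat).

    Enter long when the price makes a new `window`-day low, enter short
    on a new `window`-day high, and exit when the price crosses back
    over the rolling mean of the same window.
    """
    rolling_high = close.rolling(window).max()
    rolling_low = close.rolling(window).min()
    rolling_mean = close.rolling(window).mean()

    position = 0
    positions = []
    for price, hi, lo, mean in zip(close, rolling_high, rolling_low, rolling_mean):
        if np.isnan(mean):
            positions.append(0)          # not enough history yet
            continue
        if position == 0:
            if price >= hi:
                position = -1            # short a new high, expecting reversion
            elif price <= lo:
                position = 1             # buy a new low, expecting reversion
        elif position == 1 and price >= mean:
            position = 0                 # long exit: price crossed the mean
        elif position == -1 and price <= mean:
            position = 0                 # short exit: price crossed the mean
        positions.append(position)
    return pd.Series(positions, index=close.index)
```

Restricting entries to the `price <= lo` branch recovers the long-only variant used in Take1.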

ANALYSIS: In iteration Take1, we analyzed the stock prices for Costco Wholesale (COST) between January 1, 2016, and April 1, 2021. The top trading model produced a profit of 133.80 dollars per share. The buy-and-hold approach yielded a gain of 192.73 dollars per share.

In this Take2 iteration, we analyzed the stock prices for Costco Wholesale (COST) between January 1, 2016, and April 1, 2021. The top trading model produced a profit of 113.21 dollars per share. The buy-and-hold approach yielded a gain of 192.73 dollars per share.

CONCLUSION: For the stock of COST during the modeling time frame, the simple long-and-short trading strategy did not produce a better return than the buy-and-hold approach. We should consider modeling this stock further by experimenting with more variations of the strategy.

Dataset ML Model: Time series analysis with numerical attributes

Dataset Used: Quandl

The HTML formatted report can be found here on GitHub.

Binary Classification Model for Company Bankruptcy Prediction Using TensorFlow Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Company Bankruptcy Prediction dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: The research team collected the data from the Taiwan Economic Journal from 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange. Because failing to catch a company in a shaky financial situation is a costly business proposition, we will optimize for the F1 score, which balances precision and recall.
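Since F1 is the harmonic mean of precision and recall, a small worked example with scikit-learn shows how the three metrics relate; the labels below are toy values, not drawn from the dataset.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy labels: 1 marks a bankrupt company, 0 a healthy one
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

precision = precision_score(y_true, y_pred)  # TP=2, FP=1 -> 2/3
recall = recall_score(y_true, y_pred)        # TP=2, FN=2 -> 1/2
f1 = f1_score(y_true, y_pred)                # harmonic mean -> 4/7
```

A model that misses many bankruptcies (low recall) or flags too many healthy firms (low precision) is penalized either way, which is why F1 suits this costly-miss scenario.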

The data analysis first appeared in the research paper: Liang, D., Lu, C.-C., Tsai, C.-F., and Shih, G.-A. (2016) Financial Ratios and Corporate Governance Indicators in Bankruptcy Prediction: A Comprehensive Study. European Journal of Operational Research, vol. 252, no. 2, pp. 561-572.

In iteration Take1, we constructed and tuned several classic machine learning models using the Scikit-Learn library. We also observed the best results that we could obtain from the models.

In iteration Take2, we constructed and tuned an XGBoost model. We also observed the best results that we could obtain from the model.

In this Take3 iteration, we will construct and tune a three-layer TensorFlow model. We also will observe the best results that we can obtain from the model.
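As a rough illustration of what a three-layer TensorFlow model for this binary classification task might look like: the layer widths, optimizer, and feature count below are assumptions for the sketch, not the report's actual configuration.

```python
import tensorflow as tf

def build_model(n_features: int) -> tf.keras.Model:
    """A three-layer fully connected network for binary classification."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # bankruptcy probability
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        # Track precision and recall so F1 can be computed per epoch
        metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
    )
    return model
```

With an imbalanced target like bankruptcy, a `class_weight` argument to `fit` (or oversampling) is often needed before the F1 score becomes meaningful.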

ANALYSIS: In iteration Take1, the machine learning algorithms achieved an average F1 score of 94.37%. Two algorithms (Extra Trees and Random Forest) produced the top F1 metrics after the first round of modeling. After a series of tuning trials, the Extra Trees model turned in an F1 score of 97.39% using the training dataset. When we applied the Extra Trees model to the previously unseen test dataset, we obtained an F1 score of 55.55%.

In iteration Take2, the XGBoost algorithm achieved an F1 score of 96.48% using the training dataset. After a series of tuning trials, the XGBoost model turned in an F1 score of 98.38%. When we applied the XGBoost model to the previously unseen test dataset, we obtained an F1 score of 58.18%.

In this Take3 iteration, the TensorFlow model achieved an average F1 score of 67.03% after 20 epochs using the training dataset. When we applied the TensorFlow model to the previously unseen test dataset, we obtained an F1 score of 41.55%.

CONCLUSION: In this iteration, the TensorFlow model did not appear to be suitable for modeling this dataset. We should consider experimenting further with TensorFlow, such as trying different network architectures, before drawing a final conclusion.

Dataset Used: Company Bankruptcy Prediction Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Taiwanese+Bankruptcy+Prediction

One potential source of performance benchmark: https://www.kaggle.com/fedesoriano/company-bankruptcy-prediction

The HTML formatted report can be found here on GitHub.

Algorithmic Trading Model for Simple Mean-Reversion Strategy Using Python Take 1

NOTE: This script is for learning purposes only and does not constitute a recommendation for buying or selling any stock mentioned in this script.

SUMMARY: This project aims to construct and test an algorithmic trading model and document the end-to-end steps using a template.

INTRODUCTION: This algorithmic trading model examines a simple mean-reversion strategy for a stock. The model enters a position when the price reaches either the highest or lowest point of the last X trading days. The model exits the trade when the stock price crosses the rolling mean of the same window size.

In this Take1 iteration, we will set up the models using a trend window size for long trades only. The window size will vary from 10 to 50 trading days at a 5-day increment.

ANALYSIS: In this Take1 iteration, we analyzed the stock prices for Costco Wholesale (COST) between January 1, 2016, and April 1, 2021. The top trading model produced a profit of 133.80 dollars per share. The buy-and-hold approach yielded a gain of 192.73 dollars per share.

CONCLUSION: For the stock of COST during the modeling time frame, the simple long-only trading strategy did not produce a better return than the buy-and-hold approach. We should consider modeling this stock further by experimenting with more variations of the strategy.

Dataset ML Model: Time series analysis with numerical attributes

Dataset Used: Quandl

The HTML formatted report can be found here on GitHub.

Binary Classification Model for Company Bankruptcy Prediction Using XGBoost Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Company Bankruptcy Prediction dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: The research team collected the data from the Taiwan Economic Journal from 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange. Because failing to catch a company in a shaky financial situation is a costly business proposition, we will optimize for the F1 score, which balances precision and recall.

The data analysis first appeared in the research paper: Liang, D., Lu, C.-C., Tsai, C.-F., and Shih, G.-A. (2016) Financial Ratios and Corporate Governance Indicators in Bankruptcy Prediction: A Comprehensive Study. European Journal of Operational Research, vol. 252, no. 2, pp. 561-572.

In iteration Take1, we constructed and tuned several classic machine learning models using the Scikit-Learn library. We also observed the best results that we could obtain from the models.

This Take2 iteration will construct and tune an XGBoost model. We also will observe the best results that we can obtain from the model.

ANALYSIS: In iteration Take1, the machine learning algorithms achieved an average F1 score of 94.37%. Two algorithms (Extra Trees and Random Forest) produced the top F1 metrics after the first round of modeling. After a series of tuning trials, the Extra Trees model turned in an F1 score of 97.39% using the training dataset. When we applied the Extra Trees model to the previously unseen test dataset, we obtained an F1 score of 55.55%.

In this Take2 iteration, the XGBoost algorithm achieved an F1 score of 96.48% using the training dataset. After a series of tuning trials, the XGBoost model turned in an F1 score of 98.38%. When we applied the XGBoost model to the previously unseen test dataset, we obtained an F1 score of 58.18%.

CONCLUSION: In this iteration, the XGBoost model appeared to be a suitable algorithm for modeling this dataset. We should consider using the algorithm for further modeling.

Dataset Used: Company Bankruptcy Prediction Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Taiwanese+Bankruptcy+Prediction

One potential source of performance benchmark: https://www.kaggle.com/fedesoriano/company-bankruptcy-prediction

The HTML formatted report can be found here on GitHub.

Binary Classification Model for Company Bankruptcy Prediction Using Scikit-Learn Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Company Bankruptcy Prediction dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: The research team collected the data from the Taiwan Economic Journal from 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange. Because failing to catch a company in a shaky financial situation is a costly business proposition, we will optimize for the F1 score, which balances precision and recall.

The data analysis first appeared in the research paper: Liang, D., Lu, C.-C., Tsai, C.-F., and Shih, G.-A. (2016) Financial Ratios and Corporate Governance Indicators in Bankruptcy Prediction: A Comprehensive Study. European Journal of Operational Research, vol. 252, no. 2, pp. 561-572.

This Take1 iteration will construct and tune several classic machine learning models using the Scikit-Learn library. We also will observe the best results that we can obtain from the models.
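One common way to spot-check several classic Scikit-Learn models against an F1 scoring metric is cross-validation over a dictionary of candidates. The model list, synthetic data, and settings below are illustrative, not the report's exact setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the imbalanced bankruptcy data (~10% positives)
X, y = make_classification(n_samples=400, n_features=20,
                           weights=[0.9, 0.1], random_state=42)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(random_state=42),
    "ExtraTrees": ExtraTreesClassifier(random_state=42),
}

# Mean 5-fold cross-validated F1 score per candidate model
scores = {name: cross_val_score(model, X, y, scoring="f1", cv=5).mean()
          for name, model in models.items()}
```

The top scorers from a sweep like this are then carried forward into a dedicated tuning round, as the ANALYSIS section describes for Extra Trees and Random Forest.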

ANALYSIS: The machine learning algorithms achieved an average F1 score of 94.37%. Two algorithms (Extra Trees and Random Forest) produced the top F1 metrics after the first round of modeling. After a series of tuning trials, the Extra Trees model turned in an F1 score of 97.39% using the training dataset. When we applied the Extra Trees model to the previously unseen test dataset, we obtained an F1 score of 55.55%.

CONCLUSION: In this iteration, the Extra Trees model appeared to be a suitable algorithm for modeling this dataset. We should consider using the algorithm for further modeling.

Dataset Used: Company Bankruptcy Prediction Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Taiwanese+Bankruptcy+Prediction

One potential source of performance benchmark: https://www.kaggle.com/fedesoriano/company-bankruptcy-prediction

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Model for Kannada Handwritten Digits Using TensorFlow Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Kannada Handwritten Digits dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: Kannada is a language spoken predominantly by the people of Karnataka in southwestern India. The language has roughly 45 million native speakers and is written using the Kannada script. This modeling project uses the same format as the MNIST dataset regarding how the data is structured. All details of the dataset curation have been captured in the paper titled: Prabhu, Vinay Uday. “Kannada-MNIST: A new handwritten digits dataset for the Kannada language.” arXiv preprint arXiv:1908.01242 (2019)

In iteration Take1, we constructed a simple three-layer convolutional neural network (CNN) to use as the baseline model for analyzing this dataset.

In iteration Take2, we constructed a four-layer CNN for analyzing this dataset.

In this Take3 iteration, we will construct a five-layer CNN for analyzing this dataset.

ANALYSIS: In iteration Take1, the baseline model achieved an average accuracy score of 99.48% after ten epochs using the training dataset. After trying out different hyperparameter sets, the best model processed Kaggle’s public test dataset with an accuracy score of 97.90%. Furthermore, the final model processed Kaggle’s unseen test dataset with an accuracy measurement of 97.74%.

In iteration Take2, the four-layer model achieved an average accuracy score of 99.49% after ten epochs using the training dataset. After trying out different hyperparameter sets, the best model processed Kaggle’s public test dataset with an accuracy score of 98.24%. Furthermore, the final model processed Kaggle’s unseen test dataset with an accuracy measurement of 97.80%.

In this Take3 iteration, the five-layer model achieved an average accuracy score of 99.52% after ten epochs using the training dataset. After trying out different hyperparameter sets, the best model processed Kaggle’s public test dataset with an accuracy score of 98.08%. Furthermore, the final model processed Kaggle’s unseen test dataset with an accuracy measurement of 97.74%.

CONCLUSION: In this iteration, the TensorFlow CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Kannada Handwritten Digits Dataset

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://www.kaggle.com/c/Kannada-MNIST/overview

One potential source of performance benchmarks: https://www.kaggle.com/c/Kannada-MNIST/leaderboard

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Model for Kannada Handwritten Digits Using TensorFlow Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Kannada Handwritten Digits dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: Kannada is a language spoken predominantly by the people of Karnataka in southwestern India. The language has roughly 45 million native speakers and is written using the Kannada script. This modeling project uses the same format as the MNIST dataset regarding how the data is structured. All details of the dataset curation have been captured in the paper titled: Prabhu, Vinay Uday. “Kannada-MNIST: A new handwritten digits dataset for the Kannada language.” arXiv preprint arXiv:1908.01242 (2019)

In iteration Take1, we constructed a simple three-layer convolutional neural network (CNN) to use as the baseline model for analyzing this dataset.

In this Take2 iteration, we will construct a four-layer CNN for analyzing this dataset.

ANALYSIS: In iteration Take1, the baseline model achieved an average accuracy score of 99.48% after ten epochs using the training dataset. After trying out different hyperparameter sets, the best model processed Kaggle’s public test dataset with an accuracy score of 97.90%. Furthermore, the final model processed Kaggle’s unseen test dataset with an accuracy measurement of 97.74%.

In this Take2 iteration, the four-layer model achieved an average accuracy score of 99.49% after ten epochs using the training dataset. After trying out different hyperparameter sets, the best model processed Kaggle’s public test dataset with an accuracy score of 98.24%. Furthermore, the final model processed Kaggle’s unseen test dataset with an accuracy measurement of 97.80%.

CONCLUSION: In this iteration, the TensorFlow CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Kannada Handwritten Digits Dataset

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://www.kaggle.com/c/Kannada-MNIST/overview

One potential source of performance benchmarks: https://www.kaggle.com/c/Kannada-MNIST/leaderboard

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Model for Kannada Handwritten Digits Using TensorFlow Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Kannada Handwritten Digits dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: Kannada is a language spoken predominantly by the people of Karnataka in southwestern India. The language has roughly 45 million native speakers and is written using the Kannada script. This modeling project uses the same format as the MNIST dataset regarding how the data is structured. All details of the dataset curation have been captured in the paper titled: Prabhu, Vinay Uday. “Kannada-MNIST: A new handwritten digits dataset for the Kannada language.” arXiv preprint arXiv:1908.01242 (2019)

In this Take1 iteration, we will construct a simple three-layer convolutional neural network (CNN) to use as the baseline model for analyzing this dataset.
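A plausible sketch of such a baseline CNN for the 28x28 MNIST-format images, reading "three-layer" as two convolutional blocks plus a dense output; the exact architecture, filter counts, and optimizer in the report may differ.

```python
import tensorflow as tf

def build_cnn(num_classes: int = 10) -> tf.keras.Model:
    """A simple baseline CNN for 28x28 grayscale digit images."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),  # one unit per digit
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Because Kannada-MNIST follows the MNIST layout, the raw pixel rows only need to be reshaped to `(28, 28, 1)` and scaled to `[0, 1]` before training.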

ANALYSIS: In this Take1 iteration, the baseline model achieved an average accuracy score of 99.48% after ten epochs using the training dataset. After trying out different hyperparameter sets, the best model processed Kaggle’s public test dataset with an accuracy score of 97.90%. Furthermore, the final model processed Kaggle’s unseen test dataset with an accuracy measurement of 97.74%.

CONCLUSION: In this iteration, the TensorFlow CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Kannada Handwritten Digits Dataset

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://www.kaggle.com/c/Kannada-MNIST/overview

One potential source of performance benchmarks: https://www.kaggle.com/c/Kannada-MNIST/leaderboard

The HTML formatted report can be found here on GitHub.