Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Company Bankruptcy Prediction dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.
INTRODUCTION: The research team collected the data from the Taiwan Economic Journal from 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange. Because not catching companies in a shaky financial situation is a costly business proposition, we will maximize the precision and recall ratios with the F1 score.
The data analysis first appeared on the research paper, Liang, D., Lu, C.-C., Tsai, C.-F., and Shih, G.-A. (2016) Financial Ratios and Corporate Governance Indicators in Bankruptcy Prediction: A Comprehensive Study. European Journal of Operational Research, vol. 252, no. 2, pp. 561-572.
In iteration Take1, we constructed and tuned several classic machine learning models using the Scikit-Learn library. We also observed the best results that we could obtain from the models.
In iteration Take2, we constructed and tuned a XGBoost model. We also will observe the best results that we can obtain from the model.
This Take3 iteration will construct and tune a three-layer TensorFlow model. We also will observe the best results that we can obtain from the model.
ANALYSIS: In iteration Take1, the machine learning algorithms’ average performance achieved an F1 score of 94.37%. Two algorithms (Extra Trees and Random Forest) produced the top F1 metrics after the first round of modeling. After a series of tuning trials, the Extra Trees model turned in an F1 score of 97.39% using the training dataset. When we applied the Extra Tree model to the previously unseen test dataset, we obtained an F1 score of 55.55%.
In iteration Take2, the XGBoost algorithm achieved an F1 score of 96.48% using the training dataset. After a series of tuning trials, the XGBoost model turned in an F1 score of 98.38%. When we applied the XGBoost model to the previously unseen test dataset, we obtained an F1 score of 58.18%.
In this Take3 iteration, The performance of the TensorFlow model achieved an average F1 score of 67.03% after 20 epochs using the training dataset. When we applied the XGBoost model to the previously unseen test dataset, obtained an F1 score of 41.55%.
CONCLUSION: In this iteration, the TensorFlow model did not appear to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.
Dataset Used: Company Bankruptcy Prediction Data Set
Dataset ML Model: Binary classification with numerical attributes
Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Taiwanese+Bankruptcy+Prediction
One potential source of performance benchmark: https://www.kaggle.com/fedesoriano/company-bankruptcy-prediction
The HTML formatted report can be found here on GitHub.