Binary Classification Model for Bondora P2P Lending Using Python and Scikit-Learn

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Bondora P2P Lending dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: The Kaggle dataset owner retrieved this dataset from Bondora, a leading European peer-to-peer lending platform. The data comprises demographic and financial information of the borrowers with defaulted and non-defaulted loans between February 2009 and July 2021. For investors, “peer-to-peer lending” or “P2P” offers an attractive way to diversify portfolios and enhance long-term performance. However, to make effective decisions, investors want to minimize the risk of default of each lending decision and realize the return that compensates for the risk. Therefore, we will predict the default risk by focusing on the “DefaultDate” attribute as the target.

ANALYSIS: The average performance of the machine learning algorithms achieved a ROC-AUC benchmark of 0.9539 using the training dataset. We selected Random Forest and Extra Trees to perform the tuning exercises. After a series of tuning trials, the refined Extra Trees model processed the training dataset with a final ROC-AUC score of 0.9801. When we processed the test dataset with the final model, the model achieved a ROC-AUC score of 0.9162.
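The tuning exercise described above can be sketched in scikit-learn. This is a minimal illustration on synthetic data (the actual Bondora features and the tuned hyperparameters are not given in this summary), not the project's exact pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the Bondora features; the real project predicts
# a binary default target derived from the "DefaultDate" attribute.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.7, 0.3], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Evaluate Extra Trees with the same ROC-AUC metric used in the write-up.
model = ExtraTreesClassifier(n_estimators=200, random_state=42)
cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring='roc_auc')
print(f"Mean CV ROC-AUC: {cv_auc.mean():.4f}")

# Final check on the held-out split.
model.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test ROC-AUC: {test_auc:.4f}")
```

In practice, the cross-validation scores guide hyperparameter tuning, and the held-out score is computed only once with the final model.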

CONCLUSION: In this iteration, the Extra Trees model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Bondora P2P Lending Loan Data

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/sid321axn/bondora-peer-to-peer-lending-loan-data

Dataset Attribute Description: https://www.bondora.com/en/public-reports

One potential source of performance benchmark: https://www.kaggle.com/sid321axn/bondora-peer-to-peer-lending-loan-data/code

The HTML formatted report can be found here on GitHub.

Data Validation for Bondora P2P Lending Using Python and TensorFlow Data Validation

SUMMARY: The project aims to construct a data validation flow using TensorFlow Data Validation (TFDV) and document the end-to-end steps using a template. The Bondora P2P Lending dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: The Kaggle dataset owner retrieved this dataset from Bondora, a leading European peer-to-peer lending platform. The data comprises demographic and financial information of the borrowers with defaulted and non-defaulted loans between February 2009 and July 2021. For investors, “peer-to-peer lending” or “P2P” offers an attractive way to diversify portfolios and enhance long-term performance. However, to make effective decisions, investors want to minimize the risk of default of each lending decision and realize the return that compensates for the risk. Therefore, we will predict the default risk by focusing on the “DefaultDate” attribute as the target.

Additional Notes: I adapted this workflow from the TensorFlow Data Validation tutorial on TensorFlow.org (https://www.tensorflow.org/tfx/tutorials/data_validation/tfdv_basic). The plan is to build a robust TFDV script for validating datasets when building machine learning models.
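TFDV's core pattern is: generate statistics over the training split, infer a schema from them, and validate other splits against that schema. As a toy illustration of the same idea in plain Python (hypothetical column names, not TFDV's API), a schema inferred from training data can flag categories never seen during training:

```python
# Toy schema inference and validation, mimicking the pattern TFDV
# automates: generate statistics -> infer schema -> validate splits.
def infer_schema(rows):
    """Record the set of observed values per categorical column."""
    schema = {}
    for row in rows:
        for col, val in row.items():
            schema.setdefault(col, set()).add(val)
    return schema

def validate(rows, schema):
    """Return anomalies: (row index, column, value) not seen in training."""
    anomalies = []
    for i, row in enumerate(rows):
        for col, val in row.items():
            if val not in schema.get(col, set()):
                anomalies.append((i, col, val))
    return anomalies

train = [{"Country": "EE", "Rating": "A"}, {"Country": "FI", "Rating": "B"}]
test = [{"Country": "EE", "Rating": "A"}, {"Country": "XX", "Rating": "B"}]

schema = infer_schema(train)
print(validate(test, schema))  # -> [(1, 'Country', 'XX')]
```

TFDV performs the equivalent work with `generate_statistics_from_csv`, `infer_schema`, and `validate_statistics`, and additionally covers numeric distributions, missing values, and drift between slices.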

CONCLUSION: In this iteration, the data validation workflow helped validate the features and structures of the training, validation, and test datasets. The workflow also generated statistics over different slices of data, which can help track model and anomaly metrics.

Dataset Used: Kaggle Bondora P2P Lending Loan Data

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/sid321axn/bondora-peer-to-peer-lending-loan-data

Dataset Attribute Description: https://www.bondora.com/en/public-reports

The HTML formatted report can be found here on GitHub.

Jeff Goins on Real Artists Don’t Starve, Part 3

In his book, Real Artists Don’t Starve: Timeless Strategies for Thriving in the New Creative Age, Jeff Goins discusses how we can apply prudent strategies in positioning ourselves for thriving in our chosen field of craft.

These are some of my favorite concepts and takeaways from reading the book.

Chapter 3, Apprentice Under a Master

In this chapter, Jeff discusses the importance of seeking out apprentice opportunities on our journey to master the craft. He offers the following recommendations for us to think about:

  • Becoming a master means we will master our craft. However, before we become masters, we must first become apprentices.
  • An apprentice makes a conscious choice to do whatever it takes to master the craft. The marks of a good apprentice are patience, perseverance, and humility.
  • An apprentice does not give up, and they also do what no one else is willing to do to acquire mastery. Therefore, we must be diligent enough to take the work seriously and continue to grow.
  • The first step in an apprenticeship is to find a master who is worth studying. When we find such a person, our goal is to consume as much of their work as possible and familiarize ourselves with it.
  • When the apprenticeship starts, we consistently do the hard work and keep showing up, regardless of the outcome. But, in the end, hard work is all we can measure.
  • As thriving artists, we are both humble enough to admit our need for help and audacious enough to seek it out. Great work results from a willingness to become an apprentice on our journey to mastery.

In summary, “The Starving Artist believes he has enough talent. The Thriving Artist apprentices under a master.”

A Plan for “Being Wrong”

(From a writer I respect, Seth Godin)

Never making a mistake is a difficult model for moving forward.

You will likely make mistakes. You will make a choice based on something you did not know, or perhaps should have known. And then the error will compound.

Then what?

When a kid takes a driver’s education class, shouldn’t we also teach them what to do after they get a ticket or get into an accident?

If you are a prosecutor, your staff might arrest an innocent person. If you are a doctor, a patient might die under your care. If you are a blogger, you might publish something incorrect. That moment is not the time to start making a plan.

Are you ready and eager to say, “Now that I know what I can currently know, will I change my direction?”

Are you willing to say, “I did not know that key fact at the time, but I should have, and I am building systems to make sure I will know it next time”?

Doubling down on a mistake always makes things worse.

Multi-Class Image Classification Deep Learning Model for Kaggle UT Zappos50K Shoe Dataset Using TensorFlow Take 5

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Kaggle UT Zappos50K Shoe dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

UT Zappos50K (UT-Zap50K) is a large shoe dataset consisting of 50,025 catalog images collected from Zappos.com. The dataset divides the photos into four major categories (shoes, sandals, slippers, and boots), further organized by functional type and individual brand. The research team created this dataset in the context of an online shopping task, where users pay special attention to fine-grained visual differences.

In this Take 5 iteration, we will construct a CNN model based on the InceptionV3 architecture to predict the shoe category from the available images.
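A transfer-learning sketch of this setup, assuming four output classes and 224x224 inputs (the input size, frozen layers, and optimizer settings here are assumptions, not the project's exact configuration; in practice `weights='imagenet'` would load the pretrained features, omitted here to avoid the download):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4  # shoes, sandals, slippers, boots

# InceptionV3 without its top classifier, used as a feature extractor.
base = tf.keras.applications.InceptionV3(
    include_top=False, weights=None, input_shape=(224, 224, 3))
base.trainable = False  # freeze the backbone for transfer learning

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```

Training then proceeds with `model.fit` on the image generators for the prescribed number of epochs; the later Takes swap only the backbone architecture.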

ANALYSIS: In this Take 5 iteration, the InceptionV3 model achieved an accuracy score of 98.34% after ten epochs on the training dataset. The final model processed the validation dataset with an accuracy score of 87.28%.

CONCLUSION: In this iteration, the InceptionV3-based CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Kaggle UT Zappos50K Shoe Dataset

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: http://vision.cs.utexas.edu/projects/finegrained/utzap50k/

One potential source of performance benchmarks: https://www.kaggle.com/grassknoted/asl-alphabet/code

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Deep Learning Model for Kaggle UT Zappos50K Shoe Dataset Using TensorFlow Take 4

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Kaggle UT Zappos50K Shoe dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

UT Zappos50K (UT-Zap50K) is a large shoe dataset consisting of 50,025 catalog images collected from Zappos.com. The dataset divides the photos into four major categories (shoes, sandals, slippers, and boots), further organized by functional type and individual brand. The research team created this dataset in the context of an online shopping task, where users pay special attention to fine-grained visual differences.

In this Take 4 iteration, we will construct a CNN model based on the MobileNetV3Large architecture to predict the shoe category from the available images.

ANALYSIS: In this Take 4 iteration, the MobileNetV3Large model achieved an accuracy score of 99.20% after ten epochs on the training dataset. The final model processed the validation dataset with an accuracy score of 86.09%.

CONCLUSION: In this iteration, the MobileNetV3Large-based CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Kaggle UT Zappos50K Shoe Dataset

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: http://vision.cs.utexas.edu/projects/finegrained/utzap50k/

One potential source of performance benchmarks: https://www.kaggle.com/grassknoted/asl-alphabet/code

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Deep Learning Model for Kaggle UT Zappos50K Shoe Dataset Using TensorFlow Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Kaggle UT Zappos50K Shoe dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

UT Zappos50K (UT-Zap50K) is a large shoe dataset consisting of 50,025 catalog images collected from Zappos.com. The dataset divides the photos into four major categories (shoes, sandals, slippers, and boots), further organized by functional type and individual brand. The research team created this dataset in the context of an online shopping task, where users pay special attention to fine-grained visual differences.

In this Take 3 iteration, we will construct a CNN model based on the EfficientNetB7 architecture to predict the shoe category from the available images.

ANALYSIS: In this Take 3 iteration, the EfficientNetB7 model achieved an accuracy score of 99.32% after ten epochs on the training dataset. The final model processed the validation dataset with an accuracy score of 83.39%.

CONCLUSION: In this iteration, the EfficientNetB7-based CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Kaggle UT Zappos50K Shoe Dataset

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: http://vision.cs.utexas.edu/projects/finegrained/utzap50k/

One potential source of performance benchmarks: https://www.kaggle.com/grassknoted/asl-alphabet/code

The HTML formatted report can be found here on GitHub.

Multi-Class Image Classification Deep Learning Model for Kaggle UT Zappos50K Shoe Dataset Using TensorFlow Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Kaggle UT Zappos50K Shoe dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

UT Zappos50K (UT-Zap50K) is a large shoe dataset consisting of 50,025 catalog images collected from Zappos.com. The dataset divides the photos into four major categories (shoes, sandals, slippers, and boots), further organized by functional type and individual brand. The research team created this dataset in the context of an online shopping task, where users pay special attention to fine-grained visual differences.

In this Take 2 iteration, we will construct a CNN model based on the DenseNet201 architecture to predict the shoe category from the available images.

ANALYSIS: In this Take 2 iteration, the DenseNet201 model achieved an accuracy score of 97.66% after ten epochs on the training dataset. The final model processed the validation dataset with an accuracy score of 84.96%.

CONCLUSION: In this iteration, the DenseNet201-based CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Kaggle UT Zappos50K Shoe Dataset

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: http://vision.cs.utexas.edu/projects/finegrained/utzap50k/

One potential source of performance benchmarks: https://www.kaggle.com/grassknoted/asl-alphabet/code

The HTML formatted report can be found here on GitHub.