Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Bondora P2P Lending dataset presents a binary classification problem where we attempt to predict one of two possible outcomes.
INTRODUCTION: The Kaggle dataset owner retrieved this dataset from Bondora, a leading European peer-to-peer lending platform. The data comprises demographic and financial information on borrowers with defaulted and non-defaulted loans between February 2009 and July 2021. For investors, “peer-to-peer lending” or “P2P” offers an attractive way to diversify portfolios and enhance long-term performance. However, to make effective decisions, investors want to minimize the default risk of each lending decision and earn a return that compensates for that risk. Therefore, we will predict the default risk by using the “DefaultDate” attribute as the target.
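One way to turn the “DefaultDate” attribute into a binary target is sketched below. It assumes that an empty “DefaultDate” means the loan has not defaulted, which the summary does not state explicitly. The sample rows, the "LoanId" and "Amount" columns, the "Defaulted" key, and the add_default_flag helper are all hypothetical names invented for illustration.

```python
import csv
import io

# Hypothetical sample rows; the values are invented for illustration only.
SAMPLE_CSV = """LoanId,Amount,DefaultDate
A1,500,2015-03-01
A2,1200,
A3,800,2019-11-20
A4,3000,
"""


def add_default_flag(rows):
    """Return copies of the rows with a binary 'Defaulted' key.

    Assumed rule (not confirmed by the source): a non-empty DefaultDate
    marks a defaulted loan (1); an empty one marks a non-defaulted loan (0).
    """
    labeled = []
    for row in rows:
        new_row = dict(row)
        default_date = (row.get("DefaultDate") or "").strip()
        new_row["Defaulted"] = 1 if default_date else 0
        labeled.append(new_row)
    return labeled


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
labeled_rows = add_default_flag(rows)
labels = [row["Defaulted"] for row in labeled_rows]
print(labels)  # [1, 0, 1, 0]
```

If the target is built this way, the raw “DefaultDate” column would probably need to be dropped from the feature set before training, since it encodes the label directly.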
ANALYSIS: The preliminary XGBoost model achieved a baseline ROC-AUC score of 0.9712. After a series of tuning trials, the refined XGBoost model achieved a final ROC-AUC score of 0.9849 on the training dataset. When we applied the final model to Kaggle’s test dataset, it achieved a ROC-AUC score of 0.9307.
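For readers less familiar with the metric, ROC-AUC can be read as the probability that a randomly chosen defaulted loan receives a higher predicted score than a randomly chosen non-defaulted loan, with ties counted as half. The sketch below computes it directly from that definition in plain Python. It is illustrative only and is not the code used to produce the scores above; the roc_auc function and the sample labels and scores are hypothetical.

```python
def roc_auc(y_true, y_score):
    """Compute ROC-AUC as the fraction of (positive, negative) pairs
    in which the positive example has the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = default) and model scores, invented for illustration.
y_true = [1, 0, 1, 0, 1]
y_score = [0.9, 0.4, 0.35, 0.2, 0.8]

auc = roc_auc(y_true, y_score)
print(round(auc, 4))  # 0.8333 (5 of 6 positive/negative pairs ranked correctly)
```

This pairwise version is quadratic in the number of rows, so it suits small illustrations; on a dataset of this size a library implementation would be the practical choice.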
CONCLUSION: In this iteration, the XGBoost model appeared to be a suitable algorithm for modeling this dataset.
Dataset Used: Kaggle Bondora P2P Lending Loan Data
Dataset ML Model: Binary classification with numerical and categorical attributes
Dataset Reference: https://www.kaggle.com/sid321axn/bondora-peer-to-peer-lending-loan-data
Dataset Attribute Description: https://www.bondora.com/en/public-reports
One potential source of performance benchmark: https://www.kaggle.com/sid321axn/bondora-peer-to-peer-lending-loan-data/code
The HTML formatted report can be found here on GitHub.