Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Chronic Kidney Disease dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.
INTRODUCTION: The problem is to predict the chronic kidney disease from the dataset that was collected from the hospital records for two months.
ANALYSIS: The baseline performance of the machine learning algorithms achieved an average accuracy of 97.61%. Two algorithms (Logistic Regression and Extra Trees) achieved the top accuracy metrics after the first round of modeling. After tuning the hyperparameters, Logistic Regression turned in the top overall result and achieved an accuracy metric of 99.33% with the training dataset. Using the same hyperparameters, the Logistic Regression algorithm processed the test dataset with an accuracy of 100%, which was even better than the prediction from the training data.
CONCLUSION: For this iteration, the Logistic Regression algorithm achieved the best overall results using the training and test datasets. For this dataset, Extra Trees should be considered for further modeling.
Dataset Used: Chronic Kidney Disease Data Set
Dataset ML Model: Binary classification with numerical and categorical attributes
Dataset Reference: https://archive.ics.uci.edu/ml/datasets/chronic_kidney_disease
One potential source of performance benchmark: https://www.kaggle.com/mansoordaku/ckdisease
The HTML formatted report can be found here on GitHub.