Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Heart Disease Key Indicators dataset presents a binary classification problem, in which we attempt to predict one of two possible outcomes.
INTRODUCTION: This dataset comes from the CDC’s Behavioral Risk Factor Surveillance System (BRFSS) study, which conducts annual telephone surveys to gather data on the health status of U.S. residents. The original dataset consists of 401,958 rows and 279 columns. However, the Kaggle project owner selected some of the most relevant attributes from the dataset and cleaned it up for machine learning projects.
ANALYSIS: The preliminary Gradient Boosted Trees model achieved a ROC-AUC benchmark of 88.47% on the training dataset. When we applied the finalized model to the test dataset, it achieved a ROC-AUC score of 83.17%.
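For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a ROC-AUC score can be computed from true labels and predicted scores. It is not the project's code: the function name `roc_auc` and the toy inputs are illustrative assumptions, and the project presumably relied on a library implementation. The sketch uses the rank-based (Mann-Whitney) formulation, in which ROC-AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Rank-based ROC-AUC for binary labels (1 = positive, 0 = negative).

    Hypothetical helper for illustration only; not taken from the project.
    """
    pairs = sorted(zip(scores, labels))  # ascending by score
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied items share the average of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        positives_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_run
        i = j

    # Mann-Whitney U statistic for the positive class, scaled to [0, 1].
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Toy data (made up for illustration, not drawn from the dataset).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"ROC-AUC on toy data: {toy_auc:.2f}")
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a gap between training and test scores such as the one reported here indicates how much of the training performance carries over to unseen data.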
CONCLUSION: In this iteration, the Gradient Boosted Trees model from TensorFlow Decision Forests appeared to be a suitable algorithm for modeling this dataset.
Dataset Used: Personal Key Indicators of Heart Disease Dataset
Dataset ML Model: Binary classification with numerical and categorical features
Dataset Reference: https://www.kaggle.com/kamilpytlak/personal-key-indicators-of-heart-disease
One source of potential performance benchmarks: https://www.kaggle.com/kamilpytlak/personal-key-indicators-of-heart-disease/code
The HTML-formatted report can be found on GitHub.