Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Heart Disease Key Indicators dataset presents a binary classification problem, where we attempt to predict one of two possible outcomes.
INTRODUCTION: This dataset comes from the CDC’s Behavioral Risk Factor Surveillance System (BRFSS) study, which conducts annual telephone surveys to gather data on the health status of U.S. residents. The original dataset consists of 401,958 rows and 279 columns. However, the Kaggle project owner selected some of the most relevant attributes from the dataset and cleaned it up for machine learning projects.
ANALYSIS: The preliminary TensorFlow model achieved a ROC/AUC benchmark of 84.16%. When we processed the test dataset with the final model, it achieved a ROC/AUC score of only 50.60%, which is close to the 50% expected from random guessing.
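For readers unfamiliar with the metric, the following is a minimal sketch of how a ROC/AUC score can be computed from true labels and predicted scores, using the rank-based (Mann-Whitney) formulation. The function name roc_auc and the toy data are illustrative assumptions and are not taken from the project's code, which may compute the metric differently.

```python
def roc_auc(labels, scores):
    """Estimate ROC AUC: the chance a random positive outscores a random negative.

    labels: iterable of 0/1 ground-truth values (hypothetical example input)
    scores: iterable of predicted scores or probabilities, same length
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks by score; tied scores share their average rank.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative label")

    # Mann-Whitney U statistic for the positives, normalized to [0, 1].
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy data only; these are not results from the heart disease dataset.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    # A model that gives every row the same score cannot rank cases at all.
    print(roc_auc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5]))
```

In this sketch, a scorer that assigns the same value to every row lands at exactly 0.5, which is why a test score near 50% suggests the model's rankings carry almost no information about the outcome.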
CONCLUSION: In this iteration, the TensorFlow model did not appear to be suitable for modeling this dataset.
Dataset Used: Personal Key Indicators of Heart Disease Dataset
Dataset ML Model: Binary classification with numerical and categorical features
Dataset Reference: https://www.kaggle.com/kamilpytlak/personal-key-indicators-of-heart-disease
One source of potential performance benchmarks: https://www.kaggle.com/kamilpytlak/personal-key-indicators-of-heart-disease/code
The HTML-formatted report can be found here on GitHub.