Binary Classification Model for Coronary Artery Disease Using Python Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Z-Alizadeh Sani CAD dataset presents a binary classification problem, where we are trying to predict one of two possible outcomes.

INTRODUCTION: The researchers collected this dataset for coronary artery disease (CAD) diagnosis. Each patient falls into one of two categories: CAD or Normal. A patient is categorized as CAD if his/her diameter narrowing is greater than or equal to 50%, and as Normal otherwise. The Z-Alizadeh Sani dataset contains the records of 303 patients, each with 59 features. The features belong to one of four groups: demographic, symptom and examination, ECG, and laboratory and echo features. In this extension, the researchers added three features for the LAD, LCX, and RCA arteries. CAD is true when at least one of these three arteries is stenotic. To use this dataset properly for CAD classification, only one of LAD, LCX, RCA, or Cath (the result of angiography) can be present in the dataset. The dataset can be used not only for CAD detection but also for stenosis diagnosis of each of the LAD, LCX, and RCA arteries.
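The labeling rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names "LAD", "LCX", "RCA", and "Cath" follow the names used in this write-up and may not match the headers in the actual data file, and the helper functions (is_stenotic, cad_label, keep_single_target) are hypothetical names introduced here, not part of the original project.

```python
# Assumed names of the four outcome columns; the real file headers may differ.
TARGET_COLUMNS = ("LAD", "LCX", "RCA", "Cath")


def is_stenotic(narrowing_percent):
    """An artery counts as stenotic when its diameter narrowing is >= 50%."""
    return narrowing_percent >= 50.0


def cad_label(lad_stenotic, lcx_stenotic, rca_stenotic):
    """CAD is true when at least one of the three arteries is stenotic."""
    return "CAD" if (lad_stenotic or lcx_stenotic or rca_stenotic) else "Normal"


def keep_single_target(record, target="Cath"):
    """Drop the other outcome columns so that only one target remains.

    Leaving LAD, LCX, or RCA in the features while predicting Cath would
    leak the answer, since Cath is derived from those three columns.
    """
    if target not in TARGET_COLUMNS:
        raise ValueError(f"unknown target column: {target}")
    return {
        key: value
        for key, value in record.items()
        if key == target or key not in TARGET_COLUMNS
    }
```

For example, a record with "Age", "LAD", "LCX", "RCA", and "Cath" keys would keep only "Age" and "Cath" when passed through keep_single_target with the default target.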

In this iteration, we plan to establish the baseline prediction accuracy for future takes of modeling.

ANALYSIS: In the baseline run, the machine learning algorithms achieved an average accuracy of 79.88%. Two algorithms (Extra Trees and Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Extra Trees turned in the top overall result and achieved an accuracy metric of 84.43%. Using the optimized parameters, the Extra Trees algorithm processed the testing dataset with an accuracy of 89.01%, which was even better than the accuracy obtained on the training data.
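The workflow described above (baseline comparison, tuning, then scoring on held-out data) might look roughly like the following scikit-learn sketch. Everything specific in it is an assumption made for illustration: the data is synthetic (generated by make_classification rather than loaded from the Z-Alizadeh Sani file), and the seed, the 70/30 split, the 5-fold cross-validation, and the parameter grid are placeholders, not the settings used in the original project. It will not reproduce the accuracy figures quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)

SEED = 7  # arbitrary seed, assumed here for repeatability

# Synthetic stand-in with the same number of rows as the real dataset.
X, y = make_classification(n_samples=303, n_features=20, random_state=SEED)

# Hold out a test set; the 30% size is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=SEED
)

# Step 1: baseline accuracy of each candidate via cross-validation.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
models = {
    "ExtraTrees": ExtraTreesClassifier(n_estimators=100, random_state=SEED),
    "GradientBoosting": GradientBoostingClassifier(random_state=SEED),
}
baseline = {
    name: cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

# Step 2: tune Extra Trees over a small, illustrative grid.
grid = GridSearchCV(
    ExtraTreesClassifier(random_state=SEED),
    param_grid={"n_estimators": [50, 100, 200]},
    cv=cv,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

# Step 3: score the tuned model once on the untouched test set.
test_accuracy = grid.score(X_test, y_test)

for name, score in baseline.items():
    print(f"{name}: baseline CV accuracy = {score:.4f}")
print(f"Best parameters: {grid.best_params_}")
print(f"Tuned CV accuracy = {grid.best_score_:.4f}")
print(f"Test accuracy = {test_accuracy:.4f}")
```

Keeping the test set out of both the baseline comparison and the grid search is what makes the final number a fair estimate. With only a few hundred records, a test score above the cross-validated score can easily arise from sampling variation.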

CONCLUSION: For this iteration, the Extra Trees algorithm achieved the best overall training and validation results. For this dataset, the Extra Trees algorithm could be considered for further modeling.

Dataset Used: Z-Alizadeh Sani Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference:

The HTML formatted report can be found here on GitHub.