Binary Classification Model for Coronary Artery Disease Using Python Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Z-Alizadeh Sani CAD dataset presents a binary classification problem in which we try to predict one of two possible outcomes.

INTRODUCTION: The researchers collected the data file for coronary artery disease (CAD) diagnosis. Each patient falls into one of two possible categories: CAD or Normal. A patient is categorized as CAD if his or her diameter narrowing is greater than or equal to 50%, and as Normal otherwise. The Z-Alizadeh Sani dataset contains the records of 303 patients, each of which has 59 features. The features belong to one of four groups: demographic, symptom and examination, ECG, and laboratory and echo features. In this extension, the researchers added three features for the LAD, LCX, and RCA arteries. CAD is true when at least one of these three arteries is stenotic. To use this dataset properly for CAD classification, only one of LAD, LCX, RCA, or Cath (the result of angiography) can be present in the dataset at a time. The dataset can be used not only for CAD detection but also for diagnosing stenosis in each of the LAD, LCX, and RCA arteries.
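The labeling rule and the one-target-at-a-time constraint described above can be sketched in plain Python. This is a minimal illustration, not the project's actual preprocessing: the column names LAD, LCX, RCA, and Cath come from the dataset description, while the string values "Stenotic", "Normal", and "Cad", the helper names, and the sample record are assumptions made for the example.

```python
# Column names follow the dataset description. The label strings
# "Stenotic", "Normal", and "Cad" are assumptions about the encoding.
ARTERY_COLUMNS = ("LAD", "LCX", "RCA")
TARGET_COLUMNS = ARTERY_COLUMNS + ("Cath",)


def derive_cath(record):
    """Return "Cad" if at least one of the three arteries is stenotic."""
    if any(record[col] == "Stenotic" for col in ARTERY_COLUMNS):
        return "Cad"
    return "Normal"


def keep_single_target(record, target):
    """Drop every outcome column except the one chosen as the target."""
    if target not in TARGET_COLUMNS:
        raise ValueError(f"unknown target: {target}")
    return {
        key: value
        for key, value in record.items()
        if key not in TARGET_COLUMNS or key == target
    }


# Hypothetical patient record with a single illustrative feature (Age).
patient = {"Age": 53, "LAD": "Stenotic", "LCX": "Normal",
           "RCA": "Normal", "Cath": "Cad"}

# View of the record to use when LAD is the prediction target.
lad_view = keep_single_target(patient, "LAD")
```

Calling keep_single_target once per target yields a separate view of the data for each of the four outcomes, so that no other outcome column remains among the predictors.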

In iteration Take 1, we established a baseline prediction accuracy for later modeling iterations.

In this iteration, we will examine the prediction accuracy of the models for the three arteries (LAD, LCX, and RCA). By examining the major arteries individually, we hope to gain further insight into whether the data and models can be used to predict the overall result of angiography.
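One way to structure the per-artery comparison is to fit and score one model per target on the same feature matrix. The sketch below assumes scikit-learn and NumPy are installed and uses randomly generated stand-in data, not the Z-Alizadeh Sani records, so its scores say nothing about the real dataset. The function name evaluate_targets, the 30% test split, the seed, and the choice of a Random Forest with 100 trees are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def evaluate_targets(features, targets, seed=7):
    """Fit one classifier per target and return test accuracy by name."""
    results = {}
    for name, labels in targets.items():
        # Assumed 70/30 split, stratified to preserve class balance.
        X_train, X_test, y_train, y_test = train_test_split(
            features, labels, test_size=0.3, random_state=seed, stratify=labels
        )
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X_train, y_train)
        results[name] = float(accuracy_score(y_test, model.predict(X_test)))
    return results


# Synthetic stand-in data: 303 rows to mirror the dataset size only.
rng = np.random.default_rng(0)
X = rng.normal(size=(303, 10))
targets = {
    # Each synthetic label depends on a different feature plus noise.
    name: (X[:, i] + rng.normal(scale=0.5, size=303) > 0).astype(int)
    for i, name in enumerate(("LAD", "LCX", "RCA"))
}

scores = evaluate_targets(X, targets)
for name, acc in scores.items():
    print(f"{name}: {acc:.2%}")
```

Keeping the split, seed, and algorithm identical across targets means any difference in accuracy reflects the target being predicted, not the evaluation setup.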

CONCLUSION: For this iteration, predicting the result of angiography appears to work best when using the LAD artery. The model using the LAD readings produced an accuracy rate of 83.51% on the test dataset. The model using the LCX readings produced an accuracy rate of 71.42% on the test dataset. The model using the RCA readings produced an accuracy rate of 65.93% on the test dataset. For this dataset, the LAD artery data and the Random Forest algorithm should be considered for further modeling.

Dataset Used: Z-Alizadeh Sani Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference:

The HTML formatted report can be found here on GitHub.