Binary Classification Model for Parkinson’s Disease Using R

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Parkinson’s Disease dataset presents a binary classification problem, where we are trying to predict one of two possible outcomes.

INTRODUCTION: The data used in this study were gathered from 188 patients with PD (107 men and 81 women) with ages ranging from 33 to 87 at the Department of Neurology in Cerrahpasa Faculty of Medicine, Istanbul University. The control group consists of 64 healthy individuals (23 men and 41 women) with ages varying between 41 and 82. During the data collection process, the microphone was set to 44.1 kHz and, following the physician’s examination, the sustained phonation of the vowel was collected from each subject with three repetitions.

ANALYSIS: The baseline performance of the machine learning algorithms achieved an average accuracy of 77.84%. Two algorithms (Random Forest and Stochastic Gradient Boosting) achieved the top accuracy after the first round of modeling. After a series of tuning trials, Random Forest turned in the top overall result with an accuracy of 88.24%. Using the optimized parameters, the Random Forest algorithm processed the testing dataset with an accuracy of 83.63%, about 4.6 percentage points below the accuracy obtained on the training data.
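
The workflow described above (baseline comparison, tuning, then scoring on held-out data) can be sketched in R roughly as follows. This is a minimal illustration, not the project's actual script: it assumes the caret package (with randomForest and gbm installed), uses synthetic data from caret's twoClassSim() in place of the Parkinson’s dataset, and the seed, 70/30 split, fold count, and mtry grid are all placeholder choices.

```r
# Sketch only: synthetic data stands in for the Parkinson's dataset.
library(caret)  # also needs the randomForest and gbm packages installed

set.seed(888)
# Hypothetical stand-in data: two classes, numeric predictors, target "Class"
sim_data <- twoClassSim(n = 300)

# Hold out 30% of the rows for final testing (split ratio is an assumption)
train_idx <- createDataPartition(sim_data$Class, p = 0.70, list = FALSE)
train_set <- sim_data[train_idx, ]
test_set  <- sim_data[-train_idx, ]

# 10-fold cross-validation, scored by accuracy
control <- trainControl(method = "cv", number = 10)

# Baseline round: fit each candidate with caret's default tuning grid
set.seed(888)
fit_rf <- train(Class ~ ., data = train_set, method = "rf",
                metric = "Accuracy", trControl = control)
set.seed(888)
fit_gbm <- train(Class ~ ., data = train_set, method = "gbm",
                 metric = "Accuracy", trControl = control, verbose = FALSE)
print(summary(resamples(list(RF = fit_rf, GBM = fit_gbm))))

# Tuning round: search a small, illustrative grid of mtry values
set.seed(888)
tuned_rf <- train(Class ~ ., data = train_set, method = "rf",
                  metric = "Accuracy", trControl = control,
                  tuneGrid = expand.grid(mtry = c(2, 4, 8)))
cv_acc <- max(tuned_rf$results$Accuracy)

# Final check: score the tuned model on the held-out rows
pred <- predict(tuned_rf, newdata = test_set)
test_acc <- mean(pred == test_set$Class)

cat(sprintf("Cross-validated accuracy: %.4f\n", cv_acc))
cat(sprintf("Held-out test accuracy:   %.4f\n", test_acc))
```

Comparing the two printed figures gives the same kind of training-versus-testing gap reported above. The numbers produced by this sketch come from simulated data and say nothing about the Parkinson’s results.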

CONCLUSION: For this iteration, the Random Forest algorithm achieved the best overall results using the training and testing datasets. For this dataset, Random Forest should be considered for further modeling or production use.

Dataset Used: Parkinson’s Disease Classification Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Parkinson%27s+Disease+Classification

Sakar, C.O., Serbes, G., Gunduz, A., Tunc, H.C., Nizam, H., Sakar, B.E., Tutuncu, M., Aydin, T., Isenkul, M.E. and Apaydin, H., 2018. A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Applied Soft Computing, DOI: https://doi.org/10.1016/j.asoc.2018.10.022

The HTML formatted report can be found here on GitHub.