Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Kannada Handwritten Digits dataset is a multi-class classification problem in which we attempt to predict one of ten possible digit classes.
INTRODUCTION: Kannada is a language spoken predominantly by the people of Karnataka in southwestern India. The language has roughly 45 million native speakers and is written using the Kannada script. The dataset is structured in the same format as the MNIST dataset. All details of the dataset curation are documented in the paper: Prabhu, Vinay Uday. “Kannada-MNIST: A new handwritten digits dataset for the Kannada language.” arXiv preprint arXiv:1908.01242 (2019).
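As a rough illustration of what an MNIST-style structure means in practice, the sketch below converts flat rows into image tensors. It assumes each row holds a label followed by 784 pixel values (a 28x28 grayscale image flattened row by row), which is the usual MNIST CSV layout and not something confirmed here for the Kaggle files. The function name `rows_to_images` and the synthetic rows are hypothetical.

```python
import numpy as np


def rows_to_images(rows):
    """Split MNIST-style rows into scaled image tensors and integer labels.

    Assumption: each row is [label, pixel0, ..., pixel783] with pixel
    values in the range 0-255.
    """
    data = np.asarray(rows, dtype=np.float32)
    labels = data[:, 0].astype(np.int64)
    # Reshape the 784 flat pixels to 28x28 with one channel and scale to [0, 1].
    images = data[:, 1:].reshape(-1, 28, 28, 1) / 255.0
    return images, labels


# Two synthetic rows stand in for rows read from a real CSV file.
demo_rows = [[3] + [0] * 784, [7] + [255] * 784]
demo_images, demo_labels = rows_to_images(demo_rows)
print(demo_images.shape, demo_labels.tolist())
```

With real data, the rows would typically come from a CSV reader rather than a hand-built list.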
In this Take1 iteration, we will construct a simple three-layer CNN to use as the baseline model for analyzing this dataset.
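The exact architecture is not described here, so the following is only a minimal sketch of what such a baseline might look like in TensorFlow. It reads "three-layer" as three convolutional layers, and the filter counts, kernel sizes, optimizer, and the name `build_baseline_cnn` are all assumptions made for illustration, not the settings used in the project.

```python
import tensorflow as tf


def build_baseline_cnn(input_shape=(28, 28, 1), num_classes=10):
    """Build a small CNN with three convolutional layers (illustrative only)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tf.keras.layers.Conv2D(64, kernel_size=3, activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tf.keras.layers.Conv2D(64, kernel_size=3, activation="relu"),
        tf.keras.layers.Flatten(),
        # One softmax output per digit class.
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    # Sparse categorical cross-entropy expects integer labels (0-9).
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


baseline_model = build_baseline_cnn()
baseline_model.summary()
```

A model like this could then be trained with `baseline_model.fit(images, labels, epochs=10)`, where `images` and `labels` are arrays of shape `(n, 28, 28, 1)` and `(n,)` respectively.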
ANALYSIS: In this Take1 iteration, the baseline model achieved an average accuracy score of 99.48% on the training dataset after ten epochs. After trying out different sets of hyperparameters, the best model processed Kaggle’s public test dataset with an accuracy score of 97.90%. Furthermore, the final model processed Kaggle’s unseen test dataset with an accuracy score of 97.74%.
CONCLUSION: In this iteration, the TensorFlow CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.
Dataset Used: Kannada Handwritten Digits Dataset
Dataset ML Model: Multi-class image classification with numerical attributes
Dataset Reference: https://www.kaggle.com/c/Kannada-MNIST/overview
One potential source of performance benchmarks: https://www.kaggle.com/c/Kannada-MNIST/leaderboard
The HTML formatted report can be found here on GitHub.