Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Disaster Tweets Classification dataset presents a binary classification problem in which we attempt to predict one of two possible outcomes for each Tweet.
INTRODUCTION: Twitter has become an important communication channel in times of emergency. The ubiquitous nature of smartphones enables people to announce an emergency they are observing in real time. Because of this, more agencies are interested in programmatically monitoring Twitter. In this practice Kaggle competition, we want to build a machine learning model that predicts which Tweets are about real disasters and which ones are not. This dataset was created by Figure-Eight and shared initially on their ‘Data for Everyone’ website.
In this Take1 iteration, we will deploy a bag-of-words model to classify the Tweets. We will also submit the test predictions to Kaggle and obtain the performance level of the model.
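As a rough illustration of what a bag-of-words encoding does, the sketch below turns a few Tweets into word-count vectors using only the Python standard library. It is not the project's actual preprocessing code: the helper names (tokenize, build_vocabulary, encode), the max_words limit, and the example Tweets are all assumptions made for this illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(texts, max_words=1000):
    """Map the most frequent training tokens to column indices."""
    counts = Counter(token for text in texts for token in tokenize(text))
    return {word: idx for idx, (word, _) in enumerate(counts.most_common(max_words))}


def encode(text, vocabulary):
    """Return a word-count vector; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for token in tokenize(text):
        if token in vocabulary:
            vector[vocabulary[token]] += 1
    return vector


# Made-up example tweets (hypothetical, not taken from the Kaggle dataset).
train_texts = [
    "Forest fire near the town",
    "I love this sunny day",
    "Fire crews evacuate the town",
]
vocab = build_vocabulary(train_texts)
features = [encode(text, vocab) for text in train_texts]
```

Each row of features has one column per vocabulary word, so word order is discarded and only word counts remain. Vectors of this kind are what a dense neural network would take as input in a bag-of-words setup.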
ANALYSIS: In this Take1 iteration, the bag-of-words model achieved an average accuracy of 75.49% after 20 epochs with ten iterations of cross-validation. Furthermore, the final model scored an accuracy of 75.02% on the test dataset.
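To make the averaging step concrete, the sketch below shows one way a mean cross-validation accuracy can be computed. It assumes that "ten iterations" means ten folds, and it substitutes a trivial majority-class baseline for the neural network so that the example stays self-contained. The function names and the toy labels are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=42):
    """Yield (train, validation) index lists for k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(labels, fit_predict, k=10):
    """Average validation accuracy over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(labels), k):
        predictions = fit_predict(train_idx, val_idx)
        scores.append(accuracy([labels[i] for i in val_idx], predictions))
    return sum(scores) / len(scores)


# Toy labels (hypothetical): 20 non-disaster (0) and 10 disaster (1) tweets.
labels = [0] * 20 + [1] * 10


def majority_baseline(train_idx, val_idx):
    """Stand-in for the real model: always predict the most common training label."""
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return [majority] * len(val_idx)


mean_accuracy = cross_validate(labels, majority_baseline, k=10)
```

In the actual project, fit_predict would train the TensorFlow model on the training fold for 20 epochs and return its predictions for the validation fold. The averaging logic would stay the same.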
CONCLUSION: In this modeling iteration, the bag-of-words TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.
Dataset Used: Disaster Tweets Classification
Dataset ML Model: Binary class text classification with text-oriented features
Dataset Reference: https://www.kaggle.com/c/nlp-getting-started/
The HTML-formatted report can be found here on GitHub.