NLP Model for Disaster Tweets Classification Using TensorFlow Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Disaster Tweets Classification dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: Twitter has become an important communication channel in times of emergency. The ubiquitous nature of smartphones enables people to announce an emergency they are observing in real-time. Because of this, more agencies are interested in programmatically monitoring Twitter. In this practice Kaggle competition, we want to build a machine learning model that predicts which Tweets are about real disasters and which ones are not. This dataset was created by Figure-Eight and shared initially on their ‘Data for Everyone’ website.

From iteration Take1, we deployed a bag-of-words model to classify the Tweets. We also made predictions on Kaggle’s test dataset and submitted the results for evaluation.

In this Take2 iteration, we will deploy a word-embedding model to classify the Tweets. We will also submit the test predictions to Kaggle and obtain the performance score for the model.

ANALYSIS: From iteration Take1, the bag-of-words model’s performance achieved an average accuracy score of 75.49% after 20 epochs with five iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 75.02%.

In this Take2 iteration, the word-embedding model’s performance achieved an average accuracy score of 72.45% after 20 epochs with five iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 74.65%.

CONCLUSION: In this modeling iteration, the word-embedding TensorFlow model did not do as well as the bag-of-words model. However, we should continue to experiment with both natural language processing techniques for further modeling.

Dataset Used: Sentiment Labelled Sentences

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference: https://www.kaggle.com/c/nlp-getting-started/

The HTML formatted report can be found here on GitHub.

NLP Model for Disaster Tweets Classification Using TensorFlow Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Disaster Tweets Classification dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: Twitter has become an important communication channel in times of emergency. The ubiquitous nature of smartphones enables people to announce an emergency they are observing in real-time. Because of this, more agencies are interested in programmatically monitoring Twitter. In this practice Kaggle competition, we want to build a machine learning model that predicts which Tweets are about real disasters and which ones are not. This dataset was created by Figure-Eight and shared initially on their ‘Data for Everyone’ website.

In this Take1 iteration, we will deploy a bag-of-words model to classify the Tweets. We will also submit the test predictions to Kaggle and obtain the performance level of the model.

ANALYSIS: In this Take1 iteration, the bag-of-words model’s performance achieved an average accuracy score of 75.49% after 20 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 75.02%.

CONCLUSION: In this modeling iteration, the bag-of-words TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Sentiment Labelled Sentences

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference: https://www.kaggle.com/c/nlp-getting-started/

The HTML formatted report can be found here on GitHub.

NLP Model for Sentiment Labelled Sentences Using TensorFlow Take 8

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Sentiment Labelled Sentences dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset was created for the research paper ‘From Group to Individual Labels using Deep Features,’ Kotzias et al., KDD 2015. The paper researchers randomly selected 500 positive and 500 negative sentences from a larger dataset of reviews for each website. The researcher also attempted to choose sentences with a positive or negative connotation as the goal was to avoid selecting neutral sentences.

From iteration Take1, we deployed a bag-of-words model to classify the Amazon dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

From iteration Take2, we deployed a word-embedding model to classify the Amazon dataset’s review comments. We also compared the result with the bag-of-word model from the previous iteration.

From iteration Take3, we deployed a bag-of-words model to classify the IMDB dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

From iteration Take4, we deployed a word-embedding model to classify the IMDB dataset’s review comments. We also compared the result with the bag-of-word model from the previous iteration.

From iteration Take5, we deployed a bag-of-words model to classify the Yelp dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

From iteration Take6, we deployed a word-embedding model to classify the Yelp dataset’s review comments. We also compared the result with the bag-of-word model from the previous iteration.

From iteration Take7, we deployed a bag-of-words model to classify the combined dataset from all three sets of review comments. We also compared the result with the bag-of-word model from the previous iteration.

In this Take8 iteration, we will deploy a word-embedding model to classify the combined dataset from all three sets of review comments. We will also compare the result with the models from previous iterations.

ANALYSIS: From iteration Take1, the bag-of-words model’s performance achieved an average accuracy score of 77.31% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 71.00%.

From iteration Take2, the word-embedding model’s performance achieved an average accuracy score of 73.25% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 67.00%.

From iteration Take3, the bag-of-words model’s performance achieved an average accuracy score of 77.26% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 68.66%.

From iteration Take4, the word-embedding model’s performance achieved an average accuracy score of 72.84% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 66.00%.

From iteration Take5, the bag-of-words model’s performance achieved an average accuracy score of 75.19% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 72.00%.

From iteration Take6, the word-embedding model’s performance achieved an average accuracy score of 73.44% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 72.50%.

From iteration Take7, the bag-of-words model’s performance achieved an average accuracy score of 76.84% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 78.72%.

In this Take8 iteration, the word-embedding model’s performance achieved an average accuracy score of 60.89% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 75.63%.

CONCLUSION: In this modeling iteration, the word-embedding TensorFlow model did as well as the bag-of-words model. Furthermore, we should continue to experiment with both natural language processing techniques for further modeling.

Dataset Used: Sentiment Labelled Sentences

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences

The HTML formatted report can be found here on GitHub.

NLP Model for Sentiment Labelled Sentences Using TensorFlow Take 7

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Sentiment Labelled Sentences dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset was created for the research paper ‘From Group to Individual Labels using Deep Features,’ Kotzias et al., KDD 2015. The paper researchers randomly selected 500 positive and 500 negative sentences from a larger dataset of reviews for each website. The researcher also attempted to choose sentences with a positive or negative connotation as the goal was to avoid selecting neutral sentences.

From iteration Take1, we deployed a bag-of-words model to classify the Amazon dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

From iteration Take2, we deployed a word-embedding model to classify the Amazon dataset’s review comments. We also compared the result with the bag-of-word model from the previous iteration.

From iteration Take3, we deployed a bag-of-words model to classify the IMDB dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

From iteration Take4, we deployed a word-embedding model to classify the IMDB dataset’s review comments. We also compared the result with the bag-of-word model from the previous iteration.

From iteration Take5, we deployed a bag-of-words model to classify the Yelp dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

From iteration Take6, we deployed a word-embedding model to classify the Yelp dataset’s review comments. We also compared the result with the bag-of-word model from the previous iteration.

In this Take7 iteration, we will deploy a bag-of-words model to classify the combined dataset from all three sets of review comments. We will also compare the result with the bag-of-word model from the previous iteration.

ANALYSIS: From iteration Take1, the bag-of-words model’s performance achieved an average accuracy score of 77.31% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 71.00%.

From iteration Take2, the word-embedding model’s performance achieved an average accuracy score of 73.25% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 67.00%.

From iteration Take3, the bag-of-words model’s performance achieved an average accuracy score of 77.26% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 68.66%.

From iteration Take4, the word-embedding model’s performance achieved an average accuracy score of 72.84% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 66.00%.

From iteration Take5, the bag-of-words model’s performance achieved an average accuracy score of 75.19% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 72.00%.

From iteration Take6, the word-embedding model’s performance achieved an average accuracy score of 73.44% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 72.50%.

In this Take7 iteration, the bag-of-words model’s performance achieved an average accuracy score of 76.84% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 78.72%.

CONCLUSION: In this modeling iteration, the word-embedding TensorFlow model did as well as the bag-of-words model. Furthermore, we should continue to experiment with both natural language processing techniques for further modeling.

Dataset Used: Sentiment Labelled Sentences

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences

The HTML formatted report can be found here on GitHub.

NLP Model for Sentiment Labelled Sentences Using TensorFlow Take 6

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Sentiment Labelled Sentences dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset was created for the research paper ‘From Group to Individual Labels using Deep Features,’ Kotzias et al., KDD 2015. The paper researchers randomly selected 500 positive and 500 negative sentences from a larger dataset of reviews for each website. The researcher also attempted to choose sentences with a positive or negative connotation as the goal was to avoid selecting neutral sentences.

From iteration Take1, we deployed a bag-of-words model to classify the Amazon dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

From iteration Take2, we deployed a word-embedding model to classify the Amazon dataset’s review comments. We also compared the result with the bag-of-word model from the previous iteration.

From iteration Take3, we deployed a bag-of-words model to classify the IMDB dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

From iteration Take4, we deployed a word-embedding model to classify the IMDB dataset’s review comments. We also compared the result with the bag-of-word model from the previous iteration.

From iteration Take5, we deployed a bag-of-words model to classify the Yelp dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

In this Take6 iteration, we will deploy a word-embedding model to classify the Yelp dataset’s review comments. We will also compare the result with the bag-of-word model from the previous iteration.

ANALYSIS: From iteration Take1, the bag-of-words model’s performance achieved an average accuracy score of 77.31% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 71.00%.

From iteration Take2, the word-embedding model’s performance achieved an average accuracy score of 73.25% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 67.00%.

From iteration Take3, the bag-of-words model’s performance achieved an average accuracy score of 77.26% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 68.66%.

From iteration Take4, the word-embedding model’s performance achieved an average accuracy score of 72.84% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 66.00%.

From iteration Take5, the bag-of-words model’s performance achieved an average accuracy score of 75.19% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 72.00%.

In this Take6 iteration, the word-embedding model’s performance achieved an average accuracy score of 73.44% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 72.50%.

CONCLUSION: In this modeling iteration, the word-embedding TensorFlow model did as well as the bag-of-words model. Furthermore, we should continue to experiment with both natural language processing techniques for further modeling.

Dataset Used: Sentiment Labelled Sentences

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences

The HTML formatted report can be found here on GitHub.

NLP Model for Sentiment Labelled Sentences Using TensorFlow Take 5

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Sentiment Labelled Sentences dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset was created for the research paper ‘From Group to Individual Labels using Deep Features,’ Kotzias et al., KDD 2015. The paper researchers randomly selected 500 positive and 500 negative sentences from a larger dataset of reviews for each website. The researcher also attempted to choose sentences with a positive or negative connotation as the goal was to avoid selecting neutral sentences.

From iteration Take1, we deployed a bag-of-words model to classify the Amazon dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

From iteration Take2, we deployed a word-embedding model to classify the Amazon dataset’s review comments. We also compared the result with the bag-of-word model from the previous iteration.

From iteration Take3, we deployed a bag-of-words model to classify the IMDB dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

From iteration Take4, we deployed a word-embedding model to classify the IMDB dataset’s review comments. We also compared the result with the bag-of-word model from the previous iteration.

In this Take5 iteration, we will deploy a bag-of-words model to classify the Yelp dataset’s review comments. We will also apply various sequence-to-matrix modes to evaluate the model’s performance.

ANALYSIS: From iteration Take1, the bag-of-words model’s performance achieved an average accuracy score of 77.31% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 71.00%.

From iteration Take2, the word-embedding model’s performance achieved an average accuracy score of 73.25% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 67.00%.

From iteration Take3, the bag-of-words model’s performance achieved an average accuracy score of 77.26% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 68.66%.

From iteration Take4, the word-embedding model’s performance achieved an average accuracy score of 72.84% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 66.00%.

In this Take5 iteration, the bag-of-words model’s performance achieved an average accuracy score of 75.19% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 72.00%.

CONCLUSION: In this modeling iteration, the bag-of-words TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Sentiment Labelled Sentences

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences

The HTML formatted report can be found here on GitHub.

NLP Model for Sentiment Labelled Sentences Using TensorFlow Take 4

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Sentiment Labelled Sentences dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset was created for the research paper ‘From Group to Individual Labels using Deep Features,’ Kotzias et al., KDD 2015. The paper researchers randomly selected 500 positive and 500 negative sentences from a larger dataset of reviews for each website. The researcher also attempted to choose sentences with a positive or negative connotation as the goal was to avoid selecting neutral sentences.

From iteration Take1, we deployed a bag-of-words model to classify the Amazon dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

From iteration Take2, we deployed a word-embedding model to classify the Amazon dataset’s review comments. We also compared the result with the bag-of-word model from the previous iteration.

From iteration Take3, we deployed a bag-of-words model to classify the IMDB dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

In this Take4 iteration, we will deploy a word-embedding model to classify the IMDB dataset’s review comments. We will also compare the result with the bag-of-word model from the previous iteration.

ANALYSIS: From iteration Take1, the bag-of-words model’s performance achieved an average accuracy score of 77.31% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 71.00%.

From iteration Take2, the word-embedding model’s performance achieved an average accuracy score of 73.25% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 67.00%.

From iteration Take3, the bag-of-words model’s performance achieved an average accuracy score of 77.26% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 68.66%.

In this Take4 iteration, the word-embedding model’s performance achieved an average accuracy score of 72.84% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 66.00%.

CONCLUSION: In this modeling iteration, the word-embedding TensorFlow model did not do as well as the bag-of-words model. However, we should continue to experiment with both natural language processing techniques for further modeling.

Dataset Used: Sentiment Labelled Sentences

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences

The HTML formatted report can be found here on GitHub.

NLP Model for Sentiment Labelled Sentences Using TensorFlow Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Sentiment Labelled Sentences dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: This dataset was created for the research paper ‘From Group to Individual Labels using Deep Features,’ Kotzias et al., KDD 2015. The paper researchers randomly selected 500 positive and 500 negative sentences from a larger dataset of reviews for each website. The researcher also attempted to choose sentences with a positive or negative connotation as the goal was to avoid selecting neutral sentences.

From iteration Take1, we deployed a bag-of-words model to classify the Amazon dataset’s review comments. We also applied various sequence-to-matrix modes to evaluate the model’s performance.

From iteration Take2, we deployed a word-embedding model to classify the Amazon dataset’s review comments. We also compared the result with the bag-of-word model from the previous iteration.

In this Take3 iteration, we will deploy a bag-of-words model to classify the IMDB dataset’s review comments. We will also apply various sequence-to-matrix modes to evaluate the model’s performance.

ANALYSIS: From iteration Take1, the bag-of-words model’s performance achieved an average accuracy score of 77.31% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 71.00%.

From iteration Take2, the word-embedding model’s performance achieved an average accuracy score of 73.25% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 67.00%.

In this Take3 iteration, the bag-of-words model’s performance achieved an average accuracy score of 77.26% after 25 epochs with ten iterations of cross-validation. Furthermore, the final model processed the test dataset with an accuracy measurement of 68.66%.

CONCLUSION: In this modeling iteration, the bag-of-words TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Sentiment Labelled Sentences

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences

The HTML formatted report can be found here on GitHub.