Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Online News Popularity dataset is a regression problem in which we try to predict the value of a continuous variable.
INTRODUCTION: This dataset summarizes a heterogeneous set of features about articles published by Mashable over a period of two years. The goal is to predict an article’s popularity level in social networks. The dataset does not contain the original content, only some statistics associated with it. The original content can be publicly accessed and retrieved using the provided URLs.
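Because each row carries a URL and summary statistics rather than the article text, a first step is to separate the numeric features from the non-predictive columns and the target. The sketch below uses only the standard library and rests on assumptions taken from the UCI attribute description rather than from this report: `url` and `timedelta` are non-predictive, `shares` is the target, and the CSV header puts a space after each comma. The helper name `load_features_and_target` is hypothetical, and the in-memory sample uses made-up values.

```python
import csv
import io

# Assumed from the UCI attribute description (not from this report):
# "url" and "timedelta" are non-predictive, "shares" is the target.
NON_PREDICTIVE = {"url", "timedelta"}
TARGET = "shares"


def load_features_and_target(csv_file):
    """Hypothetical helper: split rows into numeric features X and target y.

    skipinitialspace=True drops the space after each comma, so a header
    field such as " shares" is read as "shares".
    """
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    feature_names = [name for name in reader.fieldnames
                     if name not in NON_PREDICTIVE and name != TARGET]
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in feature_names])
        y.append(float(row[TARGET]))
    return feature_names, X, y


# Tiny in-memory sample with made-up values, shaped like the UCI header.
sample = io.StringIO(
    "url, timedelta, n_tokens_title, n_tokens_content, shares\n"
    "http://example.com/a, 731.0, 12.0, 219.0, 593\n"
    "http://example.com/b, 731.0, 9.0, 255.0, 711\n"
)
names, X, y = load_features_and_target(sample)
print(names)  # ['n_tokens_title', 'n_tokens_content']
print(y)      # [593.0, 711.0]
```

With the real file, the same function would be called on an open file handle instead of the `StringIO` sample.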
ANALYSIS: The baseline model achieved a mean squared error (MSE) of 13121. After hyperparameter tuning, the best model achieved an MSE of 11049 on the training dataset. The final model then achieved an MSE of 12962 on the test dataset; this gap between training and test error indicates that the model has a variance (overfitting) problem. We need to gather more data or apply regularization techniques during training to narrow the gap before deploying the model in production.
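To make the variance argument concrete, the sketch below defines MSE, computes the train/test gap from the reported figures, and shows an L2 (weight-decay) penalty, one common regularization technique. This is plain Python for illustration, not the project's Keras code; Keras offers a comparable penalty through `tf.keras.regularizers.L2`, and the penalty strength `lam` here is an arbitrary assumption.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def l2_penalized_loss(mse, weights, lam):
    """MSE plus an L2 penalty: mse + lam * sum(w ** 2).

    Larger lam pushes weights toward zero, trading a little training
    error for (ideally) a smaller train/test gap.
    """
    return mse + lam * sum(w * w for w in weights)


# Figures reported in the analysis.
train_mse, test_mse = 11049, 12962
gap = test_mse - train_mse
print(gap)                               # 1913
print(round(gap / train_mse * 100, 1))   # 17.3 (percent above training MSE)

# Toy check of the two helpers with made-up numbers.
print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))  # 2.5
print(l2_penalized_loss(2.5, [1.0, -2.0], 0.1))    # 3.0
```

A test MSE roughly 17% above the training MSE is the gap that more data or regularization would aim to shrink.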
CONCLUSION: For this dataset, the model built using Keras and TensorFlow achieved a satisfactory result and should be considered for future modeling activities.
Dataset Used: Online News Popularity Dataset
Dataset ML Model: Regression with numerical attributes
Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity
The HTML-formatted report can be found on GitHub.