Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Fisher River Flow dataset is a time series problem in which we try to forecast future values from past observations.
INTRODUCTION: The problem is to forecast the daily river flow volume for the Fisher River near Dallas, Texas. The dataset describes a time series of daily volume (in cms, cubic meters per second) over four years (1988-1991), for a total of 1,461 observations. We used the first 80% of the observations for training and testing various models and held back the remaining 20% for validating the final model.
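The chronological hold-out described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name split_series and the placeholder series are assumptions, and the exact cut point used in the original project is not stated. The key point is that the split is by position, with no shuffling, so the validation data always lies after the training data in time.

```python
def split_series(values, train_fraction=0.8):
    """Split a time-ordered sequence into (train, validation) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(values) * train_fraction)  # index of the first validation point
    return values[:cut], values[cut:]


# Placeholder series with the same length as the dataset (1,461 daily values).
# The numbers are stand-ins, not river-flow measurements.
series = list(range(1461))
train, validation = split_series(series)
print(len(train), len(validation))
```

With 1,461 observations and an 80% fraction, this sketch places 1,168 points in the training portion and 293 in the validation portion.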
ANALYSIS: The baseline (persistence) prediction for the dataset resulted in an RMSE of 0.693. After a grid search over the ARIMA parameters, the final model used a non-seasonal order of (1, 0, 2) and a seasonal order of (0, 1, 0, 365). The chosen model processed the validation data with an RMSE of 0.987, which did not improve on the baseline RMSE of 0.693.
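For readers unfamiliar with the persistence baseline, the sketch below shows one common way to compute it: each value is predicted to equal the previous observation, and the errors are summarized as RMSE. The function name persistence_rmse and the toy numbers are assumptions for illustration; they are not taken from the project, and the sketch is not expected to reproduce the 0.693 figure.

```python
import math


def persistence_rmse(train, test):
    """Walk-forward persistence forecast: predict each value as the previous observation."""
    if not train or not test:
        raise ValueError("train and test must both be non-empty")
    history = list(train)
    squared_errors = []
    for observed in test:
        prediction = history[-1]  # naive forecast: the last known value
        squared_errors.append((observed - prediction) ** 2)
        history.append(observed)  # reveal the true value before the next step
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy numbers for illustration only; these are not values from the river-flow dataset.
toy_train = [1.0, 2.0, 3.0]
toy_test = [5.0, 5.0, 8.0]
print(persistence_rmse(toy_train, toy_test))
```

Any candidate model is usually expected to beat this number on the same data before it is considered useful.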
CONCLUSION: For this iteration, the chosen ARIMA model did not achieve a satisfactory result. We should explore different sets of ARIMA parameters and conduct further modeling activities.
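Exploring further parameter sets could follow a simple grid-search pattern like the sketch below. It is a hedged outline rather than the project's implementation: grid_search_orders, score_order, and dummy_score are hypothetical names, and the dummy scorer does not fit any model. In a real run, the scorer would presumably fit a model for each order (for example with statsmodels' SARIMAX) and return a walk-forward RMSE on held-out training data.

```python
import itertools
import math


def grid_search_orders(p_values, d_values, q_values, score_order):
    """Score every (p, d, q) combination and return (best_order, best_score).

    score_order is a caller-supplied function returning an error such as RMSE.
    Configurations that raise an exception are skipped.
    """
    best_order, best_score = None, math.inf
    for order in itertools.product(p_values, d_values, q_values):
        try:
            score = score_order(order)
        except Exception:
            continue  # e.g. the model failed to fit for this order
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score


# Stand-in scorer for demonstration only: it does not fit any model.
def dummy_score(order):
    p, d, q = order
    if d > 1:
        raise ValueError("pretend this configuration failed to fit")
    return abs(p - 1) + abs(d - 0) + abs(q - 2)


best_order, best_score = grid_search_orders(range(3), range(3), range(3), dummy_score)
print(best_order, best_score)
```

Passing the scorer in as a function keeps the search loop independent of any particular modeling library, so the same loop can be reused when the seasonal order or the error metric changes.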
Dataset Used: Mean daily flow, Fisher River near Dallas, January 1, 1988 to December 31, 1991.
Dataset ML Model: Time series forecast with numerical attributes
Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.
The HTML formatted report can be found here on GitHub.