Univariate Time Series Model for Water Utility Consumers Using TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a time series prediction model and document the end-to-end steps using a template. The Water Utility Consumers dataset is a univariate time series situation where we attempt to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the monthly number of water utility consumers in London, United Kingdom. The dataset describes a time series of utility accounts over a little more than 11 years (January 1983 through April 1994), and there are 136 monthly observations. We used the first 80% of the observations for training and testing various models while holding back the remaining observations for validating the final model.
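The 80% split described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the helper name chronological_split and the synthetic series are assumptions made for the example. The key point is that a time series is cut in time order, without shuffling, so the held-back observations are strictly later than the ones used for modeling.

```python
import numpy as np

def chronological_split(series, train_fraction=0.8):
    """Split a series in time order (no shuffling): earliest part first, hold-out last."""
    values = np.asarray(series, dtype=float)
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]

# Synthetic stand-in for a monthly series; these are not the real observations.
demo_series = np.arange(100, dtype=float)
train_part, holdout_part = chronological_split(demo_series)
print(len(train_part), len(holdout_part))
```

The same helper can be applied a second time to the first portion to carve out the test segment used for comparing candidate models.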

ANALYSIS: The baseline persistence model yielded an RMSE of 2828. The CNN model processed the same test data with an RMSE of 2395, which was better than the baseline model as expected. In an earlier ARIMA modeling experiment, the best ARIMA model with non-seasonal order of (0, 1, 1) and seasonal order of (0, 0, 1, 12) processed the validation data with an RMSE of 2260.
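For readers unfamiliar with the baseline, a persistence model predicts each point as the most recently observed value, and RMSE is the square root of the mean squared error. The sketch below shows both on toy numbers. The function names and the data are illustrative assumptions, not the code or values behind the RMSE figures reported above.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def persistence_forecast(train, test):
    """Walk forward through the test set, predicting each point as the previous observation."""
    history = list(train)
    predictions = []
    for observed in test:
        predictions.append(history[-1])  # forecast = last value seen so far
        history.append(observed)         # then reveal the true value
    return predictions

# Toy data for illustration only.
toy_train = [10.0, 12.0, 11.0]
toy_test = [13.0, 12.0, 15.0]
toy_predictions = persistence_forecast(toy_train, toy_test)
toy_rmse = rmse(toy_test, toy_predictions)
print(toy_predictions, round(toy_rmse, 3))
```

Any candidate model evaluated with the same walk-forward loop can then be compared directly against this number.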

CONCLUSION: For this dataset, the TensorFlow CNN model achieved an acceptable result, and we should consider using TensorFlow for further modeling.

Dataset Used: Number of water consumers in London, United Kingdom, Jan 1983 through April 1994.

Dataset ML Model: Time series forecast with numerical attribute.

Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.

The HTML formatted report can be found here on GitHub.

Univariate Time Series Modeling Template Using TensorFlow Version 1

As I work on practicing and solving machine learning (ML) problems, I find myself repeatedly reusing the same set of programming steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that can be used to support time series analysis using the TensorFlow framework and Python.

Version 1 of the TensorFlow time series template replicates many code segments within Dr. Brownlee’s blog post “Deep Learning Models for Univariate Time Series Forecasting”. The plan is to build a script for modeling future projects by adapting the example workflow presented in the blog.
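Deep learning models for univariate forecasting are usually trained on a supervised framing of the series, where a window of lagged observations is the input and the next observation is the target. The sketch below shows one way to build that framing. The helper name make_windows and the parameter n_input are hypothetical names chosen for this example, not identifiers taken from the template or the blog post.

```python
import numpy as np

def make_windows(series, n_input):
    """Turn a 1-D series into (inputs, targets): n_input lagged values predict the next one."""
    values = np.asarray(series, dtype=float)
    inputs, targets = [], []
    for start in range(len(values) - n_input):
        inputs.append(values[start:start + n_input])
        targets.append(values[start + n_input])
    return np.array(inputs), np.array(targets)

# Toy series for illustration only.
X_demo, y_demo = make_windows([1, 2, 3, 4, 5, 6], n_input=3)
print(X_demo.shape, y_demo)
```

A series of length n yields n - n_input samples, so a longer window leaves fewer training rows.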

The TensorFlow time series template is on the Analytics Project Templates page.

Univariate Time Series Model for Monthly Rainfall Coppermine Using TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a time series prediction model and document the end-to-end steps using a template. The Monthly Rainfall Coppermine dataset is a univariate time series situation where we attempt to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the monthly rainfall. The dataset describes a time series of rainfall (in millimeters) over 44 years (1933-1976), and there are 528 observations. We used the first 80% of the observations for training and testing various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline persistence model yielded an RMSE of 12.234. The CNN model processed the same test data with an RMSE of 12.239, which was essentially level with the baseline model (marginally higher) rather than an improvement on it. In an earlier ARIMA modeling experiment, the best ARIMA model with non-seasonal order of (0, 0, 2) and seasonal order of (1, 0, 1, 12) processed the validation data with an RMSE of 10.73.

CONCLUSION: For this dataset, the TensorFlow CNN model achieved an acceptable result, and we should consider using TensorFlow for further modeling.

Dataset Used: Monthly Rainfall Coppermine 1933 through 1976.

Dataset ML Model: Time series forecast with numerical attribute.

Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.

The HTML formatted report can be found here on GitHub.

Univariate Time Series Model for Annual Immigration into USA Using TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a time series prediction model and document the end-to-end steps using a template. The Annual Immigration into USA dataset is a univariate time series situation where we attempt to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the annual number of people immigrating to the United States. The dataset describes a time series of immigrant counts (in thousands) over 143 years (1820-1962), and there are 143 observations. We used the first 80% of the observations for training and testing various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline persistence model yielded an RMSE of 52,116. The LSTM model processed the same test data with an RMSE of 38,031, which was better than the baseline model as expected. In an earlier ARIMA modeling experiment, the best ARIMA model with non-seasonal order of (0, 1, 2) processed the validation data with an RMSE of 61,789.
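The middle term of the non-seasonal order (0, 1, 2) is the degree of differencing: the model is fitted to period-to-period changes rather than to the raw values, and forecasts are converted back to the original scale afterwards. The sketch below illustrates that transform and its inverse on toy numbers. The function names are assumptions for this example and the data are not the immigration figures.

```python
import numpy as np

def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    values = np.asarray(series, dtype=float)
    return values[lag:] - values[:-lag]

def invert_difference(first_values, differenced, lag=1):
    """Rebuild the original series from its first `lag` values and the differences."""
    restored = list(np.asarray(first_values, dtype=float)[:lag])
    for step, delta in enumerate(differenced):
        restored.append(restored[step] + delta)
    return np.array(restored)

# Toy data for illustration only.
toy_series = [3.0, 5.0, 4.0, 8.0, 9.0]
toy_diff = difference(toy_series)
toy_restored = invert_difference(toy_series, toy_diff)
print(toy_diff, toy_restored)
```

With lag=12 on monthly data the same function performs seasonal differencing, which corresponds to a 1 in the second position of a seasonal order.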

CONCLUSION: For this dataset, the TensorFlow LSTM model achieved an acceptable result, and we should consider using TensorFlow for further modeling.

Dataset Used: Annual immigration into the United States, 1820-1962.

Dataset ML Model: Time series forecast with numerical attribute.

Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.

The HTML formatted report can be found here on GitHub.

Univariate Time Series Model for Ozone Concentration at Arosa Using TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a time series prediction model and document the end-to-end steps using a template. The Ozone Concentration at Arosa dataset is a univariate time series situation where we attempt to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the monthly ozone concentration levels measured at Arosa, Switzerland. The dataset describes a time series of concentration levels (in Dobson units, or DU) over a 40-year period (1932-1971), and there are 480 observations. We used the first 80% of the observations for training and testing various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline persistence model yielded an RMSE of 21.821. The CNN-LSTM model processed the same test data with an RMSE of 18.668, which was better than the baseline model as expected. In an earlier ARIMA modeling experiment, the best ARIMA model with non-seasonal order of (2, 0, 4) and seasonal order of (2, 0, 1, 12) processed the validation data with an RMSE of 16.6.
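One practical difference between these model families is the input shape each expects. Keras recurrent layers take input shaped (samples, timesteps, features), while a CNN-LSTM is commonly fed windows that have been split into subsequences, giving (samples, subsequences, steps, features). The sketch below shows only the reshaping, using NumPy. The window sizes n_seq and n_steps are hypothetical values chosen for illustration, not the configuration used in this project.

```python
import numpy as np

n_seq, n_steps = 3, 4        # hypothetical: 3 subsequences of 4 months each
n_input = n_seq * n_steps    # 12 lagged observations per sample

# Toy matrix standing in for the lagged inputs, shaped (samples, n_input).
flat_windows = np.arange(5 * n_input, dtype=float).reshape(5, n_input)

# LSTM-style input: (samples, timesteps, features).
lstm_input = flat_windows.reshape(flat_windows.shape[0], n_input, 1)

# CNN-LSTM-style input: (samples, subsequences, steps, features).
cnn_lstm_input = flat_windows.reshape(flat_windows.shape[0], n_seq, n_steps, 1)

print(lstm_input.shape, cnn_lstm_input.shape)
```

The convolutional layers then read each subsequence separately, and the LSTM interprets the sequence of extracted features.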

CONCLUSION: For this dataset, the TensorFlow CNN-LSTM model achieved an acceptable result, and we should consider using TensorFlow for further modeling.

Dataset Used: Monthly Ozone Concentration at Arosa January 1932 through December 1971.

Dataset ML Model: Time series forecast with numerical attribute.

Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.

The HTML formatted report can be found here on GitHub.

Univariate Time Series Model for Cow Milk Production Using TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a time series prediction model and document the end-to-end steps using a template. The Cow Milk Production dataset is a univariate time series situation where we attempt to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the monthly milk production per cow in an agricultural setting. The dataset describes a time series of milk production (in pounds per cow) over 14 years (1962-1975), and there are 168 observations. We used the first 80% of the observations for training and testing various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline persistence model yielded an RMSE of 31.967. The ConvLSTM model processed the same test data with an RMSE of 29.877, which was better than the baseline model as expected. In an earlier ARIMA modeling experiment, the best ARIMA model with non-seasonal order of (1, 1, 1) and seasonal order of (0, 1, 1, 12) processed the validation data with an RMSE of 5.21.
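For reference, the order notation above can be unpacked into an equation. Writing B for the backshift operator (B y_t = y_{t-1}) and assuming the sign convention in which moving-average terms enter with a plus sign, a model with non-seasonal order (1, 1, 1) and seasonal order (0, 1, 1, 12) has the form below. The coefficient symbols are generic placeholders, not fitted values from the experiment.

```latex
(1 - \phi_1 B)\,(1 - B)\,(1 - B^{12})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{12})\, \varepsilon_t
```

Here (1 - B) is the first difference, (1 - B^{12}) is the seasonal difference over 12 months, \phi_1 is the autoregressive coefficient, and \theta_1 and \Theta_1 are the non-seasonal and seasonal moving-average coefficients.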

CONCLUSION: For this dataset, the TensorFlow ConvLSTM model achieved an acceptable result, and we should consider using TensorFlow for further modeling.

Dataset Used: Monthly Cow Milk Production January 1962 through December 1975.

Dataset ML Model: Time series forecast with numerical attribute.

Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.

The HTML formatted report can be found here on GitHub.

Univariate Time Series Model for American River River-flow Using TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a time series prediction model and document the end-to-end steps using a template. The American River River-flow dataset is a univariate time series situation where we attempt to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the monthly river flow for American River at Fair Oaks, California. The dataset describes a time series of flow volume (in cubic meters per second, or cms) over 55 years (1906-1960), and there are 660 observations. We used the first 80% of the observations for training and testing various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline persistence model yielded an RMSE of 112.477. The MLP model processed the same test data with an RMSE of 88.340, which was better than the baseline model as expected. In an earlier ARIMA modeling experiment, the best ARIMA model with non-seasonal order of (1, 0, 0) and seasonal order of (1, 0, 1, 12) processed the validation data with an RMSE of 78.413.

CONCLUSION: For this dataset, the TensorFlow MLP model achieved an acceptable result, and we should consider using TensorFlow for further modeling.

Dataset Used: Monthly river flow in cms, American River at Fair Oaks, California, October 1906 through September 1960.

Dataset ML Model: Time series forecast with numerical attribute.

Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.

The HTML formatted report can be found here on GitHub.

Univariate Time Series Model for Iron Production in Australia Using TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a time series prediction model and document the end-to-end steps using a template. The Iron Production in Australia dataset is a univariate time series situation where we attempt to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the monthly iron production in Australia. The dataset describes a time series of weight (in thousand tons) over nearly 40 years (1956-1995), and there are 476 observations. We used the first 80% of the observations for training and testing various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline persistence model yielded an RMSE of 56.494. The MLP model processed the same test data with an RMSE of 41.415, which was better than the baseline model as expected. In an earlier ARIMA modeling experiment, the best ARIMA model with non-seasonal order of (1, 1, 1) and seasonal order of (1, 0, 1, 12) processed the validation data with an RMSE of 34.639.
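To put these figures on a common scale, the reduction in RMSE relative to the baseline can be computed directly from the numbers reported above. The helper name below is an assumption for this example. Note also that the MLP figure comes from the test data while the ARIMA figure comes from the validation data, so the comparison may not be strictly like for like.

```python
def relative_improvement(baseline_rmse, model_rmse):
    """Percentage reduction in RMSE relative to the baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse * 100.0

# RMSE values as reported in the analysis above.
mlp_gain = relative_improvement(56.494, 41.415)
arima_gain = relative_improvement(56.494, 34.639)
print(f"MLP: {mlp_gain:.1f}%  ARIMA: {arima_gain:.1f}%")
```

On these numbers the MLP reduces the baseline error by roughly 26.7% and the earlier ARIMA model by roughly 38.7%.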

CONCLUSION: For this dataset, the TensorFlow MLP model achieved an acceptable result, and we should consider using TensorFlow for further modeling.

Dataset Used: Monthly basic iron production in Australia, January 1956 through August 1995.

Dataset ML Model: Time series forecast with numerical attribute.

Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.

The HTML formatted report can be found here on GitHub.