Time Series Model for Live Births in the USA Using Python and ARIMA

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a time series prediction model and document the end-to-end steps using a template. The Live Births in the United States dataset presents a time series problem in which we attempt to forecast future outcomes based on past data points.

INTRODUCTION: The United Nations Statistics Division collects, compiles, and disseminates official demographic and social statistics on various topics. The Demographic Yearbook provides annual statistics on population size and composition, births, deaths, marriage, and divorce rates. The problem is to forecast the monthly number of live births in the United States. The dataset describes a time series of monthly live-birth counts over 47 years (1969-2015), and there are 564 observations. We used the first 90% of the observations for training various models while holding back the remaining observations for validating the final model.
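
The 90/10 holdout described above can be sketched as a chronological split. The function name and the truncating rounding rule are assumptions for illustration; the original notebook may round the split point differently.

```python
def chronological_split(series, train_fraction=0.9):
    """Split a time-ordered sequence into (train, validation) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    # Assumption: truncate toward zero when the split point is fractional.
    split_point = int(len(series) * train_fraction)
    return series[:split_point], series[split_point:]


# Placeholder values standing in for the 564 monthly observations.
observations = list(range(564))
train, validation = chronological_split(observations, train_fraction=0.9)
print(len(train), len(validation))
```

Keeping the split in time order matters here: shuffling before the split would let the model see observations that come after the ones it is asked to predict.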

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 16735. After performing a grid search for the optimal ARIMA parameters, the final ARIMA non-seasonal order was (3, 1, 4) with the seasonal order (2, 0, 2, 12). Furthermore, the chosen model processed the validation data with an RMSE of 7177, which was better than the baseline model as expected.
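
A persistence baseline simply predicts each value as the one observed immediately before it. The sketch below shows how such a baseline RMSE could be computed; the helper names and the toy numbers are assumptions for illustration and are not taken from the UN dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def persistence_forecast(train, validation):
    """Predict each validation point as the value observed one step earlier."""
    return [train[-1]] + list(validation[:-1])


# Toy numbers for illustration only.
train = [100.0, 110.0, 105.0]
validation = [108.0, 112.0, 109.0]
predictions = persistence_forecast(train, validation)
baseline_rmse = rmse(validation, predictions)
print(predictions, round(baseline_rmse, 3))
```

Any candidate ARIMA model should score a lower RMSE than this baseline on the same validation data to be worth keeping.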

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result and should be considered for further modeling.

Dataset Used: Live births by month of birth | Demographic Statistics Database | United Nations Statistics Division

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: https://data.un.org/Data.aspx?d=POP&f=tableCode:55

The HTML formatted report can be found here on GitHub.

Time Series Model for Chicago Fed Hiring Expectations Survey Using Python and ARIMA

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a time series prediction model and document the end-to-end steps using a template. The Chicago Fed Hiring Expectations Survey dataset is a time series situation where we attempt to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the monthly hiring-expectations reading from the Chicago Federal Reserve's Survey of Business Conditions, which covers hiring expectations for the next 12 months. The dataset describes a time series of survey values (between plus 40 and minus 40) over nearly eight years (January 2013 to September 2020), and there are 93 observations. We used the first 80% of the observations for training various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 12.132. After performing a grid search for the optimal ARIMA parameters, the final ARIMA non-seasonal order was (1, 0, 1). Furthermore, the chosen model processed the validation data with an RMSE of 8.694, which was better than the baseline model as expected.
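
The grid search mentioned above can be sketched as an exhaustive loop over candidate (p, d, q) orders that keeps the one with the lowest validation RMSE. This is a minimal outline, not the original notebook's code: `grid_search_orders` and `toy_evaluate` are hypothetical names, and the toy scorer merely stands in for fitting a real ARIMA model.

```python
import itertools


def grid_search_orders(evaluate, p_values, d_values, q_values):
    """Return (best_order, best_score) over every (p, d, q) combination.

    `evaluate` is a caller-supplied function that fits a model for one order
    and returns its validation RMSE; orders that raise are skipped.
    """
    best_order, best_score = None, float("inf")
    for order in itertools.product(p_values, d_values, q_values):
        try:
            score = evaluate(order)
        except Exception:
            # Some orders fail to fit (for example, no convergence); skip them.
            continue
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score


def toy_evaluate(order):
    """Stand-in scorer (hypothetical): pretends that (1, 0, 1) fits best."""
    p, d, q = order
    if order == (0, 0, 0):
        raise ValueError("simulated fitting failure")
    return abs(p - 1) + d + abs(q - 1) + 1.0


best_order, best_score = grid_search_orders(
    toy_evaluate, range(3), range(2), range(3)
)
print(best_order, best_score)
```

In practice the `evaluate` callable would fit an ARIMA model on the training observations and score it on the held-back observations.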

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result and should be considered for further modeling.

Dataset Used: Chicago Fed Survey of Business Conditions: Hiring Expectations in the next 12 Months, January 2013 to September 2020

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: Federal Reserve Bank of Chicago, Chicago Fed Survey of Business Conditions: Hiring Expectations in the next 12 Months [CFSBCHIRINGEXP], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/CFSBCHIRINGEXP, November 4, 2020.

The HTML formatted report can be found here on GitHub.

Time Series Model for University of Michigan Inflation Expectation Using Python and ARIMA

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a time series prediction model and document the end-to-end steps using a template. The Inflation Expectation dataset from the University of Michigan is a time series situation where we are trying to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the monthly median expected price change over the next 12 months, based on consumer surveys. The dataset describes a time series of percentages over 42 years (1978-2020), and there are 512 observations. We used the first 80% of the observations for training various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 0.221. After performing a grid search for the optimal ARIMA parameters, the final ARIMA non-seasonal order was (4, 1, 2). Furthermore, the chosen model processed the validation data with an RMSE of 0.206, which was better than the baseline model as expected.
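
The middle term of the order (4, 1, 2) is d = 1, which means the model is fit on first differences (month-to-month changes) rather than on the raw percentages. A minimal sketch of that transformation and its inverse follows; the function names and the toy values are assumptions for illustration.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: series[i] - series[i - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def invert_difference(last_observed, differenced_forecast):
    """Map a forecast of the change back onto the original scale."""
    return last_observed + differenced_forecast


# Toy percentages for illustration only; not taken from the MICH series.
expectations = [3.0, 3.2, 3.1, 3.4]
changes = difference(expectations)
restored = invert_difference(expectations[-1], 0.1)
print(changes, restored)
```

ARIMA libraries normally perform this differencing and its inversion internally once d is set, so the sketch only makes the step visible.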

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result and should be considered for further modeling.

Dataset Used: University of Michigan: Inflation Expectation

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: University of Michigan, University of Michigan: Inflation Expectation [MICH], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/MICH, October 24, 2020.

The HTML formatted report can be found here on GitHub.

Time Series Model for Birmingham Parking Occupancy Using Python and ARIMA Part 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a time series prediction model and document the end-to-end steps using a template. The Birmingham Parking Occupancy dataset is a time series situation where we are trying to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the hourly parking occupancy for a parking facility in Birmingham. The dataset describes a time series of parking occupancy over three months between October 2016 and December 2016, and there are 1834 hourly observations. We used the first 90% of the observations for training various models while holding back the remaining observations for validating the final model.

In the Part 1 iteration, we trained and validated an ARIMA model using just one facility, BHMBCCMKT01, within the dataset.

In this Part 2 iteration, we train and validate an ARIMA model for each of the facilities within the dataset.

ANALYSIS: The baseline prediction (or persistence) for the parking facility BHMBCCMKT01 resulted in an RMSE of 46. After performing a grid search for the optimal ARIMA parameters, the final ARIMA non-seasonal order was (2, 0, 1) with the seasonal order (2, 0, 0, 24). Furthermore, the chosen model processed the validation data with an RMSE of 22, which was better than the baseline model as expected.

Parking structure: BHMBCCPST01

  • RMSE for the persistence model is: 38
  • Final non-seasonal order: (0, 0, 1); final seasonal order: (1, 0, 1, 24)
  • RMSE from the validation data is: 20

Parking structure: BHMBCCSNH01

  • RMSE for the persistence model is: 157
  • Final non-seasonal order: (2, 0, 1); final seasonal order: (0, 0, 2, 24)
  • RMSE from the validation data is: 75

Parking structure: BHMBCCTHL01

  • RMSE for the persistence model is: 84
  • Final non-seasonal order: (0, 0, 0); final seasonal order: (1, 0, 1, 24)
  • RMSE from the validation data is: 24

Parking structure: BHMNCPPLS01

  • RMSE for the persistence model is: 32
  • Final non-seasonal order: (4, 0, 0); final seasonal order: (1, 0, 0, 24)
  • RMSE from the validation data is: 16

Parking structure: BHMBRCBRG02

  • RMSE for the persistence model is: 189
  • Final non-seasonal order: (0, 1, 3); final seasonal order: (0, 0, 2, 24)
  • RMSE from the validation data is: 95

Parking structure: BHMBRCBRG03

  • RMSE for the persistence model is: 78
  • Final non-seasonal order: (2, 1, 0); final seasonal order: (0, 0, 2, 24)
  • RMSE from the validation data is: 41

Parking structure: BHMBRTARC01

  • RMSE for the persistence model is: 109
  • Final non-seasonal order: (1, 0, 0); final seasonal order: (1, 0, 0, 24)
  • RMSE from the validation data is: 120

Parking structure: BHMEURBRD01

  • RMSE for the persistence model is: 77
  • Final non-seasonal order: (1, 0, 4); final seasonal order: (2, 0, 1, 24)
  • RMSE from the validation data is: 24
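
The per-facility figures above are easier to compare as relative changes against the persistence baseline. The sketch below tabulates the RMSE pairs exactly as reported in this section; the dictionary layout and the helper name are assumptions for illustration.

```python
# (persistence RMSE, validation RMSE) as reported for each facility.
reported_rmse = {
    "BHMBCCMKT01": (46, 22),
    "BHMBCCPST01": (38, 20),
    "BHMBCCSNH01": (157, 75),
    "BHMBCCTHL01": (84, 24),
    "BHMNCPPLS01": (32, 16),
    "BHMBRCBRG02": (189, 95),
    "BHMBRCBRG03": (78, 41),
    "BHMBRTARC01": (109, 120),
    "BHMEURBRD01": (77, 24),
}


def relative_change(baseline, model):
    """Fractional RMSE change; negative means the model beat the baseline."""
    return (model - baseline) / baseline


worse_than_baseline = []
for facility, (baseline, model) in reported_rmse.items():
    change = relative_change(baseline, model)
    print(f"{facility}: {baseline} -> {model} ({change:+.1%})")
    if model >= baseline:
        worse_than_baseline.append(facility)

print("Did not beat persistence:", worse_than_baseline)
```

A check like this makes it easy to spot facilities where the tuned model fails to beat persistence and may need a different order or a wider search.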

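CONCLUSION: For this dataset, the chosen ARIMA models achieved a satisfactory result for most facilities, and we should consider using ARIMA for further modeling. The exception was BHMBRTARC01, where the validation RMSE of 120 was worse than the persistence RMSE of 109.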

Dataset Used: Parking Birmingham Data Set

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham

The HTML formatted report can be found here on GitHub.

Time Series Model for Birmingham Parking Occupancy Using Python and ARIMA Part 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a time series prediction model and document the end-to-end steps using a template. The Birmingham Parking Occupancy dataset is a time series situation where we are trying to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the hourly parking occupancy for a parking facility in Birmingham. The dataset describes a time series of parking occupancy over three months between October 2016 and December 2016, and there are 1834 hourly observations. We used the first 90% of the observations for training various models while holding back the remaining observations for validating the final model.

In this Part 1 iteration, we will train and validate the model using just one facility, BHMBCCMKT01, within the dataset.

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 46. After performing a grid search for the optimal ARIMA parameters, the final ARIMA non-seasonal order was (2, 0, 1) with the seasonal order (2, 0, 0, 24). Furthermore, the chosen model processed the validation data with an RMSE of 22, which was better than the baseline model as expected.
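
One common way to score a seasonal ARIMA model on held-back data is walk-forward validation: forecast one step, reveal the true value, and repeat. The sketch below assumes that approach and assumes statsmodels' `SARIMAX` class as the model; the original notebook may have used a different class or validation scheme. The function names are hypothetical.

```python
import math


def sarimax_one_step(history, order, seasonal_order):
    """Fit a seasonal ARIMA on `history` and forecast the next value.

    statsmodels is imported lazily so the rest of the sketch runs without it.
    """
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    model = SARIMAX(history, order=order, seasonal_order=seasonal_order)
    fitted = model.fit(disp=False)
    return float(fitted.forecast(steps=1)[0])


def walk_forward_rmse(train, validation, forecast_one_step):
    """Forecast each validation point in turn, then add the truth to history."""
    history = list(train)
    squared_errors = []
    for observed in validation:
        predicted = forecast_one_step(history)
        squared_errors.append((observed - predicted) ** 2)
        history.append(observed)  # reveal the true value before the next step
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def evaluate_reported_orders(train, validation):
    """Score the orders reported for BHMBCCMKT01 (requires statsmodels)."""
    return walk_forward_rmse(
        train,
        validation,
        lambda history: sarimax_one_step(history, (2, 0, 1), (2, 0, 0, 24)),
    )


# The same loop also reproduces a persistence baseline on toy numbers.
toy_rmse = walk_forward_rmse([1.0, 2.0], [4.0, 4.0], lambda history: history[-1])
print(toy_rmse)
```

Refitting at every step is slow with a 24-hour seasonal period and well over a thousand hourly observations, so refitting less often is a reasonable trade-off.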

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result, and we should consider using ARIMA for further modeling.

Dataset Used: Parking Birmingham Data Set

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham

The HTML formatted report can be found here on GitHub.

Time Series Model for Housing Starts in the USA Using Python and ARIMA

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Housing Starts in the USA dataset is a time series situation where we are trying to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the monthly total number of housing starts in the USA. A housing start occurs when excavation begins for the footings or foundation of a building. The dataset describes a time series of housing starts (thousands of units) over 61 years (1959-2020), and there are 739 observations. We used the first 80% of the observations for training various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 9.705. After performing a grid search for the optimal ARIMA parameters, the final ARIMA non-seasonal order was (4, 0, 4) with the seasonal order (1, 0, 2, 12). Furthermore, the chosen model processed the validation data with an RMSE of 8.763, which was better than the baseline model as expected.

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result, and we should consider using the algorithm for further modeling.

Dataset Used: Housing Starts: Total: New Privately Owned Housing Units Started

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: U.S. Census Bureau and U.S. Department of Housing and Urban Development, Housing Starts: Total: New Privately Owned Housing Units Started [HOUSTNSA], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/HOUSTNSA, August 23, 2020.

The HTML formatted report can be found here on GitHub.

Time Series Model for Private Housing Permits for California Using Python and ARIMA

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Private Housing Permits for California dataset is a time series situation where we are trying to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the monthly total number of building permits for all structure types in the state of California. The dataset describes a time series of permits issued over nearly 30 years (1991-2020), and there are 354 observations. We used the first 80% of the observations for training various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 2153. After performing a grid search for the optimal ARIMA parameters, the final ARIMA non-seasonal order was (0, 1, 1) with the seasonal order (1, 0, 2, 12). Furthermore, the chosen model processed the validation data with an RMSE of 1486, which was better than the baseline model as expected.
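
For readers less familiar with the notation, the orders (0, 1, 1) and (1, 0, 2, 12) can be written out with the backshift operator $B$, where $B y_t = y_{t-1}$. The symbols and the sign convention below (moving-average terms written with plus signs) are assumptions for illustration; some texts write the moving-average polynomials with minus signs.

```latex
% General multiplicative seasonal ARIMA(p, d, q)(P, D, Q)_s:
\phi_p(B)\,\Phi_P(B^{s})\,(1 - B)^{d}\,(1 - B^{s})^{D}\, y_t
  = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t

% With (p, d, q) = (0, 1, 1) and (P, D, Q, s) = (1, 0, 2, 12):
(1 - \Phi_1 B^{12})\,(1 - B)\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{12} + \Theta_2 B^{24})\,\varepsilon_t
```

Here $y_t$ is the monthly permit count and $\varepsilon_t$ is white noise. The model takes one first difference, has one seasonal autoregressive term at lag 12, one non-seasonal moving-average term, and two seasonal moving-average terms at lags 12 and 24.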

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result, and we should consider using the algorithm for further modeling.

Dataset Used: Monthly New Private Housing Units Authorized by Building Permits for California

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: U.S. Census Bureau, New Private Housing Units Authorized by Building Permits for California [CABPPRIV], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/CABPPRIV, August 23, 2020.

The HTML formatted report can be found here on GitHub.

Time Series Model for Metro Bus Ridership Using Python and ARIMA

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Metro Bus Ridership dataset is a time series situation where we are trying to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the monthly number of bus riders for the Los Angeles County Metro district. The dataset describes a time-series of bus riders between January 2009 and June 2020, and there are 138 observations. We used the first 80% of the observations for training various models while holding back the remaining observations for validating the final model.
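
The stated observation count can be cross-checked against the date range with a little arithmetic. The helper below is a hypothetical illustration that counts calendar months with both endpoints included.

```python
def months_inclusive(start_year, start_month, end_year, end_month):
    """Count calendar months from start to end, including both endpoints."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


# January 2009 through June 2020, as stated for the Metro ridership data.
metro_months = months_inclusive(2009, 1, 2020, 6)
print(metro_months)
```

A mismatch between this count and the number of rows in a monthly series usually points to missing or duplicated months, which is worth checking before fitting a seasonal model.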

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 2.480 million. After performing a grid search for the optimal ARIMA parameters, the final ARIMA non-seasonal order was (4, 1, 2) with the seasonal order (1, 0, 2, 12). Furthermore, the chosen model processed the validation data with an RMSE of 2.397 million, which was only slightly better than the baseline model.

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result, and we should consider using the algorithm for further modeling.

Dataset Used: Metro Interactive Estimated Ridership Stats

Dataset ML Model: Time series forecast with numerical attribute

Dataset Reference: http://isotp.metro.net/MetroRidership/Index.aspx

The HTML formatted report can be found here on GitHub.