# Month: February 2020

## Again and Again and Again

## Deep Learning Regression Model for Ames Iowa Housing Prices Using TensorFlow Take 6

## Deep Learning Regression Model for Ames Iowa Housing Prices Using TensorFlow Take 5

## Time Series Model for Weekly Births in Quebec Using Python

## Deep Learning Regression Model for Allstate Claims Severity Using Python Take 7

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

## Deep Learning Regression Model for Allstate Claims Severity Using Python Take 6

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

## Seth Godin Akimbo: Sample Size

## Everyone Is Doing Their Best

"Professionals do work and ship art." That is my aspiration!

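(From a writer I admire, Seth Godin)

Ruts don't dig themselves.

Most of the time, we are in a rut because that is exactly where we put ourselves.

Actions become habits, and habits repeat, because doing them makes us feel safe.

The simplest way to make things more interesting is simply to stop repeating habitual behavior.

Habits usually come from reacting to a trigger. Remove the trigger, and you can change the habit.

Make small changes. Keep score in a different way.

Tomorrow comes every day. But we don't have to take the same route we took yesterday.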

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Ames Iowa Housing Prices dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: Many factors can influence a home’s purchase price. This Ames Housing dataset contains 79 explanatory variables describing every aspect of residential homes in Ames, Iowa. The goal is to predict the final price of each home.

In iteration Take1, we established the baseline root mean squared error (RMSE) for later takes of modeling.

In iteration Take2, we converted some of the categorical variables from nominal to ordinal and observed the effects of the change.
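
Here is a minimal Python sketch of the Take2 conversion. It assumes a quality column coded with the Ames rating labels Ex, Gd, TA, Fa, and Po (as in `ExterQual` or `KitchenQual`). The 5-to-1 integer scale and the choice to map missing values to 0 are illustrative assumptions and may not match the mapping this project used.

```python
# Hypothetical ordinal scale for Ames quality codes (Excellent down to Poor).
QUALITY_SCALE = {"Ex": 5, "Gd": 4, "TA": 3, "Fa": 2, "Po": 1}


def to_ordinal(values, scale=QUALITY_SCALE, missing=0):
    """Map nominal quality codes to ordered integers.

    Any code absent from the scale (e.g. None for a house with no basement)
    is mapped to `missing`, which sits below "Po" on the scale.
    """
    return [scale.get(value, missing) for value in values]


print(to_ordinal(["Gd", "TA", "Ex", None]))  # [4, 3, 5, 0]
```

Unlike one-hot encoding, this keeps a single column per attribute and tells the model that "Ex" ranks above "Gd".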

In iteration Take3, we examined the feature selection technique of attribute importance ranking by using the Gradient Boosting algorithm. By selecting only the most important attributes, we decreased the processing time and maintained a similar level of RMSE compared to the baseline.
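
One plausible reading of the "importance level of 99%" reported for Take3 is a cumulative cutoff. Under that reading, you rank the attributes by importance and keep the smallest top-ranked set whose importances add up to at least that share of the total. The sketch below assumes this interpretation. The importance values are made up; in practice they might come from a fitted gradient boosting model, such as scikit-learn's `GradientBoostingRegressor.feature_importances_`.

```python
def select_by_cumulative_importance(importances, level=0.99):
    """Keep the top-ranked features whose importances reach `level` of the total."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0.0
    for name, score in ranked:
        selected.append(name)
        running += score
        if running >= level * total:
            break
    return selected


# Made-up importance scores, for illustration only.
example = {"OverallQual": 0.55, "GrLivArea": 0.25, "TotalBsmtSF": 0.12,
           "GarageCars": 0.07, "MiscVal": 0.01}
print(select_by_cumulative_importance(example, level=0.9))
# ['OverallQual', 'GrLivArea', 'TotalBsmtSF']
```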

In iteration Take4, we examined the feature selection technique of recursive feature elimination (RFE) by using the Gradient Boosting algorithm. By selecting up to 100 attributes, we decreased the processing time and maintained a similar level of RMSE compared to the baseline.
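
Recursive feature elimination works as a loop: fit a model, drop the single weakest attribute, and refit until the target count remains. In the minimal sketch below, `fit_importances` is a hypothetical callback that stands in for refitting the gradient boosting model. scikit-learn's `sklearn.feature_selection.RFE` class (with its `n_features_to_select` and `step` parameters) wraps the same idea around a real estimator.

```python
def recursive_feature_elimination(features, fit_importances, n_keep):
    """Refit, drop the least important feature, and repeat until n_keep remain.

    fit_importances(features) must return a dict of feature -> importance for
    exactly those features; here it stands in for refitting a real model.
    """
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = fit_importances(remaining)
        weakest = min(remaining, key=lambda name: scores[name])
        remaining.remove(weakest)
    return remaining


# Hypothetical stand-in for a model refit: fixed scores per feature.
FIXED_SCORES = {"OverallQual": 5.0, "PoolArea": 1.0, "GrLivArea": 3.0, "MiscVal": 2.0}


def toy_fit(features):
    return {name: FIXED_SCORES[name] for name in features}


print(recursive_feature_elimination(list(FIXED_SCORES), toy_fit, n_keep=2))
# ['OverallQual', 'GrLivArea']
```

With fixed scores, the loop reduces to a one-shot ranking. The benefit of RFE comes from a real refit, where the importances can shift after each attribute is removed.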

In iteration Take5, we constructed several Multilayer Perceptron (MLP) models with one, two, and three hidden layers. We also observed how the different model architectures affect the RMSE metric.

In this Take6 iteration, we will add Dropout layers to our Multilayer Perceptron (MLP) models. We will observe how the Dropout layers affect the RMSE metric.
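
For readers new to Dropout, here is a minimal pure-Python sketch of inverted dropout. This is the variant that Keras's `Dropout` layer applies during training: each activation is zeroed with probability `rate`, and the survivors are scaled by `1 / (1 - rate)` so the expected value stays the same. At inference time the input passes through unchanged. The function name and signature are illustrative and are not the project's code.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout over a list of activations.

    During training, each unit is zeroed with probability `rate` and survivors
    are scaled by 1 / (1 - rate), keeping the expected activation unchanged.
    With training=False the input is returned unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(42)
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.25, rng=rng))
```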

ANALYSIS: In iteration Take1, the baseline performance of the machine learning algorithms achieved an average RMSE of 31,172. Two algorithms (Ridge Regression and Gradient Boosting) achieved the best RMSE metrics after the first round of modeling. After a series of tuning trials, Gradient Boosting turned in the best overall result and achieved an RMSE metric of 24,165. Using the optimized parameters, the Gradient Boosting algorithm processed the test dataset with an RMSE of 21,067, which was even better than the RMSE from the training data.
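
RMSE, the metric compared across these takes, is the square root of the mean squared difference between actual and predicted prices. It is therefore expressed in dollars and weights large misses heavily. A minimal sketch:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (dollars here)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Two hypothetical houses, each mispriced by $10,000.
print(rmse([200_000, 150_000], [190_000, 160_000]))  # 10000.0
```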

In iteration Take2, Gradient Boosting achieved an RMSE metric of 23,612 with the training dataset and processed the test dataset with an RMSE of 21,130. Converting the nominal variables to ordinal did not have a material impact on the prediction accuracy in either direction.

In iteration Take3, Gradient Boosting achieved an RMSE metric of 24,045 with the training dataset and processed the test dataset with an RMSE of 21,994. At the importance level of 99%, the attribute importance technique eliminated 222 of 258 total attributes. The remaining 36 attributes produced a model that achieved an RMSE comparable to the baseline model. The processing time for Take3 also decreased by 67.90% compared to the Take1 iteration.

In iteration Take4, Gradient Boosting achieved an RMSE metric of 23,825 with the training dataset and processed the test dataset with an RMSE of 21,898. The RFE technique eliminated 208 of 258 total attributes. The remaining 50 attributes produced a model that achieved an RMSE comparable to the baseline model. The processing time for Take4 also decreased by 1.8% compared to the Take1 iteration.

In iteration Take5, all models processed the test dataset and produced an RMSE at or near the 23,000 level. The two-layer model with 128 and 64 nodes (Model 2C) achieved the best RMSE of 22,708 on the test dataset. All models eventually overfit, and the models with more layers overfit much faster than the simpler models.
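
A common way to handle this kind of overfitting is patience-based early stopping. You keep the weights from the epoch with the lowest validation loss and stop once the loss has not improved for a set number of epochs. The sketch below shows the bookkeeping in plain Python. Keras offers the same behavior through `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`. The post does not say whether early stopping was used, so this is only an illustration.

```python
def early_stopping(val_losses, patience=5):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops after `patience` consecutive epochs without a new lowest
    validation loss; the weights from best_epoch are the ones kept.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# Hypothetical validation losses that bottom out and then rise (overfitting).
print(early_stopping([10.0, 8.0, 7.0, 7.5, 8.0, 9.0, 10.0], patience=3))  # (3, 6)
```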

In this Take6 iteration, all models again processed the test dataset and produced an RMSE at or near the 23,000 level. All models eventually overfit, but the Dropout layers can help reduce overfitting.

CONCLUSION: For this iteration, the addition of Dropout layers produced similar RMSEs for all models. For this dataset, we should consider experimenting with more regularization techniques.

Dataset Used: Kaggle Competition – House Prices: Advanced Regression Techniques

Dataset ML Model: Regression with numerical and categorical attributes

Dataset Reference: https://ww2.amstat.org/publications/jse/v19n3/decock.pdf

One potential source of performance benchmarks: https://www.kaggle.com/c/house-prices-advanced-regression-techniques

The HTML formatted report can be found here on GitHub.

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Ames Iowa Housing Prices dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: Many factors can influence a home’s purchase price. This Ames Housing dataset contains 79 explanatory variables describing every aspect of residential homes in Ames, Iowa. The goal is to predict the final price of each home.

In iteration Take1, we established the baseline root mean squared error (RMSE) for later takes of modeling.

In iteration Take2, we converted some of the categorical variables from nominal to ordinal and observed the effects of the change.

In iteration Take3, we examined the feature selection technique of attribute importance ranking by using the Gradient Boosting algorithm. By selecting only the most important attributes, we decreased the processing time and maintained a similar level of RMSE compared to the baseline.

In iteration Take4, we examined the feature selection technique of recursive feature elimination (RFE) by using the Gradient Boosting algorithm. By selecting up to 100 attributes, we decreased the processing time and maintained a similar level of RMSE compared to the baseline.

In this Take5 iteration, we will construct several Multilayer Perceptron (MLP) models with one, two, and three hidden layers. We will observe how the different model architectures affect the RMSE metric.
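
One way to compare the one-, two-, and three-hidden-layer architectures is by size. A fully connected layer mapping `n_in` inputs to `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below assumes an input width of 258 (the attribute count reported for the encoded dataset) and a single output unit for the price. The three-layer widths are hypothetical.

```python
def dense_param_count(layer_sizes):
    """Trainable parameters (weights + biases) of a fully connected network.

    layer_sizes lists widths from input to output; each layer mapping
    n_in -> n_out contributes n_in * n_out weights and n_out biases.
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Assumed input width of 258 attributes and a single regression output.
for hidden in ([128], [128, 64], [128, 64, 32]):
    print(hidden, dense_param_count([258, *hidden, 1]))
```

Under those assumptions, the one-layer 128-node network has 33,281 trainable parameters, and the two-layer 128/64 network has 41,473.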

ANALYSIS: In iteration Take1, the baseline performance of the machine learning algorithms achieved an average RMSE of 31,172. Two algorithms (Ridge Regression and Gradient Boosting) achieved the best RMSE metrics after the first round of modeling. After a series of tuning trials, Gradient Boosting turned in the best overall result and achieved an RMSE metric of 24,165. Using the optimized parameters, the Gradient Boosting algorithm processed the test dataset with an RMSE of 21,067, which was even better than the RMSE from the training data.

In iteration Take2, Gradient Boosting achieved an RMSE metric of 23,612 with the training dataset and processed the test dataset with an RMSE of 21,130. Converting the nominal variables to ordinal did not have a material impact on the prediction accuracy in either direction.

In iteration Take3, Gradient Boosting achieved an RMSE metric of 24,045 with the training dataset and processed the test dataset with an RMSE of 21,994. At the importance level of 99%, the attribute importance technique eliminated 222 of 258 total attributes. The remaining 36 attributes produced a model that achieved an RMSE comparable to the baseline model. The processing time for Take3 also decreased by 67.90% compared to the Take1 iteration.

In iteration Take4, Gradient Boosting achieved an RMSE metric of 23,825 with the training dataset and processed the test dataset with an RMSE of 21,898. The RFE technique eliminated 208 of 258 total attributes. The remaining 50 attributes produced a model that achieved an RMSE comparable to the baseline model. The processing time for Take4 also decreased by 1.8% compared to the Take1 iteration.

In this Take5 iteration, all models processed the test dataset and produced an RMSE at or near the 23,000 level. The two-layer model with 128 and 64 nodes (Model 2C) achieved the best RMSE of 22,708 on the test dataset. All models eventually overfit, and the models with more layers overfit much faster than the simpler models.

CONCLUSION: For this iteration, the different model architectures produced similar RMSEs. For this dataset, we should consider experimenting with more regularization techniques.

Dataset Used: Kaggle Competition – House Prices: Advanced Regression Techniques

Dataset ML Model: Regression with numerical and categorical attributes

Dataset Reference: https://ww2.amstat.org/publications/jse/v19n3/decock.pdf

One potential source of performance benchmarks: https://www.kaggle.com/c/house-prices-advanced-regression-techniques

The HTML formatted report can be found here on GitHub.

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Daily Births in Quebec dataset is a time series situation where we are trying to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the weekly number of births in the province of Quebec, Canada. The dataset describes a time series of births over 14 years (1977-1990), with 5113 daily observations. To avoid out-of-memory issues during processing, we first summarized the daily data into 730 weekly sums. We then used the first 80% of the observations for training and testing various models while holding back the remaining observations for validating the final model.
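
The aggregation and split can be sketched in a few lines. The sketch assumes that weeks are consecutive 7-day blocks starting at the first observation and that the trailing partial week is dropped. Since 5113 // 7 gives exactly 730 full weeks, this matches the count above, but the post does not say how the 3 leftover days were handled.

```python
def weekly_sums(daily, days_per_week=7):
    """Sum consecutive 7-day blocks, dropping any trailing partial week."""
    n_weeks = len(daily) // days_per_week
    return [sum(daily[w * days_per_week:(w + 1) * days_per_week]) for w in range(n_weeks)]


def chronological_split(series, train_fraction=0.8):
    """Split without shuffling: earliest part for modeling, the rest held back."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# 5113 placeholder daily counts (all ones) stand in for the real data.
weekly = weekly_sums([1] * 5113)
train, holdout = chronological_split(weekly)
print(len(weekly), len(train), len(holdout))  # 730 584 146
```

The split must not shuffle. Keeping the time order ensures the held-back weeks lie strictly after everything the models saw.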

ANALYSIS: The baseline (persistence) prediction for the dataset resulted in an RMSE of 70. After a grid search for the optimal ARIMA parameters, the final ARIMA non-seasonal order was (2, 1, 2), with a seasonal order of (1, 0, 2, 52). The chosen model processed the validation data with an RMSE of 59, which was better than the baseline model, as expected.
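
The persistence baseline predicts each week's births as the previous week's value, then takes in the actual value before the next step (walk-forward validation). A minimal sketch of that baseline's RMSE follows. For the tuned model, one option is statsmodels' `SARIMAX` class with `order=(2, 1, 2)` and `seasonal_order=(1, 0, 2, 52)`; this is an assumption, since the post does not name its library.

```python
import math


def persistence_rmse(history, test):
    """RMSE of a walk-forward naive forecast: each prediction is the last seen value."""
    predictions, last = [], history[-1]
    for actual in test:
        predictions.append(last)
        last = actual  # the true value is observed before the next forecast
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(test, predictions)) / len(test))


# Hypothetical weekly totals: last training week, then three held-back weeks.
print(persistence_rmse([1800], [1830, 1790, 1790]))
```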

Dataset Used: Number of Daily Births in Quebec, January 1977 through December 1990

Dataset ML Model: Time series forecast with numerical attributes

Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.

The HTML formatted report can be found here on GitHub.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Allstate Claims Severity dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: Allstate is interested in developing automated methods of predicting the cost, and hence severity, of claims. In this Kaggle challenge, the contestants were asked to create an algorithm that could accurately predict claims severity. Each row in this dataset represents an insurance claim. The task is to predict the value for the ‘loss’ column. Variables prefaced with ‘cat’ are categorical, while those prefaced with ‘cont’ are continuous.
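
Because the column names encode their types, you can separate the feature groups by prefix before encoding. A minimal sketch, assuming the remaining columns are an identifier and the `loss` target:

```python
def split_columns(columns):
    """Group Allstate-style columns by prefix: 'cat*' vs 'cont*'.

    Other columns (such as an id or the 'loss' target) go in neither group.
    """
    categorical = [c for c in columns if c.startswith("cat")]
    continuous = [c for c in columns if c.startswith("cont")]
    return categorical, continuous


print(split_columns(["id", "cat1", "cat2", "cont1", "cont2", "loss"]))
# (['cat1', 'cat2'], ['cont1', 'cont2'])
```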

In iteration Take1, we constructed machine learning models using the original dataset with minimal data preparation and no feature engineering. The XGBoost model serves as the baseline for future iterations of modeling.

In iteration Take2, we tuned additional parameters of the XGBoost model and improved the MAE metric further.

In iteration Take3, we constructed several basic Multilayer Perceptron (MLP) models with one hidden layer. The basic MLP model serves as the baseline model as we build more complex MLP models in future iterations.

In iteration Take4, we constructed several Multilayer Perceptron (MLP) models with two hidden layers. We also observed whether the additional hidden layer had a positive effect on MAE compared to models with just one hidden layer.

In iteration Take5, we constructed several Multilayer Perceptron (MLP) models with three hidden layers. We also observed whether the additional hidden layer had a positive effect on MAE compared to models with just one or two hidden layers.

In iteration Take6, we constructed several three-layer Multilayer Perceptron (MLP) models with batch normalization. We also observed whether the batch normalization technique had a positive effect on MAE compared to models without batch normalization.

In this iteration, we will tune the MLP model that has 512/128/64 nodes and 0.25/0.25/0.25 Dropout ratios. We will perform a grid search for the best-performing model using different learning rates, kernel initializers, and batch sizes.
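
A grid search over learning rates, kernel initializers, and batch sizes evaluates every combination and keeps the one with the lowest validation MAE. In the sketch below, the candidate values are hypothetical (the post reports that 0.0005 worked well but does not list every value tried). The `toy_evaluate` function is a placeholder for training and scoring the 512/128/64 model. `glorot_uniform` and `he_normal` are standard Keras initializer names.

```python
from itertools import product

# Hypothetical candidate values; only the three dimensions come from the post.
PARAM_GRID = {
    "learning_rate": [0.001, 0.0005, 0.0001],
    "kernel_initializer": ["glorot_uniform", "he_normal"],
    "batch_size": [64, 128],
}


def grid_search(param_grid, evaluate):
    """Try every combination; return (best_params, best_score), lower is better."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # e.g. validation MAE after training
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    """Placeholder scorer; a real one would build, train, and score the MLP."""
    return abs(params["learning_rate"] - 0.0005) + params["batch_size"] / 1e6


print(grid_search(PARAM_GRID, toy_evaluate))
```

Even this small grid needs 12 full training runs, which is why the search space is usually kept narrow for neural networks.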

ANALYSIS: In iteration Take1, the baseline performance of the machine learning algorithms achieved an average MAE of 1301. eXtreme Gradient Boosting (XGBoost) achieved the best MAE metric after the first round of modeling. After a series of tuning trials, XGBoost achieved an MAE metric of 1199. Using the optimized parameters, the XGBoost algorithm processed the test dataset with an MAE of 1204, which was in line with the MAE from the training data.
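
MAE, the metric used throughout this series, averages the absolute differences between actual and predicted losses. Unlike RMSE, it does not square the errors, so a few very large claims influence it less. A minimal sketch:

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the target's units (claim cost for Allstate)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical losses: one small miss and one large miss.
print(mae([1000, 2000], [1100, 1700]))  # 200.0
```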

In iteration Take2, the further-tuned eXtreme Gradient Boosting (XGBoost) model achieved an improved MAE metric of 1191 using the training data. Using the same optimized parameters, the XGBoost algorithm processed the test dataset with an MAE of 1195, which was in line with the MAE from the training data.

In iteration Take3, the simple MLP model with 128 nodes achieved an MAE metric of 1193 on the test dataset after 50 epochs. The MLP model with 1024 nodes processed the same test dataset with an MAE of 1170 after the same number of epochs, but with much more overfitting.

In iteration Take4, the MLP model with 128/64 nodes and 0.25/0.25 Dropout ratios achieved an MAE metric of 1169 on the test dataset after 31 epochs. The MLP model with 256/128 nodes and 0.25/0.50 Dropout ratios also processed the same test dataset with an MAE of 1169 after 25 epochs.

In iteration Take5, the MLP model with 512/128/64 nodes and 0.25/0.50/0.50 Dropout ratios achieved an MAE metric of 1164 on the test dataset after 16 epochs. The MLP model with 1024/512/256 nodes and 0.25/0.50/0.50 Dropout ratios also processed the same test dataset with an MAE of 1164 after nine epochs.

In iteration Take6, the MLP model with 512/128/64 nodes and 0.25/0.25/0.25 Dropout ratios achieved an MAE metric of 1157 on the test dataset after 22 epochs. The MLP model with 1024/512/256 nodes and 0.50/0.50/0.50 Dropout ratios also processed the same test dataset with an MAE of 1159 after 48 epochs.

In this Take7 iteration, the models with a learning rate of 0.0005 seemed to produce the most stable training and testing loss curves. Those models also achieved MAEs between 1158 and 1161 on the test dataset at around 20 epochs, before they started to overfit.

CONCLUSION: For this iteration, the 512/128/64 nodes and 0.25/0.25/0.25 Dropout MLP model achieved good overall results using the learning rate of 0.0005. For this dataset, we should consider using this model for further modeling activities or production uses.

Dataset Used: Allstate Claims Severity Data Set

Dataset ML Model: Regression with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/allstate-claims-severity/data

One potential source of performance benchmarks: https://www.kaggle.com/c/allstate-claims-severity/leaderboard

The HTML formatted report can be found here on GitHub.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Allstate Claims Severity dataset is a regression situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: Allstate is interested in developing automated methods of predicting the cost, and hence severity, of claims. In this Kaggle challenge, the contestants were asked to create an algorithm that could accurately predict claims severity. Each row in this dataset represents an insurance claim. The task is to predict the value for the ‘loss’ column. Variables prefaced with ‘cat’ are categorical, while those prefaced with ‘cont’ are continuous.

In iteration Take1, we constructed machine learning models using the original dataset with minimal data preparation and no feature engineering. The XGBoost model serves as the baseline for future iterations of modeling.

In iteration Take2, we tuned additional parameters of the XGBoost model and improved the MAE metric further.

In iteration Take3, we constructed several basic Multilayer Perceptron (MLP) models with one hidden layer. The basic MLP model serves as the baseline model as we build more complex MLP models in future iterations.

In iteration Take4, we constructed several Multilayer Perceptron (MLP) models with two hidden layers. We observed whether the additional hidden layer had a positive effect on MAE compared to models with just one hidden layer.

In iteration Take5, we constructed several Multilayer Perceptron (MLP) models with three hidden layers. We observed whether the additional hidden layer had a positive effect on MAE compared to models with just one or two hidden layers.

In this iteration, we will construct several three-layer Multilayer Perceptron (MLP) models with batch normalization. We will observe whether the batch normalization technique has a positive effect on MAE when compared to models without the batch normalization.
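
Batch normalization standardizes each feature using the mean and variance of the current mini-batch, then applies a learned scale (`gamma`) and shift (`beta`). The minimal training-mode sketch below uses a fixed `gamma` and `beta` for clarity. It omits the moving averages that Keras's `BatchNormalization` layer keeps for inference. The default `eps=1e-3` mirrors that layer's default epsilon.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-3):
    """Training-mode batch normalization for a batch of feature vectors.

    Each feature (column) is standardized with the batch mean and variance,
    then scaled by gamma and shifted by beta.
    """
    n_features = len(batch[0])
    out = [list(row) for row in batch]
    for j in range(n_features):
        column = [row[j] for row in batch]
        mean = sum(column) / len(column)
        var = sum((x - mean) ** 2 for x in column) / len(column)
        for i, x in enumerate(column):
            out[i][j] = gamma * (x - mean) / math.sqrt(var + eps) + beta
    return out


print(batch_norm([[1.0, 10.0], [3.0, 30.0]], eps=0.0))  # [[-1.0, -1.0], [1.0, 1.0]]
```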

ANALYSIS: In iteration Take1, the baseline performance of the machine learning algorithms achieved an average MAE of 1301. eXtreme Gradient Boosting (XGBoost) achieved the best MAE metric after the first round of modeling. After a series of tuning trials, XGBoost achieved an MAE metric of 1199. Using the optimized parameters, the XGBoost algorithm processed the test dataset with an MAE of 1204, which was in line with the MAE from the training data.

In iteration Take2, the further-tuned eXtreme Gradient Boosting (XGBoost) model achieved an improved MAE metric of 1191 using the training data. Using the same optimized parameters, the XGBoost algorithm processed the test dataset with an MAE of 1195, which was in line with the MAE from the training data.

In iteration Take3, the simple MLP model with 128 nodes achieved an MAE metric of 1193 on the test dataset after 50 epochs. The MLP model with 1024 nodes processed the same test dataset with an MAE of 1170 after the same number of epochs, but with much more overfitting.

In iteration Take4, the MLP model with 128/64 nodes and 0.25/0.25 Dropout ratios achieved an MAE metric of 1169 on the test dataset after 31 epochs. The MLP model with 256/128 nodes and 0.25/0.50 Dropout ratios also processed the same test dataset with an MAE of 1169 after 25 epochs.

In iteration Take5, the MLP model with 512/128/64 nodes and 0.25/0.50/0.50 Dropout ratios achieved an MAE metric of 1164 on the test dataset after 16 epochs. The MLP model with 1024/512/256 nodes and 0.25/0.50/0.50 Dropout ratios also processed the same test dataset with an MAE of 1164 after nine epochs.

In this Take6 iteration, the MLP model with 512/128/64 nodes and 0.25/0.25/0.25 Dropout ratios achieved an MAE metric of 1157 on the test dataset after 22 epochs. The MLP model with 1024/512/256 nodes and 0.50/0.50/0.50 Dropout ratios also processed the same test dataset with an MAE of 1159 after 48 epochs.

CONCLUSION: For this iteration, the 512/128/64 nodes and 0.25/0.25/0.25 Dropout MLP model achieved good overall results using the training and testing datasets. It is also simpler than the model with 1024/512/256 nodes and 0.50/0.50/0.50 Dropout. For this dataset, we should consider further tuning the hyperparameters for the 512/128/64 nodes with 0.25/0.25/0.25 Dropout MLP model.

Dataset Used: Allstate Claims Severity Data Set

Dataset ML Model: Regression with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/allstate-claims-severity/data

One potential source of performance benchmarks: https://www.kaggle.com/c/allstate-claims-severity/leaderboard

The HTML formatted report can be found here on GitHub.

In his Akimbo podcast, Seth Godin teaches us how to adopt a posture of possibility, change the culture, and choose to make a difference. Here are my takeaways from the episode.

In this episode, Seth discussed the trap of false correlation, which we often fall into without realizing it.

We fall into the trap of false correlation when we look at two correlated events but fail to ask whether a cause actually links them.
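
Here is a stylized illustration of how a common cause produces a strong correlation. Hypothetical ice cream sales and sunburn cases are both driven by temperature, so they correlate perfectly even though neither causes the other. The numbers are invented and deliberately noise-free.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Invented data: temperature (the common cause) drives both series.
temperature = [15, 18, 22, 27, 31, 29, 24, 19]
ice_cream_sales = [2 * t + 5 for t in temperature]
sunburn_cases = [3 * t - 20 for t in temperature]

# Perfect correlation, yet selling ice cream does not cause sunburn.
print(round(pearson(ice_cream_sales, sunburn_cases), 6))  # 1.0
```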

One of Abraham Wald’s research projects taught us an important lesson about survivorship bias. When we look only at the data available to us, we might miss an even bigger picture hidden in the data we cannot see. The survivors do not have the problem, so we need to look at the ones that do.
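
The Wald project most often cited here involved World War II bombers. The planes that returned showed bullet holes in some sections but not others. Wald argued that armor belonged where the survivors were unharmed, because planes hit in those places did not come back. The toy tally below uses invented data to show how counting only returning planes hides the fatal hits.

```python
from collections import Counter

# Invented mission log: the section where each damaged plane was hit.
hits = ["wings", "fuselage", "engine", "wings", "tail", "engine", "fuselage", "wings"]
FATAL_SECTIONS = {"engine"}  # assumption: planes hit here did not return


def damage_seen_on_returners(hit_sections, fatal=FATAL_SECTIONS):
    """Tally hit locations using only the planes that made it back."""
    return Counter(s for s in hit_sections if s not in fatal)


observed = damage_seen_on_returners(hits)
print(observed["engine"], Counter(hits)["engine"])  # 0 2
```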

If we base our decisions purely on correlations in the data, we risk putting too much faith in spurious correlations. Trying to understand causation is important because we cannot make good choices going forward if we are simply modeling data. This distinction becomes critically important as we scale up artificial intelligence and machine learning.

AI has made great strides in solving many problems, such as image recognition or process automation. Those problems lend themselves well to computer processing because there is a definitive correct answer. If we feed the system enough data with the right answers, the computer can begin to predict and match up to the right answers over time.

Many other problems, such as predicting social outcomes, remain difficult for AI. For those problems, the best AI can do is process the data and offer a probabilistic prediction.

But we can fall into a trap if we act as if statistics and probability are the truth. Statistics simply tells us the range of what we can expect to happen, not why or how it will happen at any given moment. What we need is true understanding.

After Abraham Wald died, Ronald Fisher, another of the great statisticians of the 20th century, attacked his work. Fisher criticized Wald’s work on the design of experiments, alleging ignorance of the basic ideas of the subject. Other scholars subsequently defended Wald’s work.

In 1950, toward the latter part of his life, Ronald Fisher made a tragic error: he spoke out against a UNESCO study showing that people of all races and backgrounds had the potential to do any sort of work. Fisher was misled because he was looking at a false correlation rather than trying to understand why. Fisher believed that evidence and everyday experience showed that human groups differ profoundly “in their innate capacity for intellectual and emotional development.”

Correlation is not causation, and numerous factors influence whether something is going to happen. The hard work of statistics is not running a test or making sure we have the right sample size. The hard work of statistics is understanding the truth.

If we cannot understand, we are going to be seduced by the seemingly accurate predictions of artificial intelligence. The data might seduce us into believing that the future will look like the past. We might write off entire populations of people simply because, in the past, other factors prevented them from doing the work. We can do better than this.

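(From a writer I admire, Seth Godin)

What if that isn't actually true? It might be more useful to consider that, at every moment of every project, no one is actually doing their best.

Because everyone always needs to hold a little in reserve.

Because everyone always has competing priorities.

Because everyone has noise in their head.

Because everyone has fears, all sorts of fears.

Because no one is ever truly 100 percent prepared for and committed to the work, at least not yet, not at this exact moment.

I'm not doing my best, and neither are you. Because we are not computers; we are only human.

All right, now that we can see no one is doing their absolute best, what other options do we have?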