SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground Series December 2021 dataset is a multi-class modeling situation where we are trying to predict one of several (more than two) possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for relatively new people in their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex, tabular datasets. The dataset used for this competition is synthetic but based on a real dataset and generated using a CTGAN. This dataset is based on the original Forest Cover Type Prediction competition.

ANALYSIS: After a series of tuning trials, the best AutoKeras model processed the training dataset with an accuracy score of 72.54%. When we processed the test dataset with the final model, the model achieved an accuracy score of 70.71%.
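As a reminder of what the accuracy figure measures, here is a minimal sketch in plain Python. The `accuracy` helper and the labels below are hypothetical illustrations, not the code or data from the project notebook.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty set of labels")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical multi-class labels (not taken from the Kaggle data)
y_true = [1, 2, 3, 2, 1, 3, 2, 2]
y_pred = [1, 2, 2, 2, 1, 3, 1, 2]
print(f"Accuracy: {accuracy(y_true, y_pred):.2%}")
```

Accuracy treats every class equally per sample, so on an imbalanced multi-class dataset it can look reasonable even when rare classes are predicted poorly.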

CONCLUSION: In this iteration, the AutoKeras model did not appear to be a suitable algorithm for modeling this dataset without using additional trial iterations.

Dataset Used: Kaggle Tabular Playground Series December 2021 Data Set

Dataset ML Model: Multi-Class classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-dec-2021

One potential source of performance benchmark: https://www.kaggle.com/c/tabular-playground-series-dec-2021/leaderboard

The HTML formatted report can be found here on GitHub.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground November 2021 dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for relatively new people in their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex, tabular datasets. The dataset used for this competition is synthetic but based on a real dataset and generated using a CTGAN. The data is synthetically generated by a GAN trained on a real-world dataset used to identify spam emails via various extracted features from the email. Although the features are anonymized, they have properties relating to real-world features.

ANALYSIS: After a series of tuning trials, the best AutoKeras model processed the training dataset with a ROC/AUC score of 0.7492. When we processed the test dataset with the final model, the model achieved a ROC/AUC score of 0.7484.
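For readers less familiar with the ROC/AUC metric, the sketch below computes it from first principles: the share of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The `roc_auc` helper and the sample scores are assumptions for illustration only; the project itself would rely on a library implementation.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is scored
    above a randomly chosen negative one, counting ties as one half.
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(f"ROC/AUC: {roc_auc(y_true, y_score):.4f}")
```

The pairwise loop is quadratic in the number of samples, so it is only suitable for small illustrations. A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.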

CONCLUSION: In this iteration, the AutoKeras model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground 2021 November Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-nov-2021

One potential source of performance benchmark: https://www.kaggle.com/c/tabular-playground-series-nov-2021/leaderboard

The HTML formatted report can be found here on GitHub.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that can be used to support time series analysis using the TensorFlow framework and Python.

Version 1 of the TensorFlow CNN time series template replicates Dr. Brownlee’s blog post “Deep Learning Models for Univariate Time Series Forecasting”. I plan to build a script for modeling future projects by adapting the example workflow presented in the blog.
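A common first step in univariate forecasting with neural networks is to reframe the series as a supervised learning problem using a sliding window. The sketch below illustrates that idea in plain Python. The function name, window size, and sample series are my own assumptions for illustration and are not copied from the template or the blog post.

```python
def split_sequence(sequence, n_steps):
    """Slice a univariate series into (input window, next value) pairs."""
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])   # n_steps consecutive observations
        y.append(sequence[i + n_steps])     # the value that follows the window
    return X, y


# Hypothetical series for illustration
series = [10, 20, 30, 40, 50, 60, 70, 80, 90]
X, y = split_sequence(series, n_steps=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

Each window becomes one training sample. For a 1D CNN in TensorFlow, the windows would typically be reshaped to `[samples, timesteps, features]` before fitting.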

You will find the Python time series template on the Analytics Project Templates page.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground October 2021 dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for relatively new people in their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex, tabular datasets. The dataset used for this competition is synthetic but based on a real dataset and generated using a CTGAN. The original dataset deals with predicting the biological response of molecules given various chemical properties. Although the features are anonymized, they have properties relating to real-world features.

ANALYSIS: After a series of tuning trials, the best AutoKeras model processed the training dataset with a ROC/AUC score of 0.8431. When we processed the test dataset with the final model, the model achieved a ROC/AUC score of 0.8441.

CONCLUSION: In this iteration, the AutoKeras model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground 2021 October Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-oct-2021

One potential source of performance benchmark: https://www.kaggle.com/c/tabular-playground-series-oct-2021/leaderboard

The HTML formatted report can be found here on GitHub.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground Series September 2021 dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for relatively new people in their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex, tabular datasets. The dataset used for this competition is synthetic but based on the real Titanic dataset and generated using a CTGAN. The statistical properties of this dataset are very similar to the original Titanic dataset, but there is no shortcut to cheat by using public labels for predictions.

ANALYSIS: After a series of tuning trials, the best AutoKeras model processed the training dataset with a ROC/AUC score of 0.5907. When we processed the test dataset with the final model, the model achieved a ROC/AUC score of 0.5932.

CONCLUSION: In this iteration, the AutoKeras model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground 2021 September Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-sep-2021

One potential source of performance benchmark: https://www.kaggle.com/c/tabular-playground-series-sep-2021/leaderboard

The HTML formatted report can be found here on GitHub.

These are some of my favorite concepts and takeaways from reading the book.

Chapter 8, The Basic Building Block is People

In this chapter, Seth discusses how individuals can zoom within an organization or find one that would allow them to zoom. He offers the following observations and recommendations for us to think about:

- As an employee, every one of our jobs is just a stopover on a lifelong journey of personal evolution. When we move from one organization to another, we take the learning from one job to the next. Unfortunately, depending on the organization, most of the learning we bring with us will be useless at best, dangerous at worst.
- To build a zooming organization, we need to deprogram ourselves from time to time. This is because a zooming organization has a fundamentally different set of memes about how it conducts business. While it is hard to give up the winning strategy we are comfortable with, adopting the continual change that comes with zooming can help us evolve more quickly and with a greater chance of succeeding.
- For an individual who decides to zoom, it is up to the employee to find a great boss and figure out how to use the company in the best possible way. The critical element is to adopt increasingly powerful winning strategies to advance our careers.
- When the great people leave to join companies that let them zoom, runaway sets in. Those organizations can zoom ever faster, making them more fun, more stable, and more profitable over time. But this process cannot happen until individual employees choose and develop their zooming ability along with the organization.
- We may have decided to zoom, but how would we transform an organization filled with non-zoomers? How can we get everyone in the organization aligned, focused on the same tactics, and willing to take risks to find success? The answer to both questions may be surprising. Don’t.
- Do not try to force the reactionaries to change. Do not spend hours cajoling the “serfs” to give up their bondage and become farmers, hunters, and wizards. Instead, we should teach them how to think about the issue and understand the implications. Forcing people to change rarely works. Rather, be a zooming example and give them a chance to join us.
- Hiring intelligent people with self-initiative is the fastest, most efficient way to evolve our organization. It is also the only way to reach a runaway state. Skilled people also do not want to work for a company that drains their initiative. If we find ourselves stuck in an organization with people who only want to be the serfs, it might be necessary to look for a way out. In other words, “You’re not stuck if you don’t want to be.”

In summary:

“The most convenient carrying case for mDNA is the individual. Each individual has his own winning strategy and carries a large number of memes with him to every job and every situation.”

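If you measure your work by the time you sell, then the work ends when the shift ends. Punch in, punch out.

If you measure your work by your output, your work is done when the inbox is empty. Once you have made all the pizzas that were ordered, you are finished.

But more and more, our work may be endless. One more sales call might bring more sales. One more innovation cycle might lead to the breakthrough we have been looking for. One more article might earn you more traffic.

In a competitive market, self-regulating the length of our shifts is a big problem. Given that the list of things to do is endless, each of us has to decide what “enough” looks like. Because spending more time is not always the best answer.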

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground Series August 2021 dataset is a regression modeling situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: Kaggle wants to provide an approachable environment for relatively new people in their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex, tabular datasets. The dataset used for this competition is synthetic but based on a real dataset and generated using a CTGAN. The original dataset deals with predicting the category on an eCommerce product given various attributes about the listing. Although the features are anonymized, they have properties relating to real-world features.

ANALYSIS: After a series of tuning trials, the best AutoKeras model processed the training dataset with a logarithmic loss of 7.9117. When we processed the test dataset with the final model, the model achieved a logarithmic loss of 7.9304.

Dataset Used: Kaggle Tabular Playground Series August 2021 Dataset

Dataset ML Model: Regression with numerical features

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-aug-2021

One potential source of performance benchmarks: https://www.kaggle.com/c/tabular-playground-series-aug-2021/leaderboard

The HTML formatted report can be found here on GitHub.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground June 2021 dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for relatively new people in their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex, tabular datasets. The dataset used for this competition is synthetic but based on a real dataset and generated using a CTGAN. The original dataset deals with predicting the category on an eCommerce product given various attributes about the listing. Although the features are anonymized, they have properties relating to real-world features.

ANALYSIS: After a series of tuning trials, the best AutoKeras model processed the training dataset with a logarithmic loss of 1.7691. When we processed the test dataset with the final model, the model achieved a logarithmic loss of 1.7686.
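To make the logarithmic loss figures easier to interpret, here is a minimal sketch of multi-class log loss in plain Python: the average negative log of the probability the model assigned to the true class. The `multiclass_log_loss` helper, the clipping constant, and the sample probabilities are assumptions for illustration, not the project's code.

```python
import math


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true class.

    y_true holds integer class indices; y_prob holds one list of class
    probabilities per sample. Probabilities are clipped to avoid log(0).
    """
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Hypothetical three-class predictions
y_true = [0, 2, 1]
y_prob = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
print(f"Log loss: {multiclass_log_loss(y_true, y_prob):.4f}")
```

Lower is better. A useful reference point is that a model guessing uniformly across k classes scores ln(k), so a reported loss can be compared against that baseline for the number of classes in the dataset.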

Dataset Used: Kaggle Tabular Playground 2021 June Data Set

Dataset ML Model: Multi-Class classification with numeric attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-jun-2021/

One potential source of performance benchmark: https://www.kaggle.com/c/tabular-playground-series-jun-2021/leaderboard

The HTML formatted report can be found here on GitHub.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that can be used to support time series analysis using the TensorFlow framework and Python.

Version 1 of the TensorFlow MLP time series template replicates Dr. Brownlee’s blog post “Deep Learning Models for Univariate Time Series Forecasting”. I plan to build a script for modeling future projects by adapting the example workflow presented in the blog.

You will find the Python time series template on the Analytics Project Templates page.
