Multi-Class Model for Kaggle Tabular Playground Series 2021 June Using AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground June 2021 dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex tabular datasets. The dataset used for this competition is synthetic but based on a real dataset and generated using a CTGAN. The original dataset deals with predicting the category of an eCommerce product given various attributes about the listing. Although the features are anonymized, they have properties relating to real-world features.

ANALYSIS: After a series of tuning trials, the best AutoKeras model achieved a logarithmic loss of 1.7691 on the training dataset. When we applied the final model to the test dataset, it achieved a logarithmic loss of 1.7686.
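For readers unfamiliar with the metric, the following is a minimal sketch of how multi-class logarithmic loss can be computed. It is not the code from the project report; the function name `multiclass_log_loss`, the three-class toy labels, and the probabilities are all made up for illustration.

```python
import math

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip to avoid log(0) when a model assigns zero probability.
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)

# Hypothetical toy data: three rows, three classes (not the Kaggle data).
y_true = [0, 2, 1]
y_prob = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]

loss = multiclass_log_loss(y_true, y_prob)

# A model that guesses uniformly over k classes scores ln(k),
# which gives a rough baseline for judging a reported log loss.
uniform_loss = multiclass_log_loss(y_true, [[1 / 3] * 3] * 3)
print(loss, uniform_loss)
```

Lower values are better, and comparing a score against the uniform-guess baseline for the same number of classes is one simple way to put it in context.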

CONCLUSION: In this iteration, the AutoKeras model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground 2021 June Data Set

Dataset ML Model: Multi-Class classification with numeric attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-jun-2021/

One potential source of performance benchmark: https://www.kaggle.com/c/tabular-playground-series-jun-2021/leaderboard

The HTML formatted report can be found here on GitHub.

Univariate Time Series MLP Modeling Template Using TensorFlow Version 1

As I work on practicing and solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that can be used to support time series analysis using the TensorFlow framework and Python.

Version 1 of the TensorFlow MLP time series template replicates Dr. Brownlee’s blog post “Deep Learning Models for Univariate Time Series Forecasting”. I plan to build a script for modeling future projects by adapting the example workflow presented in the blog.
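As a rough illustration of the data preparation step that univariate MLP forecasting usually relies on, the sketch below turns a single series into supervised input/output pairs with a sliding window. It is a simplified stand-in rather than the template's actual code; the helper name `split_sequence`, the window size, and the toy series are assumptions chosen for the example.

```python
def split_sequence(sequence, n_steps):
    """Slice a univariate series into (window, next value) training pairs."""
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])   # n_steps past observations
        y.append(sequence[i + n_steps])     # the value to forecast
    return X, y

# Hypothetical toy series, not data from any project.
series = [10, 20, 30, 40, 50, 60]
X, y = split_sequence(series, n_steps=3)

for window, target in zip(X, y):
    print(window, "->", target)
```

Each window becomes one input row for the MLP, so the input layer size matches the chosen number of time steps.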

You will find the Python time series template on the Analytics Project Templates page.

Multi-Class Model for Kaggle Tabular Playground Series 2021 May Using AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground May 2021 dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex tabular datasets. The dataset used for this competition is synthetic but based on a real dataset and generated using a CTGAN. The original dataset deals with predicting the category of an eCommerce product given various attributes about the listing. Although the features are anonymized, they have properties relating to real-world features.

ANALYSIS: After a series of tuning trials, the best AutoKeras model achieved a logarithmic loss of 1.0984 on the training dataset. When we applied the final model to the test dataset, it achieved a logarithmic loss of 1.1023.

CONCLUSION: In this iteration, the AutoKeras model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground 2021 May Data Set

Dataset ML Model: Multi-Class classification with categorical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-may-2021/

One potential source of performance benchmark: https://www.kaggle.com/c/tabular-playground-series-may-2021/leaderboard

The HTML formatted report can be found here on GitHub.

Binary Classification Model for Kaggle Tabular Playground Series 2021 March Using AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground Mar 2021 dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun but less complex tabular datasets. The dataset used for this competition is synthetic but based on a real dataset and generated using a CTGAN. The original dataset deals with predicting the amount of an insurance claim. Although the features are anonymized, they have properties relating to real-world features.

ANALYSIS: After a series of tuning trials, the best AutoKeras model achieved a ROC/AUC score of 88.06% on the training dataset. When we applied the final model to the test dataset, it achieved a ROC/AUC score of 87.82%.
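To make the ROC/AUC score concrete, here is a small sketch that computes it directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. This is an illustrative implementation, not the one used in the report; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(y_true, y_score):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical toy data, not the Kaggle data.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_score)
print(f"{auc:.2%}")
```

The pairwise loop is quadratic in the number of rows, so it suits small examples only; library implementations use a sorted, rank-based calculation for large datasets.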

CONCLUSION: In this iteration, the AutoKeras model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground 2021 March Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-mar-2021

One potential source of performance benchmark: https://www.kaggle.com/c/tabular-playground-series-mar-2021/leaderboard

The HTML formatted report can be found here on GitHub.

Seth Godin on Survival Is Not Enough, Part 7

In his book, Survival Is Not Enough: Why Smart Companies Abandon Worry and Embrace Change, Seth Godin discusses how innovative organizations and individuals can apply prudent strategies to adapt and position themselves for constant change.

These are some of my favorite concepts and takeaways from reading the book.

Chapter 7, Serfs, Farmers, Hunters, and Wizards

In this chapter, Seth discusses how different employees can create different sorts of change within an organization. He offers the following observations and recommendations for us to think about:

  • There are four types of people in most organizations:
    • Serfs: They do what they are told.
    • Farmers: They work within the bounds of a winning strategy but constantly use feedback loops to improve the efficiency of their efforts.
    • Hunters: They look for means to expand the company’s winning strategy in ways that the organization probably had not considered before.
    • Wizards: They introduce significant mutations into the company’s mDNA, thus creating opportunities for entirely new winning strategies.
  • Farming, hunting, and wizardry all represent different ways in which zooming organizations can evolve.
  • Many people want to be serfs in a company, and many companies are eager to hire serfs. Our genes drive us to work in a steady job that insulates us from many external changes. Companies hire serfs because the machine-centric view of the enterprise demands that people be compliant cogs. For companies trying to evolve, a large number of serfs is perhaps the most significant single impediment to change.
  • Farmers have understood for thousands of years that focusing on yield is their most important activity. Establishing the communication and follow-up mechanism that permits farmers in our organization to interact and teach others is necessary for their success.
  • Hunters need the freedom to move around and a large territory to roam and identify opportunities. While the hunters have the luxury of not depending on a fixed piece of land, they have a responsibility to report to the people who rely on them for planning the food supply. Hunters also need to interact with their peers so that everyone can learn better hunting techniques.
  • Wizards invent opportunities by describing how the organization can use its assets to accomplish something very different. Of course, most of the things the wizard will bring to the organization will not work. However, most organizations fall victim to technology changes by not acting on the ideas of wizards. Unless our organization knows how to zoom, even the wizard’s most excellent idea will go nowhere.

In summary:

“Change is not monolithic. Different sorts of employees create different sorts of change. One of the main reasons organizations fail to change is that they try to introduce the wrong kind of change at the wrong moment.”

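Do You Have a Tuner?

(From a writer I respect, Seth Godin)

Piano tuning is a vital job, and very few pianists can do it themselves anymore.

Who maintains the tools you use?

Perhaps it is a computer loaded with all your software. Do you have a world-class professional, someone current, skilled, innovative, and empathetic, making sure that computer runs well? Or are you muddling through it yourself?

If all we have are mediocre tools, why should we expect great results?

Or perhaps it is not the software or hardware that needs tuning. It is our attitude, our way of working, or the way we consider possibilities…

A lawyer who represents himself may have a fool for a client, but we may be carrying the burden of not being able to make good use of our tools.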

Tabular Data Analytics Project Templates Using Python and AutoKeras Version 3

As I work on practicing and solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that I use to experiment with modeling ML problems using Python and the AutoKeras library.

Version 3 of the AutoKeras templates contains updated structures and code, like the previous templates. I designed the templates to address regression, binary classification, and multi-class classification modeling exercises from beginning to end.

You will find the Python templates on the Analytics Project Templates page.

Multi-Label Tabular Data Classification Analytics Project Template Using TensorFlow Version 2

As I work on practicing and solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that I use to experiment with modeling multi-label classification problems using Python and the TensorFlow library.

Version 2 of the TensorFlow templates contains updated structures and code, like the previous multi-label classification TensorFlow templates. I designed the templates to address multi-class and multi-label modeling exercises from beginning to end.
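The practical difference between the two problem types shows up in how the targets are encoded. The sketch below is only an illustration of that idea and not part of the templates: a multi-class row has exactly one label and becomes a one-hot vector, while a multi-label row can carry several labels and becomes a multi-hot vector. The helper name `multi_hot` and the class names are hypothetical.

```python
def multi_hot(label_sets, classes):
    """Encode each row's labels as a 0/1 vector over the known classes."""
    index = {name: i for i, name in enumerate(classes)}
    rows = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1
        rows.append(row)
    return rows

# Hypothetical class names for illustration only.
classes = ["sports", "politics", "tech"]

# Multi-class: exactly one label per row, so the vector is one-hot.
one_hot_target = multi_hot([["politics"]], classes)[0]

# Multi-label: any number of labels per row, so several entries can be 1.
multi_hot_target = multi_hot([["sports", "tech"]], classes)[0]

print(one_hot_target, multi_hot_target)
```

With this encoding, a multi-class network typically ends in a softmax layer, while a multi-label network typically uses one sigmoid output per class.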

You will find the Python templates on the Analytics Project Templates page.