Binary-Class Tabular Model for Kaggle Tabular Playground 2022 August Using Python and XGBoost

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground 2022 August dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions with fun but less complex tabular datasets. This data represents the results of an extensive product testing study. For each product code, the research team provided several product attributes and measurement values representing various lab testing methods.

Each product is used in a simulated real-world environment experiment and absorbs a certain amount of fluid to see whether it fails. The project task is to use the data to predict individual product failures of new codes with their lab test results.

ANALYSIS: The preliminary XGBoost model achieved a ROC_AUC benchmark of 0.5716. After a series of tuning trials, the final model scored a ROC_AUC of 0.5761 on the training dataset. When we processed the test dataset with the final model, it achieved a ROC_AUC score of 0.5755.
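To make the ROC_AUC figures above easier to interpret, here is a minimal, dependency-free sketch of how the metric can be computed from labels and predicted scores. It is an illustration only, not the code used in the project; the function name `roc_auc` and the sample data are assumptions made for this example. A score near 0.5 means the model ranks failures barely better than chance.

```python
def roc_auc(labels, scores):
    """ROC AUC as the chance that a random positive outranks a random negative.

    Hypothetical helper for illustration; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = failure) and predicted failure probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of rows, so it is only suitable for small samples; a library implementation would be the practical choice on the full dataset.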

CONCLUSION: In this iteration, the XGBoost model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground 2022 August

Dataset ML Model: Binary classification with numerical features

Dataset Reference: https://www.kaggle.com/competitions/tabular-playground-series-aug-2022

One source of potential performance benchmarks: https://www.kaggle.com/competitions/tabular-playground-series-aug-2022/leaderboard

The HTML formatted report can be found here on GitHub.

Binary-Class Tabular Model for Kaggle Tabular Playground 2022 August Using Python and TensorFlow Decision Forests

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground 2022 August dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions with fun but less complex tabular datasets. This data represents the results of an extensive product testing study. For each product code, the research team provided several product attributes and measurement values representing various lab testing methods.

Each product is used in a simulated real-world environment experiment and absorbs a certain amount of fluid to see whether it fails. The project task is to use the data to predict individual product failures of new codes with their lab test results.

ANALYSIS: The preliminary Random Forest model achieved a ROC_AUC benchmark of 0.9703 on the training dataset. When we applied the finalized model to the test dataset, it achieved a ROC_AUC score of 0.5473.

CONCLUSION: In this iteration, the Random Forest model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground 2022 August

Dataset ML Model: Binary classification with numerical features

Dataset Reference: https://www.kaggle.com/competitions/tabular-playground-series-aug-2022

One source of potential performance benchmarks: https://www.kaggle.com/competitions/tabular-playground-series-aug-2022/leaderboard

The HTML formatted report can be found here on GitHub.

Binary-Class Tabular Model for Kaggle Tabular Playground 2022 August Using Python and Scikit-Learn

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground 2022 August dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions with fun but less complex tabular datasets. This data represents the results of an extensive product testing study. For each product code, the research team provided several product attributes and measurement values representing various lab testing methods.

Each product is used in a simulated real-world environment experiment and absorbs a certain amount of fluid to see whether it fails. The project task is to use the data to predict individual product failures of new codes with their lab test results.

ANALYSIS: The machine learning algorithms achieved an average ROC_AUC benchmark of 0.5449 on the training dataset. We then selected Logistic Regression as the final model because it processed the training dataset with a final ROC_AUC score of 0.5853. When we processed the test dataset with the final model, it achieved a ROC_AUC score of 0.5714.

CONCLUSION: In this iteration, the Logistic Regression model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground 2022 August

Dataset ML Model: Binary classification with numerical features

Dataset Reference: https://www.kaggle.com/competitions/tabular-playground-series-aug-2022

One source of potential performance benchmarks: https://www.kaggle.com/competitions/tabular-playground-series-aug-2022/leaderboard

The HTML formatted report can be found here on GitHub.

Steven Pressfield on Put Your Ass Where Your Heart Wants to Be, Part 2

In the book, Put Your Ass Where Your Heart Wants to Be, Steven Pressfield shares his inspiration and techniques to help us make the life-altering transformation.

These are some of my favorite takeaways from reading the book.

Magic can happen when we move to the same physical location as other dreamers who have already put their asses there.

These dreamers are our peers and fellow aspirants. We get to discuss techniques and trade notes with them. None of the group synergies could happen if we all stayed home.

Another reason to consider relocating is that it puts us where our dream work needs to happen. When we move our material ass to the geographic site of our dreams, our peers and potential mentors might think, “This person is serious. She has committed. She has burned the boats. She is one of us.”

How does success really happen? Keep working.

Reason One: “Working means you’re getting paid. Every buck means you’re a working pro.” This also means we are toiling in our chosen field.

Reason Two: “When you work, you learn. Everybody has something to teach you.” Everyone we meet in our chosen field will likely have something we can learn from.

Reason Three: “You’re making friends.” We never know whether someone we work with today might be a connection to something even more significant or meaningful down the road.

But before this network of friends and mentors can help us up the ladder, we must be where they are.

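“Do You Have a Plan?”

(From a writer I respect, Seth Godin)

First, let’s acknowledge that a problem exists.

It may be that I think we are facing something serious, costly, and urgent, and you do not.

We can have an honest conversation about the problem without worrying about whether there is an easy or certain solution.

We can also have a conversation about whether this is a problem (problems have solutions) or whether it is simply a situation, like gravity, that we have to live with.

Once we agree that we have a problem, the status quo will show up. It will use every trick it has to argue that any change to the current path is too risky, too expensive, and too painful to consider. The status quo will stall. It will call for further study, and it will amplify the pain that comes with our efforts to make things better.

The status quo usually wins. That is because the change makers are now playing defense, forced to justify every choice and mitigate every inconvenience.

Perhaps there is a more useful way forward.

We begin by agreeing that a problem exists.

Then every side, every person, needs to put forward a plan: a plan to solve the problem, or a plan to take responsibility for not solving it.

For each plan, we can consider the likely outcomes. For each plan, we can ask, “Will this work?” and follow up with “Why?” and “How?”

Perhaps you do not think this is a problem worth solving. It is important to say so before we ask whether you have a plan.

Delay might be the best option. But let’s announce that honestly rather than simply stalling.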

Web Scraping of Machine Learning Mastery Articles Using Python and BeautifulSoup

SUMMARY: This project aims to practice web scraping by extracting specific pieces of information from a website. The web scraping Python code leverages the BeautifulSoup module.

Dr. Jason Brownlee’s Machine Learning Mastery hosts its tutorial lessons at https://machinelearningmastery.com/blog. The purpose of this exercise is to practice web scraping by gathering the blog entries from Machine Learning Mastery’s web pages. This iteration of the script automatically traverses the web pages to capture all articles and stores the captured information in a CSV output file for sorting and filtering.
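The capture-and-store step described above can be sketched roughly as follows. This is not the project's script: to keep the example self-contained and runnable offline, it swaps BeautifulSoup for Python's standard-library `html.parser` and parses an inline HTML sample instead of fetching live pages. The markup in `SAMPLE_HTML`, the `example.com` URLs, and the names `ArticleLinkParser` and `articles_to_csv` are all assumptions made for illustration; the real blog's page structure may differ.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical listing-page markup; the real site's structure may differ.
SAMPLE_HTML = """
<article><h2 class="title"><a href="https://example.com/post-a">Post A</a></h2></article>
<article><h2 class="title"><a href="https://example.com/post-b">Post B</a></h2></article>
<a class="next" href="https://example.com/page/2">Next</a>
"""


class ArticleLinkParser(HTMLParser):
    """Collects the title and URL of every link found inside an <h2> heading."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._in_h2 = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
        elif tag == "a" and self._in_h2:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside a heading link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.articles.append({"title": title, "url": self._href})
            self._href = None
        elif tag == "h2":
            self._in_h2 = False


def articles_to_csv(articles):
    """Serializes the captured articles as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(articles)
    return buffer.getvalue()


parser = ArticleLinkParser()
parser.feed(SAMPLE_HTML)
csv_text = articles_to_csv(parser.articles)
print(csv_text)
```

A full crawler would also follow the pagination link on each page until none remains, and would pause between requests to avoid overloading the site.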

Starting URL: https://machinelearningmastery.com/blog

The source code and HTML output can be found here on GitHub.

Quantitative Finance Model using Donadio and Ghosh Learn Algorithmic Trading Chapter 4 Pairs Correlation Trading Example

NOTE: This script is for learning purposes only and does not constitute a recommendation for buying or selling any stock mentioned in this script.

SUMMARY: This project aims to construct and test an algorithmic trading model and document the end-to-end steps using a template.

INTRODUCTION: This script aims to replicate the Pairs Correlation Trading example found in chapter four of the book Learn Algorithmic Trading by Sebastien Donadio and Sourav Ghosh. The script seeks to validate the Python environment and package requirements for running these code examples successfully. The eventual goal is to integrate various example code segments from the book into an end-to-end algorithmic trading system.
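For readers unfamiliar with pairs trading, the general idea can be sketched as follows. This is a generic illustration and not the book's code: it assumes a simple price spread between two instruments, a trailing-window z-score, and a fixed entry threshold. The function names, the window length, the threshold, and the price series are all hypothetical.

```python
from statistics import mean, stdev


def spread_zscores(prices_a, prices_b, window):
    """Z-score of each day's price spread against its trailing window.

    Returns None for days without a full window of history.
    """
    if len(prices_a) != len(prices_b):
        raise ValueError("price series must have the same length")
    if window < 2:
        raise ValueError("window must cover at least two observations")

    spread = [a - b for a, b in zip(prices_a, prices_b)]
    zscores = []
    for i in range(len(spread)):
        if i + 1 < window:
            zscores.append(None)
            continue
        recent = spread[i + 1 - window : i + 1]
        sigma = stdev(recent)
        # A flat spread has no dispersion, so treat it as "no deviation".
        zscores.append(0.0 if sigma == 0 else (spread[i] - mean(recent)) / sigma)
    return zscores


def pair_signals(zscores, entry=1.0):
    """Maps z-scores to illustrative signals (hypothetical threshold)."""
    signals = []
    for z in zscores:
        if z is None:
            signals.append("wait")
        elif z > entry:
            signals.append("short_a_long_b")   # spread unusually wide
        elif z < -entry:
            signals.append("long_a_short_b")   # spread unusually narrow
        else:
            signals.append("flat")
    return signals


# Made-up closing prices: the two series move together until the last day.
prices_a = [10, 11, 12, 13, 18]
prices_b = [10, 11, 12, 13, 14]
signals = pair_signals(spread_zscores(prices_a, prices_b, window=3))
print(signals)  # ['wait', 'wait', 'flat', 'flat', 'short_a_long_b']
```

The bet behind such a signal is that the spread between two historically related instruments will revert toward its recent average, which is an assumption that real data may not support.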

Dataset ML Model: Time series analysis with numerical attributes

Dataset Used: Sharadar US Equities and Fund Prices from Quandl/Nasdaq Data Link

Source and Further Discussion of the Code Examples: https://github.com/PacktPublishing/Learn-Algorithmic-Trading

The HTML formatted report can be found here on GitHub.

Quantitative Finance Model using Donadio and Ghosh Learn Algorithmic Trading Chapter 4 Pairs Correlation Hypothetical Example

NOTE: This script is for learning purposes only and does not constitute a recommendation for buying or selling any stock mentioned in this script.

SUMMARY: This project aims to construct and test an algorithmic trading model and document the end-to-end steps using a template.

INTRODUCTION: This script aims to replicate the Pairs Correlation Hypothetical example found in chapter four of the book Learn Algorithmic Trading by Sebastien Donadio and Sourav Ghosh. The script seeks to validate the Python environment and package requirements for running these code examples successfully. The eventual goal is to integrate various example code segments from the book into an end-to-end algorithmic trading system.

Dataset ML Model: Time series analysis with numerical attributes

Dataset Used: Sharadar US Equities and Fund Prices from Quandl/Nasdaq Data Link

Source and Further Discussion of the Code Examples: https://github.com/PacktPublishing/Learn-Algorithmic-Trading

The HTML formatted report can be found here on GitHub.