Python Deep Learning Template v2 for Multi-Class Classification and Regression

As I work on practicing and solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that can be used to support modeling multi-class classification and regression problems using Python and the Keras framework.

Version 2 of the deep learning templates contains minor adjustments and corrections to the previous version. The new templates also add or update the sample code to support:

  • Manual splitting of the original dataset into training and test datasets
  • Evaluation of models using resampling methods such as k-fold cross-validation
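
The two additions above can be sketched without any framework at all. A minimal illustration using only the Python standard library (the function names and split ratios here are illustrative, not taken from the template):

```python
import random

def manual_train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and split into training and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

def kfold_indices(n, k=5, seed=42):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train_idx, test_idx

train, test = manual_train_test_split(range(100))
print(len(train), len(test))  # 80 20
```

In practice the template would hand each training/test pair to a Keras model; the splitting logic itself is framework-independent.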

You will find the Python deep learning template on the Machine Learning Project Templates page.

Time Series Model for USA Housing Starts Using Python

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The USA Housing Starts dataset is a time series situation where we are trying to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the monthly housing starts in the US. The dataset describes a time series of housing starts over 11 years (1965-1975) in the US, and there are 132 monthly observations. We used the first 80% of the observations for training and testing various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 13,059. After performing a grid search for the optimal ARIMA parameters, the final ARIMA non-seasonal order was (4, 0, 4) with the seasonal order being (1, 0, 1, 12). Furthermore, the chosen model processed the validation data with an RMSE of 6,276, which was better than the baseline model as expected.
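
The persistence baseline mentioned above simply forecasts each month's value as the previous month's observation. A minimal sketch of the calculation (the toy series below is made up, not the actual housing-starts data):

```python
import math

def persistence_rmse(series):
    """RMSE of the naive forecast y_hat[t] = y[t-1] over a time series."""
    errors = [(series[t] - series[t - 1]) ** 2 for t in range(1, len(series))]
    return math.sqrt(sum(errors) / len(errors))

# Toy monthly series standing in for housing starts.
toy = [100.0, 110.0, 105.0, 120.0]
print(round(persistence_rmse(toy), 2))  # 10.8
```

Any candidate ARIMA model is then expected to beat this number on the held-back validation data, as the chosen model did here.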

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result and should be considered for further modeling.

Dataset Used: U.S. Housing Starts 1965 – 1975

Dataset ML Model: Time series forecast with numerical attributes

Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/

The HTML formatted report can be found here on GitHub.

Binary Classification Model for Breast Cancer Wisconsin (Original) Using Python

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Breast Cancer Wisconsin dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: The dataset contains various measurements of breast tissue samples for cancer diagnosis. It contains measurements such as the thickness of the clump, the uniformity of cell size and shape, the marginal adhesion, and so on. Dr. William H. Wolberg of the University of Wisconsin Hospitals in Madison is the original provider of this dataset.

ANALYSIS: The baseline performance of the machine learning algorithms achieved an average accuracy of 96.92%. Two algorithms (Logistic Regression and Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Gradient Boosting achieved an accuracy metric of 97.51%. By using the optimized tuning parameters, the Gradient Boosting algorithm processed the test dataset with an accuracy of 94.28%, which was just slightly below the prediction accuracy from the training data.

CONCLUSION: For this iteration, the Gradient Boosting algorithm achieved the best overall results using the training and test datasets. For this dataset, Gradient Boosting should be considered for further modeling.

Dataset Used: Breast Cancer Wisconsin (Original) Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29

The HTML formatted report can be found here on GitHub.

Time Series Modeling Project Template for Python Version 2

As I work on practicing and solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that can be used to support time series analysis using the ARIMA modeling and Python.

Version 2 of the time series template contains minor adjustments and corrections to the previous version. The new template also adds and updates the sample code to support:

  • Random grid search via pmdarima’s auto_arima function
  • Full grid search via pmdarima’s auto_arima function
  • Updated sample code for manual grid search
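
Whether done manually or via pmdarima's auto_arima function, a grid search enumerates candidate non-seasonal (p, d, q) and seasonal (P, D, Q, m) orders and keeps the configuration with the best score. The enumeration itself can be sketched with the standard library (the parameter ranges are illustrative, and the model fitting and scoring steps are elided):

```python
from itertools import product

def arima_order_grid(p_vals, d_vals, q_vals, P_vals, D_vals, Q_vals, m=12):
    """Enumerate every (order, seasonal_order) pair a grid search would try."""
    for p, d, q, P, D, Q in product(p_vals, d_vals, q_vals, P_vals, D_vals, Q_vals):
        yield (p, d, q), (P, D, Q, m)

grid = list(arima_order_grid(range(3), range(2), range(3),
                             range(2), range(1), range(2), m=12))
print(len(grid))  # 72 candidate configurations (3 * 2 * 3 * 2 * 1 * 2)
```

A manual search fits one model per candidate and records its error; a random search samples from this same grid instead of walking it exhaustively.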

You will find the Python time series template on the Machine Learning Project Templates page.

Web Scraping of O’Reilly Artificial Intelligence Conference 2019 San Jose Using R

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The web scraping R code leverages the rvest package.

INTRODUCTION: The O’Reilly Artificial Intelligence (AI) Conference covers the full range of topics in leveraging AI technologies for developing software applications and creating innovative solutions. This web scraping script automatically traverses the entire web page and collects all links to the PDF and PPTX documents. The script also downloads the documents as part of the scraping process.

https://conferences.oreilly.com/artificial-intelligence/ai-ca-2019/public/schedule/proceedings

The source code and HTML output can be found here on GitHub.

Ways to Grow

(From a writer I respect, Seth Godin)

A checklist for reference: you can do the same things, or different things…

Do more of the same

  • Stick with the status quo
  • Shout louder

Do something different

  • Change what you do
  • Raise your prices
  • Lower your prices
  • Do it better
  • Tell a different story
  • Serve other customers
  • Enter a new segment
  • Change the downstream impact of the work
  • Earn trust
  • Make bigger promises
  • Get organized
  • Get better customers
  • Do work that matters to someone

Python Deep Learning Template v1 for Regression

As I work on practicing and solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that can be used to support modeling regression problems using Python and the Keras framework.

Version 1 of the deep learning template is the first iteration and contains sample code segments for:

  • Preparing the deep learning modeling environment with a TensorFlow backend
  • Loading the data
  • Defining the Keras model
  • Fitting and evaluating the model on the dataset
  • Optimizing the model
  • Finalizing the model and making predictions
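
The template fills these steps with a Keras model, but the load/define/fit/evaluate/predict flow itself can be shown with a self-contained stand-in. Below, a one-weight linear model fit by plain gradient descent plays the role of the network (everything here, data included, is synthetic and illustrative):

```python
# Framework-free stand-in for the template's workflow: the real template
# defines a Keras model, but the steps have the same shape.

# 1. Load the data (synthetic: y = 3x + 2).
xs = [float(i) for i in range(10)]
ys = [3.0 * x + 2.0 for x in xs]

# 2. Define the model: y_hat = w * x + b.
w, b = 0.0, 0.0

# 3. Fit with full-batch gradient descent on mean squared error.
lr = 0.01
for _ in range(5000):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

# 4. Evaluate: mean squared error on the training data.
mse = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# 5. Finalize and make a prediction for a new input.
prediction = w * 20.0 + b
print(round(w, 2), round(b, 2), round(prediction, 1))
```

In the Keras version, steps 2-5 become `Sequential([...])`, `model.fit(...)`, `model.evaluate(...)`, and `model.predict(...)`; the surrounding structure is unchanged.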

You will find the Python deep learning template on the Machine Learning Project Templates page.

Bob Lewis on IT Projects, Part 2

In the book, There’s No Such Thing as an IT Project: A Handbook for Intentional Business Change, Bob Lewis and co-author Dave Kaiser analyzed and discussed the new ways of thinking about IT and business management.

These are some of my takeaways from reading the book.

In “The New Business/IT Conversation” chapter, Bob and Dave outlined three types of business change that can benefit from better information technology. They are 1) business function optimization, 2) experience engineering, and 3) decision support.

Business function optimization is about getting the work done and done better. Experience engineering is about improving the experience everyone has when getting the work done. Decision support is about helping decision-makers make more effective decisions. Facilitating and making these types of business change happen should be the standard of competence for all IT organizations.

For IT, the question we used to ask was: what does the business want the software/system to do? The new question IT should be asking is: how does the business want to do its work differently and better?

Business processes (how the product/service gets put together) and practices (the organization’s knowledge and experience) are different. They are the two poles of the continuum of how the organization gets its work done. IT should help a business figure out where on the continuum a specific business function should be placed to better design a system that can support it.

IT can help businesses better design their function only after understanding the input and output required by the business. The input and output are further influenced by six optimization factors: fixed cost, incremental cost, cycle time, throughput, quality, and excellence. It is not possible to optimize them all because there are constraints and trade-offs.

Designing external customer or internal user experience is complicated. Bob and Dave suggested IT start by setting this goal: “make their experience as un-irritating as possible.”

Finally, designing a decision support system is pointless until the organization adopts a culture of honest inquiry. Decision support systems are valuable only to the extent they reinforce this culture.

So, what can be done to address the new business/IT conversation? Fortunately, Bob and Dave have some solid suggestions laid out at the end of Chapter Two. I highly recommend the book.