Updated Machine Learning Templates v12 for R

As I work on practicing and solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a set of project templates that can be used to support modeling ML problems using R.

Version 12 of the templates contains minor adjustments and corrections to the previous version. The new version also adds and updates sample code throughout the templates.

You will find the R templates on the Machine Learning Project Templates page.

Web Scraping of O’Reilly Velocity Conference 2019 San Jose Using R

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The web scraping R code leverages the rvest package.

INTRODUCTION: The Velocity Conference covers the full range of skills, approaches, and technologies for building and managing large-scale, cloud-native systems. This web scraping script automatically traverses the web page, collects all links to the PDF and PPTX documents, and downloads the documents as part of the scraping process.
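
The post's scraper is written in R with rvest; as a rough cross-language illustration of the same traverse-collect-download logic, here is a hypothetical Python sketch (the output directory name and the timeouts are assumptions):

    # Collect all PDF/PPTX links from the proceedings page, then download them.
    import os
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    BASE_URL = ("https://conferences.oreilly.com/velocity/vl-ca-2019"
                "/public/schedule/proceedings")
    OUT_DIR = "downloads"  # assumed output directory
    os.makedirs(OUT_DIR, exist_ok=True)

    # Fetch the page and keep only links pointing at PDF or PPTX documents.
    soup = BeautifulSoup(requests.get(BASE_URL, timeout=30).text, "html.parser")
    doc_links = [urljoin(BASE_URL, a["href"])
                 for a in soup.find_all("a", href=True)
                 if a["href"].lower().endswith((".pdf", ".pptx"))]

    # Download each document as part of the scraping pass.
    for url in doc_links:
        filename = os.path.join(OUT_DIR, url.rsplit("/", 1)[-1])
        with open(filename, "wb") as f:
            f.write(requests.get(url, timeout=60).content)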

Starting URL: https://conferences.oreilly.com/velocity/vl-ca-2019/public/schedule/proceedings

The source code and HTML output can be found here on GitHub.

Toward the Full Stack

(From a writer I respect, Seth Godin)

Find the right customers

Earn their attention and trust

Figure out where the problem lies

Discover their fears and embrace their goals

Use prototypes to explore possible solutions

Create an architecture that supports your solution to the problem

Build the simplest viable solution

Test it

Write a well-documented, resilient piece of code

Test it again

Debug it

Show it

It is easy to let our specialty or expertise convince us that our work is merely a small part of the stack. But all of it is interconnected.

Python Deep Learning Template v1 for Binary Classification

As I work on practicing and solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that can be used to support modeling binary classification problems using Python and the Keras framework.

Version 1 of the deep learning template contains sample code segments for the following steps (a condensed sketch follows the list):

  • Preparing the deep learning modeling environment with a TensorFlow backend
  • Loading the data
  • Defining the Keras model
  • Compiling the model
  • Fitting the model on the dataset
  • Evaluating the model
  • Finalizing the model and making predictions
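
Taken together, those steps compress into a short end-to-end skeleton. The sketch below is a minimal illustration under stated assumptions, not the template itself; the file name (data.csv), the column layout (features first, 0/1 label last), and the layer sizes are all assumptions:

    # A minimal end-to-end sketch using Keras with a TensorFlow backend.
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    # Load the data: assumed CSV with features first and a 0/1 label last.
    dataset = np.loadtxt("data.csv", delimiter=",")
    X, y = dataset[:, :-1], dataset[:, -1]

    # Define the Keras model.
    model = Sequential([
        Dense(12, input_dim=X.shape[1], activation="relu"),
        Dense(8, activation="relu"),
        Dense(1, activation="sigmoid"),
    ])

    # Compile the model with a binary classification loss.
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])

    # Fit the model on the dataset.
    model.fit(X, y, epochs=100, batch_size=10, verbose=0)

    # Evaluate the model.
    _, accuracy = model.evaluate(X, y, verbose=0)
    print(f"Accuracy: {accuracy:.3f}")

    # Finalize the model and make predictions.
    predictions = (model.predict(X) > 0.5).astype(int)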

You will find the Python deep learning template on the Machine Learning Project Templates page.

Drucker on Managing Oneself, Part 3

In his book, Management Challenges for the 21st Century, Peter Drucker analyzed and discussed the new paradigms of management.

Although much of the discussion revolves around the perspective of the organization, these are my takeaways on how we can apply his teachings on our journey as knowledge workers.

For knowledge workers, understanding the factors that influence our performance is just as important as understanding our strengths. Like our strengths, how we perform is individualized. In other words, our personality plays a major part in determining how we perform.

Drucker suggested we explore three questions in our quest to understand how we perform.

  1. How do I perceive information?
  2. How do I learn?
  3. What are my values?

Am I a reader or a listener? We perceive information in different ways, and understanding our preference is crucial to being effective at what we do. The distinction between the reader and the listener is even more critical when it comes to our decision-making process. We should understand the difference and put ourselves in the best position possible to receive and process the information we need to make decisions.

Knowing the reader vs. listener preference is very much like knowing and working with our left-hand vs. right-hand preference. If we work with our preferences, we have a better chance of amplifying our effectiveness. When we work against our preferences, we stand to diminish or even destroy our effectiveness.

The second key to understanding how we perform is knowing how we learn. There are several ways to learn, and, again, we each have our preferences. Some people learn by taking copious notes. Some learn by hearing themselves talk. Some learn by doing, and some learn by reading and conceptualizing in their heads.

The above paths describe some of the ways of acquiring knowledge. There are other paths we take to learn from experience as well. Some learn better as loners, and some do better in a team setting. Some people do well under stress, while others need a structured and predictable environment.

Moreover, some people perform and learn better as decision-makers, while others prefer to act as advisers. The important thing, Drucker suggested, is not to try to change ourselves too drastically, because that is unlikely to succeed. We should work hard to improve the way we perform and avoid putting ourselves in a situation or an environment where we will perform poorly.

Finally, Drucker reminded us that our values are the ultimate test in determining how we perform. Drucker called this the “mirror test.” When we work in an organization whose value system is incompatible with ours, we run a great risk of experiencing frustration and nonperformance.

Our strengths and performance are usually closely correlated. Sometimes, however, there is a conflict between a person’s values and that same person’s strengths. When there is a conflict between our strengths and values, we must take a close look at where the discrepancies lie and why. If we do not resolve the discrepancies, we run a real risk of low performance and low contribution.

Each one of us has something unique to offer. We should all put ourselves in the best position to perform by knowing our strengths and matching them with our preferences to get the best results possible.

Time Series Model for Minimum Weekly Temperatures Using Python

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Minimum Daily Temperatures dataset is a time series situation where we are trying to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the minimum weekly temperatures in Melbourne, Australia. The dataset describes a time series of daily minimum temperatures over ten years (1981-1990) in the city of Melbourne, and there are 3,650 daily observations. The source of the data is credited to the Australian Bureau of Meteorology. We used the first 70% of the observations for training and testing various models, while holding back the remaining observations for validating the final model.

For this iteration of the machine learning modeling, we will summarize the daily temperature measurements into weekly values before modeling.
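
As a minimal sketch of that summarization step (assuming a pandas workflow, a local CSV copy of the dataset, and the weekly mean as the aggregate):

    # Resample the daily series into weekly values. The file name and the
    # choice of the weekly mean as the summary statistic are assumptions.
    import pandas as pd

    daily = pd.read_csv("daily-min-temperatures.csv",
                        index_col=0, parse_dates=True).squeeze("columns")
    weekly = daily.resample("W").mean()
    print(weekly.head())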

ANALYSIS: The baseline (persistence) prediction for the dataset resulted in an RMSE of 1.940. After performing a grid search for the optimal ARIMA parameters, the final non-seasonal order was (1, 0, 2), with a seasonal order of (0, 1, 1, 52). The chosen model processed the validation data with an RMSE of 1.611, which beat the baseline model, as expected.
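
For reference, the reported orders can be fit with the SARIMAX class in statsmodels. The sketch below is a hedged illustration of the fit-and-validate step, not the project’s actual script; the variable names and the exact 70/30 split point are assumptions:

    # Fit the chosen ARIMA orders and score the held-back validation split.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Rebuild the weekly series (see the earlier sketch).
    daily = pd.read_csv("daily-min-temperatures.csv",
                        index_col=0, parse_dates=True).squeeze("columns")
    weekly = daily.resample("W").mean()

    # First 70% for training, the remainder for validating the final model.
    split = int(len(weekly) * 0.7)
    train, validation = weekly[:split], weekly[split:]

    # Non-seasonal order (1, 0, 2) with seasonal order (0, 1, 1, 52).
    fit = SARIMAX(train, order=(1, 0, 2),
                  seasonal_order=(0, 1, 1, 52)).fit(disp=False)

    # Forecast the validation window and compute RMSE.
    forecast = fit.forecast(steps=len(validation))
    rmse = np.sqrt(np.mean((validation.values - forecast.values) ** 2))
    print(f"Validation RMSE: {rmse:.3f}")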

CONCLUSION: For this dataset, the chosen ARIMA model achieved a satisfactory result and should be considered for further modeling.

Dataset Used: Minimum Daily Temperatures

Dataset ML Model: Time series forecast with numerical attributes

Dataset Reference: https://machinelearningmastery.com/time-series-datasets-for-machine-learning/

The HTML formatted report can be found here on GitHub.

First Comes Carriage

In his Akimbo podcast, Seth Godin teaches us how to adopt a posture of possibility, change the culture, and choose to make a difference. Here are my takeaways from the episode.

In this podcast, Seth discusses the importance of having and building carriage for shipping work to our audience. Carriage is the vehicle that carries content from one place to another. In the digital world, we need to leverage some form of carriage to deliver our work and make change happen.

Carol Burnett used her variety show to entertain millions of viewers every week. She owned an hour of attention from 10 million people, and she used that position for years and years to entertain them in a way that she was proud of. She had a carriage that carried her work from the studio, through the network, to people’s homes on Saturday night.

Ted Turner realized that UHF TV stations and cable channels were creating a different type of carriage. He leveraged that carriage to deliver the content he created to those who wanted to see it. Turner’s Atlanta Braves baseball team became a premier team in the league because he used his carriage to give the team exposure.

Carriages used to be controlled by a handful of gatekeepers, but the Internet has turned the rules of carriage in a different direction. Now that everyone can produce content and deliver it via the Internet, the availability of carriage has become wide open.

Because the Internet is the friend of infinity, content aggregators such as YouTube and WordPress.com no longer play the role of gatekeeper. In other words, remarkable work does not happen because YouTube puts something on its home page. Remarkable work becomes that way because many people choose to tell other people about the content they have seen.

With millions of other people now having the same access to the technologies as we do, we have an opportunity to build a carriage of our own. With patience, we can build a following. With patience, we can build a permission asset with the ability to deliver anticipated, personal, and relevant messages to readers who want to hear from us.

As the carriage rules keep changing, they are no longer controlled by the FCC or by someone like Ted Turner who owns a network or cable company. For the first time, each of us can build a carriage for doing the work we are proud of, as opposed to pandering to someone else who owns the carriage and fitting into their business model.

Multi-Class Classification Model for Sensorless Drive Diagnosis Using R Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Sensorless Drive Diagnosis dataset is a multi-class classification situation where we are trying to predict one of several possible outcomes.

INTRODUCTION: The dataset contains features extracted from electric current drive signals. The drive has both intact and defective components. The signals can result in 11 different classes with different conditions. Each condition has been measured several times under 12 different operating conditions, such as speeds, load moments, and load forces.

In iteration Take1, we established the baseline accuracy measurement for comparison with future rounds of modeling.

In this iteration, we will standardize the numeric attributes and observe the impact of scaling on modeling accuracy.
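
The project itself works in R, so as a hypothetical cross-language illustration, the scikit-learn sketch below shows the same standardize-then-model idea; the synthetic stand-in data and every parameter value are assumptions:

    # Standardize the attributes, then model, inside one pipeline so the
    # scaling statistics are learned from the training data only.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-in for the Sensorless Drive data (48 features, 11 classes).
    X, y = make_classification(n_samples=500, n_features=48, n_informative=8,
                               n_classes=11, random_state=888)

    pipeline = Pipeline([
        ("scale", StandardScaler()),  # zero mean, unit variance per attribute
        ("model", RandomForestClassifier(n_estimators=100, random_state=888)),
    ])
    pipeline.fit(X, y)
    print(f"Training accuracy: {pipeline.score(X, y):.4f}")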

ANALYSIS: In iteration Take1, the baseline performance of the machine learning algorithms achieved an average accuracy of 85.53%. Two algorithms (Random Forest and Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Random Forest turned in the top overall result and achieved an accuracy metric of 99.92%. After applying the optimized parameters, the Random Forest algorithm processed the testing dataset with an accuracy of 99.90%, nearly matching the accuracy achieved on the training data.

In this iteration, the baseline performance of the machine learning algorithms achieved an average accuracy of 85.34%. Two algorithms (Random Forest and Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Random Forest turned in the top overall result and achieved an accuracy metric of 99.92%. After applying the optimized parameters, the Random Forest algorithm processed the testing dataset with an accuracy of 99.90%, nearly matching the accuracy achieved on the training data.

By standardizing the dataset features, the ensemble algorithms continued to perform well. However, standardizing the features appeared to have little impact on the overall modeling accuracy.

CONCLUSION: For this iteration, the Random Forest algorithm achieved the best overall training and validation results. For this dataset, Random Forest could be considered for further modeling.

Dataset Used: Sensorless Drive Diagnosis Data Set

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Dataset+for+Sensorless+Drive+Diagnosis

The HTML formatted report can be found here on GitHub.