Web Scraping of O’Reilly Artificial Intelligence Conference 2019 London Using R

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The web scraping R code leverages the rvest package.

INTRODUCTION: The O’Reilly Artificial Intelligence (AI) Conference covers the full range of topics in leveraging AI technologies for developing software applications and creating innovative solutions. This web scraping script will automatically traverse the entire web page and collect all links to the PDF and PPTX documents. The script will also download the documents as part of the scraping process.

Starting URL: https://conferences.oreilly.com/artificial-intelligence/ai-eu/public/schedule/proceedings

The source code and HTML output can be found here on GitHub.
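The scraping approach described above can be sketched with rvest roughly as follows. This is a minimal illustration, not the project's actual script: the CSS selector and file-name pattern are assumptions, and the conference pages may no longer be online.

```r
# Minimal sketch of the scraping approach, using rvest.
library(rvest)

# Collect all links on a page that point to PDF or PPTX documents.
collect_doc_links <- function(url) {
  page <- read_html(url)
  links <- html_attr(html_nodes(page, "a"), "href")
  links[grepl("\\.(pdf|pptx)$", links, ignore.case = TRUE)]
}

# Download each collected document into a destination directory.
download_docs <- function(doc_links, dest_dir = ".") {
  for (doc in doc_links) {
    download.file(doc, destfile = file.path(dest_dir, basename(doc)),
                  mode = "wb")
  }
}

# Usage (requires network access and a live proceedings page):
# docs <- collect_doc_links("https://conferences.oreilly.com/artificial-intelligence/ai-eu/public/schedule/proceedings")
# download_docs(docs)
```

In practice the real script would also need to resolve relative links against the page's base URL before downloading.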

Web Scraping of O’Reilly Strata Data Conference 2019 New York Using R

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The web scraping R code leverages the rvest package.

INTRODUCTION: The O’Reilly Strata Data Conference brings expert-led guidance on the tools and technologies that can enhance data strategies and projects. This web scraping script will automatically traverse the entire web page and collect all links to the PDF and PPTX documents. The script will also download the documents as part of the scraping process.

Starting URL: https://conferences.oreilly.com/strata/strata-ny/public/schedule/proceedings

The source code and HTML output can be found here on GitHub.

Web Scraping of O’Reilly Artificial Intelligence Conference 2019 San Jose Using R

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The web scraping R code leverages the rvest package.

INTRODUCTION: The O’Reilly Artificial Intelligence (AI) Conference covers the full range of topics in leveraging AI technologies for developing software applications and creating innovative solutions. This web scraping script will automatically traverse the entire web page and collect all links to the PDF and PPTX documents. The script will also download the documents as part of the scraping process.

Starting URL: https://conferences.oreilly.com/artificial-intelligence/ai-ca-2019/public/schedule/proceedings

The source code and HTML output can be found here on GitHub.

Web Scraping of O’Reilly Open Source Software Conference 2019 San Jose Using R

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The web scraping R code leverages the rvest package.

INTRODUCTION: The Open Source Software Conference covers the full range of topics in leveraging the open-source movement for developing software applications and creating innovative solutions. This web scraping script will automatically traverse the entire web page and collect all links to the PDF and PPTX documents. The script will also download the documents as part of the scraping process.

Starting URL: https://conferences.oreilly.com/oscon/oscon-or-2019/public/schedule/proceedings

The source code and HTML output can be found here on GitHub.

Updated Machine Learning Templates v12 for R

As I work on practicing and solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a set of project templates that can be used to support modeling ML problems using R.

Version 12 of the templates contains minor adjustments and corrections to the previous version of the templates. Also, the new templates add or update the sample code to support:

You will find the R templates on the Machine Learning Project Templates page.

Web Scraping of O’Reilly Velocity Conference 2019 San Jose Using R

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The web scraping R code leverages the rvest package.

INTRODUCTION: The Velocity Conference covers the full range of skills, approaches, and technologies for building and managing large-scale, cloud-native systems. This web scraping script will automatically traverse the entire web page and collect all links to the PDF and PPTX documents. The script will also download the documents as part of the scraping process.

Starting URL: https://conferences.oreilly.com/velocity/vl-ca-2019/public/schedule/proceedings

The source code and HTML output can be found here on GitHub.

Multi-Class Classification Model for Sensorless Drive Diagnosis Using R Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Sensorless Drive Diagnosis is a multi-class classification situation where we are trying to predict one of several possible outcomes.

INTRODUCTION: The dataset contains features extracted from electric current drive signals. The drive has both intact and defective components. The signals can result in 11 different classes with different conditions. Each condition has been measured several times under 12 different operating conditions, such as speeds, load moments, and load forces.

In iteration Take1, we established the baseline accuracy measurement for comparison with future rounds of modeling.

In this iteration, we will standardize the numeric attributes and observe the impact of scaling on modeling accuracy.
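The standardization step can be sketched with caret's preProcess function, which is a common way to center and scale numeric attributes in R. The data frames and values below are placeholders, not the project's actual data; the key point is that the scaling parameters are learned from the training split and then applied to both splits.

```r
# Sketch of standardizing numeric attributes with caret.
library(caret)

# Placeholder feature frames standing in for the drive-diagnosis data.
X_train <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
X_test  <- data.frame(a = 2.5, b = 25)

# Learn the mean and standard deviation from the training data only.
pre <- preProcess(X_train, method = c("center", "scale"))

# Apply the same transformation to both splits.
X_train_std <- predict(pre, X_train)
X_test_std  <- predict(pre, X_test)
```

Learning the scaling parameters from the training split alone avoids leaking information from the test set into the model.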

ANALYSIS: In iteration Take1, the baseline performance of the machine learning algorithms achieved an average accuracy of 85.53%. Two algorithms (Random Forest and Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Random Forest turned in the top overall result and achieved an accuracy metric of 99.92%. After applying the optimized parameters, the Random Forest algorithm processed the testing dataset with an accuracy of 99.90%, which was nearly as good as the accuracy achieved on the training data.

In this iteration, the baseline performance of the machine learning algorithms achieved an average accuracy of 85.34%. Two algorithms (Random Forest and Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Random Forest turned in the top overall result and achieved an accuracy metric of 99.92%. After applying the optimized parameters, the Random Forest algorithm processed the testing dataset with an accuracy of 99.90%, which was nearly as good as the accuracy achieved on the training data.

By standardizing the dataset features, the ensemble algorithms continued to perform well. However, standardizing the features appeared to have little impact on the overall modeling accuracy.

CONCLUSION: For this iteration, the Random Forest algorithm achieved the best overall training and validation results. For this dataset, Random Forest could be considered for further modeling.

Dataset Used: Sensorless Drive Diagnosis Data Set

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Dataset+for+Sensorless+Drive+Diagnosis

The HTML formatted report can be found here on GitHub.

Multi-Class Classification Model for Sensorless Drive Diagnosis Using R Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Sensorless Drive Diagnosis is a multi-class classification situation where we are trying to predict one of several possible outcomes.

INTRODUCTION: The dataset contains features extracted from electric current drive signals. The drive has both intact and defective components. The signals can result in 11 different classes with different conditions. Each condition has been measured several times under 12 different operating conditions, such as speeds, load moments, and load forces.

In this iteration, we will establish the baseline accuracy measurement for comparison with future rounds of modeling.
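A baseline spot-check of this kind can be sketched with caret: train several algorithms under the same cross-validation scheme and compare their accuracy resamples. The built-in iris data stands in for the drive-diagnosis dataset here, and the two algorithms shown are illustrative, not the full suite used in the project.

```r
# Sketch of a baseline algorithm spot-check with 10-fold cross-validation.
library(caret)
set.seed(888)

control <- trainControl(method = "cv", number = 10)

# Train two candidate algorithms with identical resampling settings.
fit_cart <- train(Species ~ ., data = iris, method = "rpart",
                  metric = "Accuracy", trControl = control)
fit_knn  <- train(Species ~ ., data = iris, method = "knn",
                  metric = "Accuracy", trControl = control)

# Collect the resampling results for a side-by-side comparison.
results <- resamples(list(CART = fit_cart, KNN = fit_knn))
summary(results)
```

Because every model sees the same folds, the summary gives a fair baseline comparison from which the top performers can be selected for tuning.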

ANALYSIS: The baseline performance of the machine learning algorithms achieved an average accuracy of 85.53%. Two algorithms (Random Forest and Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Random Forest turned in the top overall result and achieved an accuracy metric of 99.92%. After applying the optimized parameters, the Random Forest algorithm processed the testing dataset with an accuracy of 99.90%, which was nearly as good as the accuracy achieved on the training data.

CONCLUSION: For this iteration, the Random Forest algorithm achieved the best overall training and validation results. For this dataset, Random Forest could be considered for further modeling.

Dataset Used: Sensorless Drive Diagnosis Data Set

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Dataset+for+Sensorless+Drive+Diagnosis

The HTML formatted report can be found here on GitHub.