Binary-Class Classification Model for Seismic Bumps Take 2 Using R

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

INTRODUCTION: Mining activity has always been connected with the occurrence of dangers commonly called mining hazards. A special case of such a threat is seismic hazard, which frequently occurs in many underground mines. Seismic hazard is the hardest of the natural hazards to detect and predict, and in this respect it is comparable to an earthquake. The complexity of seismic processes and the large disproportion between the number of low-energy seismic events and the number of high-energy phenomena cause statistical techniques to be insufficient for predicting seismic hazard. Therefore, it is essential to search for new opportunities for better hazard prediction, including the use of machine learning methods.

In the Take 1 iteration, we had three algorithms with high accuracy scores but dismal Kappa scores. For this iteration, we will examine the viability of using ROC scores to rank and choose the models.

CONCLUSION: From the previous Take 1 iteration, the baseline performance of the eight algorithms achieved an average accuracy of 93.11%. Three algorithms (Random Forest, Support Vector Machine, and Adaboost) achieved the top three accuracy scores after the first round of modeling. After a series of tuning trials, all three algorithms turned in an identical accuracy result of 93.42% with an identical Kappa score of 0.0. With the imbalanced dataset we have on hand, we will need to look for another metric or another approach to evaluate the models.

From the current iteration, the baseline performance of the eight algorithms achieved an average ROC score of 71.99%. Three algorithms (Random Forest, Adaboost, and Stochastic Gradient Boosting) achieved the top three ROC scores after the first round of modeling. After a series of tuning trials, Stochastic Gradient Boosting turned in the best ROC result of 78.59%, but with a dismal sensitivity score of 0.88%.

The ROC metric has given us a more viable way to evaluate the models than the accuracy scores alone. However, with the imbalanced dataset we have on hand, we still need to look for another approach to further validate our modeling effort.
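
To make the metric switch concrete, below is a minimal caret sketch of ranking a model by ROC instead of accuracy. It is not the project's actual script: the simulated two-predictor dataset, the "hazard"/"nohazard" labels, and the use of Stochastic Gradient Boosting (caret method "gbm") as the example learner are illustrative assumptions, and the sketch requires the caret, gbm, and pROC packages.

```r
# Minimal sketch (not the project's script): rank a model by ROC with caret.
# The simulated dataset and the gbm learner are illustrative assumptions.
library(caret)

set.seed(7)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
# Rare positive class, mimicking the heavy imbalance in the seismic-bumps data
prob_hazard <- plogis(-3 + 1.5 * x1 - x2)
sim <- data.frame(
  x1 = x1,
  x2 = x2,
  class = factor(ifelse(runif(n) < prob_hazard, "hazard", "nohazard"),
                 levels = c("hazard", "nohazard"))  # first level is the event class
)

# twoClassSummary reports ROC, sensitivity, and specificity instead of Accuracy/Kappa;
# classProbs = TRUE is required so caret can compute the ROC curve.
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                        classProbs = TRUE, summaryFunction = twoClassSummary)

# metric = "ROC" makes caret pick tuning parameters by area under the ROC curve.
fit_gbm <- train(class ~ ., data = sim, method = "gbm",
                 metric = "ROC", trControl = control, verbose = FALSE)
print(fit_gbm)  # ROC, Sens, and Spec for each candidate parameter combination
```

Because twoClassSummary treats the first factor level as the event of interest, the Sens column in this output corresponds to the kind of sensitivity figure quoted above.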

The HTML-formatted report can be found here on GitHub.

Entrepreneurial Strategies, Part 2

In his book, Innovation and Entrepreneurship, Peter Drucker presented how innovation and entrepreneurship can be a purposeful and systematic discipline. That discipline is still as relevant to today’s business environment as when the book was published back in 1985. The book explains the challenges faced by many organizations and analyzes the opportunities which can be leveraged for success.

Drucker wrote that entrepreneurship requires two combined approaches: entrepreneurial strategies and entrepreneurial management. Entrepreneurial management consists of practices and policies that live internally within the enterprise. Entrepreneurial strategies, on the other hand, are the practices and policies required for working with the external element, the marketplace.

Drucker further believed that there are four important and distinct entrepreneurial strategies we should be aware of. These are:

  1. Being “Fustest with the Mostest”
  2. “Hitting Them Where They Ain’t”
  3. Finding and occupying a specialized “ecological niche.”
  4. Changing the economic characteristics of a product, a market, or an industry.

These four strategies need not be mutually exclusive. A successful entrepreneur often combines two, sometimes even three elements, in one strategy.

“Hitting Them Where They Ain’t” manifests in one of two ways: creative imitation and entrepreneurial judo.

Creative imitation describes a strategy where the entrepreneur does something somebody else has already done, but makes the innovation better than the original innovators did. The strategy of “creative imitation” waits until somebody else has established the new market, but only “approximately.” It then goes to work and, within a short time, comes out with something similar that will greatly satisfy the customer. The creative imitator then sets the standard and takes over the market.

Like being “Fustest with the Mostest,” creative imitation is a strategy aimed at market or industry leadership, but it is much less risky. By the time the creative imitator moves, the market has been established. There is usually more demand for it than the original innovator can supply, so the creative imitator perfects and positions it. As such, creative imitation starts out with markets rather than with products, and with customers rather than with producers. It is both market-focused and market-driven.

Creative imitation does not exploit the failure of the pioneers as failure is commonly understood. On the contrary, the pioneer must be successful. But the original innovators failed to understand their own success completely. This failure gives room for the creative imitator to exploit the success of others.

The strategy of creative imitation also requires a rapidly growing market. Creative imitators do not succeed by taking away customers from the pioneers who have first introduced a new product or service. Instead, they serve markets the pioneers have created but do not adequately service. Creative imitation satisfies a demand that already exists rather than creating one.

The strategy has its risks, and they are considerable. Creative imitators are easily tempted to splinter their efforts in the attempt to hedge their bets. Another danger is to misread the trend and imitate creatively what then turns out not to be the winning development in the marketplace.

Binary-Class Classification Model for Seismic Bumps Take 1 Using R

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Seismic Bumps Data Set is a binary-class classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: Mining activity has always been connected with the occurrence of dangers commonly called mining hazards. A special case of such a threat is seismic hazard, which frequently occurs in many underground mines. Seismic hazard is the hardest of the natural hazards to detect and predict, and in this respect it is comparable to an earthquake. The complexity of seismic processes and the large disproportion between the number of low-energy seismic events and the number of high-energy phenomena cause statistical techniques to be insufficient for predicting seismic hazard. Therefore, it is essential to search for new opportunities for better hazard prediction, including the use of machine learning methods.

CONCLUSION: The baseline performance of the eight algorithms achieved an average accuracy of 93.11%. Three algorithms (Random Forest, Support Vector Machine, and Adaboost) achieved the top three accuracy scores after the first round of modeling. After a series of tuning trials, all three algorithms turned in an identical accuracy result of 93.42% with an identical Kappa score of 0.0.

With the imbalanced dataset we have on hand, we will need to look for another metric or another approach to evaluate the models.
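
An accuracy near 93% alongside a Kappa of 0.0 is the classic signature of a model that simply predicts the majority class of an imbalanced dataset. The short sketch below reproduces that pattern with caret's confusionMatrix(); the simulated labels and the roughly 93/7 class split are assumptions made for illustration, not the actual project data.

```r
# Illustration only (simulated labels, not the project data): why accuracy can sit
# near 93% while Kappa is 0.0 on an imbalanced two-class problem.
library(caret)

set.seed(7)
# Roughly 93% "nohazard" versus 7% "hazard"
actual <- factor(sample(c("hazard", "nohazard"), size = 1000, replace = TRUE,
                        prob = c(0.07, 0.93)),
                 levels = c("hazard", "nohazard"))

# A degenerate "model" that always predicts the majority class
predicted <- factor(rep("nohazard", 1000), levels = c("hazard", "nohazard"))

# Accuracy lands near 93%, Kappa is 0 because the predictions add nothing beyond
# the class prior, and sensitivity for the "hazard" class is 0.
confusionMatrix(data = predicted, reference = actual)
```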

Dataset Used: Seismic Bumps Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/seismic-bumps

The HTML-formatted report can be found here on GitHub.

A Price Worth Paying

(From a writer I like and respect, Seth Godin)

When you bring a product or service to market, the market decides what it is worth. If you do not want to be treated as a commodity (competing on low price alone), there are two paths:

1) Through scarcity: it is worth more because there is not much of it, or because we are the only ones who can provide it.

2) Through connectivity: it is worth more because other people are already using it.

A little, or a lot.

There are few substitutes, either because it is hard to obtain or because you already have many people using it.

We do not mind paying extra because it is one of a kind, because we are very thirsty and there is nowhere else to buy water, because we believe it will add value, or because it is the best of our limited options. Right here, right now, you are the best choice. In other words, you are scarce.

Or…

Because we do not want to be left behind. Because it connects us with the other users, it is worth more.

Value is not only about profit. Innovations that are widespread and cheap are genuinely valuable. Profit, however, follows a different calculus: in the end, a creation comes down to who thinks it is worth paying the extra price for.

Multi-Class Classification Model for Letter Recognition Using Python

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Letter Recognition Data Set is a multi-class classification situation where we are trying to predict one of the several possible outcomes.

INTRODUCTION: The objective is to identify each of many black-and-white rectangular-pixel displays as one of the 26 capital letters in the English alphabet. The character images were based on 20 different fonts and each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts) which were then scaled to fit into a range of integer values from 0 through 15.

CONCLUSION: The baseline performance of the eight algorithms achieved an average accuracy of 80.98%. Three algorithms (k-Nearest Neighbors, Support Vector Machine, and Extra Trees) achieved the top three accuracy scores after the first round of modeling. After a series of tuning trials, Support Vector Machine turned in the top result using the training data. It achieved an average accuracy of 97.37%. Using the optimized tuning parameter available, the Support Vector Machine algorithm processed the validation dataset with an accuracy of 97.46%, which was even slightly better than the accuracy from the training data.

For this project, the Support Vector Machine algorithm yielded consistently top-notch training and validation results, which warrant the additional processing required by the algorithm.

Dataset Used: Letter Recognition

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Letter+Recognition

One potential source of performance benchmarks: https://www.kaggle.com/c/ci-letter-recognition

The HTML-formatted report can be found here on GitHub.

Tactics, Part 2

In the podcast series, Seth Godin’s Startup School, Seth Godin gave a guided tour to a group of highly-motivated early-stage entrepreneurs on some of the questions they will have to dig deep and ask themselves while they build up their business. Here are my takeaways from various topics discussed in the podcast episodes.

  • How do we know when we have achieved our purposes? We achieve “our purpose” or what we came to do when we begin to dance on the edge of failure and grow. We see a void or a precipice in our path, but we keep moving forward. That is when we feel alive as people. The industrialists have, for one hundred years, brainwashed us into not believing we can do that because they do not want us to do that. They want us to need them. If we need them, we will work for cheap, and we will comply.
  • With the industrial age coming to an end, many opportunities are presenting themselves in front of us. Still, it is easier to quit than it is to stare down the abyss and do that dance on the edge of failure. We need to move from one safety zone to another, even when the move is uncomfortable. In fact, our purpose is in finding a thing we previously did not think was going to work, yet which is working well enough. We can then wonder what the next thing is and keep that cycle going. The internet is making that easier than ever for everyone. The explosion we are about to see is not an explosion of industrial job creation. It is an explosion of people who figure out how to do the thing we previously thought was too scary to do, and whether there is money in it.
  • Taking on a partner and splitting equity is an important business decision. Someone’s perception of what needs to be done will rarely be the same as ours, and someone’s understanding of cash is also not the same as ours. If we have a 50-50 partnership, it will not take long before somebody is annoyed at the other person. Taking on a partnership usually involves solving two problems. One is access to technical expertise, and the other is figuring out who can provide support and back-up. A partner works out best when he/she can help on both fronts.
  • Giving your partner all the equity upfront is not a good idea. It is better to phase in the equity distribution as time passes and contribution ramps up. If we can solve the technical expertise or the backup support problems without giving away equity, we should explore that option first. If possible, set up the equity distribution arrangement so that, as the employees and other people gain shares due to the success of the business, we are also gaining shares. That way, we are not completely diluting ourselves as we go through the business growth process.

Don’t Just Do Something, Stand There

In his podcast, Akimbo [https://www.akimbo.me/], Seth Godin teaches us how to adopt a posture of possibility, change the culture, and choose to make a difference. Here are my takeaways from the episode.

We were often told, “Don’t just stand there, do something!” It could be sound advice for certain situations, but we should give more thought to a situation before we react.

We should be asking ourselves:

  • What is the purpose of this exchange?
  • What is the purpose of the point I am trying to make?
  • If I am teaching someone a lesson, what lesson am I trying to teach them? What would I like them to do instead?

“Who is it for and what is it for?” is the essence of design thinking. In other words, who am I seeking to change, and what change am I trying to make?

“Don’t just stand there, do something” can get us into a purely reactive mode. It wastes precious attention, emotions, and effort. It can also lead to endless false starts because doing something right now makes us feel like we are making progress in solving the problem.

The alternative is to stand and pause. By standing there, we are not ignoring the situation but acknowledging it. The situation exists and can be very uncomfortable. The answer is complicated, and we might not know what to do.

Instead, we should be immersing ourselves in the situation and applying plenty of empathy. Empathy means that the outcome is important enough to us that we are willing to put in the effort to get that outcome. If we care about the outcome, we will ask the following question.

The question is not “What would I do if I were you?” because I am not you. The question we should ask is “If I knew what you knew, if I wanted what you wanted, if I would have been exposed to what you had been exposed to, what story would resonate with me?”

It is about letting go of our self-satisfaction, our uncertainty, and our correctness. If we can be empathetic and understand the “agency” held by the other person, we will be able to explore possibilities. Professionals do empathy on purpose, so they can work with the other person productively and help everyone get what they want.

Empathy does not mean we need to like the person we are trying to empathize with. It does not mean we like the situation. It simply means that we are choosing to do what works. We choose to do what works for us that is also something fair and just for the other person. We must be willing to imagine what the other person is going through that we are not.

The opportunity we have when we serve people with empathy is to understand what they know, understand what they want, understand what they believe and then work hard to give them new information, new understanding, new insight so they can make a new decision based on new data and be able to work with us again going forward.

This idea of having empathy extends far beyond the realm of customer service. It gets to the heart of what we think about when we think about justice. We react badly to injustice because we do not want to appear indecisive. We want to do something instead, because standing there would mean taking a deep, hard look at what caused the problem and beginning to understand thoroughly what happened.

When we are trying to solve a longer, more complicated problem, we can stand there, breathe, see, and imagine what would happen if the shoe were on the other foot. It does not feel good in the short run, but what it does is open the door. It opens the door for us to work with other people, to make connections that matter, and to share dignity, because the thing about dignity is that it is hard to take but very easy to give. When we give the other person the dignity of empathy, even when we disagree with them or do not like them, we have made it possible to move forward.

The choice we must make is this: “Do we want to do something right now, something draconian and dramatic, that lets us blow off steam or feel safe in the moment but ultimately does not get us what we want?” Or are we willing to be professionals, to stand up and say, “Maybe we need to think about this. It is not what is going to make us feel good today, but it is what is going to make us proud of our actions in the long run.”

Multi-Class Classification Model for Letter Recognition Using R

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Letter Recognition Data Set is a multi-class classification situation where we are trying to predict one of the several possible outcomes.

INTRODUCTION: The objective is to identify each of many black-and-white rectangular-pixel displays as one of the 26 capital letters in the English alphabet. The character images were based on 20 different fonts and each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts) which were then scaled to fit into a range of integer values from 0 through 15.

CONCLUSION: The baseline performance of the eight algorithms achieved an average accuracy of 79.30%. Three algorithms (Bagged CART, Random Forest, and k-Nearest Neighbors) achieved the top three accuracy scores after the first round of modeling. After a series of tuning trials, Random Forest turned in the top result using the training data. It achieved an average accuracy of 96.32%. Using the optimized tuning parameter available, the Random Forest algorithm processed the validation dataset with an accuracy of 96.45%, which was even slightly better than the accuracy from the training data.

For this project, the Random Forest ensemble algorithm yielded consistently top-notch training and validation results, which warrant the additional processing required by the algorithm.
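
As a rough sketch of the tune-then-validate workflow described above, the code below tunes Random Forest's mtry parameter with caret on the mlbench package's copy of the Letter Recognition data. The tuning grid, the 80/20 split, the 2,000-row subsample for speed, and the 10-fold cross-validation setup are illustrative choices of mine, not the settings used in the project.

```r
# Rough sketch of the tune-then-validate workflow; the grid, split, subsample,
# and resampling setup are illustrative choices, not the project's settings.
library(caret)
library(mlbench)

data(LetterRecognition)  # 20,000 rows; the target column is 'lettr'
set.seed(7)
letters_df <- LetterRecognition[sample(nrow(LetterRecognition), 2000), ]  # subsample for speed

# Hold out a validation set, mirroring the train/validate split in the write-up
in_train <- createDataPartition(letters_df$lettr, p = 0.80, list = FALSE)
training <- letters_df[in_train, ]
validation <- letters_df[-in_train, ]

control <- trainControl(method = "cv", number = 10)
grid <- expand.grid(mtry = c(2, 4, 8, 16))  # candidate number of predictors per split

fit_rf <- train(lettr ~ ., data = training, method = "rf",
                metric = "Accuracy", tuneGrid = grid, trControl = control)
print(fit_rf)

# Score the held-out set using the best mtry found during tuning
predictions <- predict(fit_rf, newdata = validation)
confusionMatrix(predictions, validation$lettr)
```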

Dataset Used: Letter Recognition

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Letter+Recognition

One potential source of performance benchmarks: https://www.kaggle.com/c/ci-letter-recognition

The HTML-formatted report can be found here on GitHub.