Synecdoche

In his podcast, Akimbo, Seth Godin teaches us how to adopt a posture of possibility, change the culture, and choose to make a difference. Here are my takeaways from the episode.

In this episode, Seth discussed the concept of “synecdoche,” a figure of speech in which a term for a part of something refers to the whole.

Almost everything in our economic model is based on the rational actor concept. The concept asserts that individuals, with adequate information and seeking to maximize their interests, will make decisions according to the laws of economics.

Similarly, the decision-making model for an organization looks like this:

  1. When faced with a crisis, organizational leaders break down the problem and assign each part to the bureaucracy that already exists.
  2. Because of time and resource limitations, organizational leaders often settle on the first proposal that adequately addresses the issue, rather than evaluating all possible courses of action to see which one is most likely to work. This behavior is called “satisficing.”
  3. Organizational leaders gravitate towards solutions that limit short-term uncertainty.
  4. Organizations often follow set repertoires and procedures when taking actions, rather than developing a new repertoire for the problem.
  5. Because of the large amount of resources and time required to fully plan and mobilize actions within a large organization, organizational leaders effectively limit their actions to pre-existing plans.
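Point 2 is the heart of this model. As a rough illustration (my own sketch, not something from the episode), here is satisficing next to exhaustive optimizing in Python. The function names, proposal names, and scores are all made up for the example.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def satisfice(options: Sequence[T], score: Callable[[T], float],
              good_enough: float) -> Optional[T]:
    """Return the first option whose score meets the aspiration level.

    Options are checked in the order given, and the search stops as soon
    as one clears the bar; the rest are never evaluated.
    """
    for option in options:
        if score(option) >= good_enough:
            return option
    return None  # nothing cleared the bar


def optimize(options: Sequence[T], score: Callable[[T], float]) -> T:
    """Evaluate every option and return the highest-scoring one."""
    return max(options, key=score)


if __name__ == "__main__":
    # Hypothetical proposals with made-up "expected benefit" scores,
    # listed in the order the existing bureaucracy would offer them.
    proposals = {"reuse old plan": 6.0, "patch procedure": 7.5,
                 "new playbook": 9.0}
    print(satisfice(list(proposals), proposals.get, good_enough=6.0))
    # reuse old plan
    print(optimize(list(proposals), proposals.get))
    # new playbook
```

The satisficer stops at the familiar plan because it is good enough, even though a better option sits further down the list, which is exactly why pre-existing plans keep winning.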

The above model, and especially the last point, is the reason why organizations fail. When the world changes too much and too fast, the organization’s pre-existing methods no longer work, and the organization needs a new playbook.

As behavioral economics has pointed out, we do not make decisions solely according to the rational actor model.

Just like organizations, we do not have just one voice inside of our head that is a rational actor. We also have many bureaucracies inside, all working at the same time with stereotypes and shorthands. We are moved by fear or greed, by the means we have seen before, and by the dreams we have for tomorrow. We get manipulated because we do not have a little person inside of us who is a rational, thoughtful, and long-term actor.

What this means is that our decision-making process looks like this:

  1. When faced with a crisis, we break it down and hand the parts to roles already assigned in our heads.
  2. Because of time and resource limitations, we settle on the first proposal that adequately addresses the issue. In other words, we satisfice almost all the time.
  3. We gravitate towards solutions that limit short-term uncertainty. That is behavioral economics in a nutshell.
  4. We follow set repertoires and procedures when taking actions. We often find ourselves in a rut, doing things the same way. That is called a habit.
  5. Because of the large amount of resources, time, and fear involved in fully mobilizing actions, we essentially limit ourselves to pre-existing habits.

In summary, we are not the rational actors we hope to be. Inside each of us is an organization with a process and many voices. Those voices in our heads are busy doing their jobs, and their actions show up on the outside. We, like an organization, are just the sum of all the voices inside our heads.

Binary Classification Model for Parkinson’s Disease Using R Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Parkinson’s Disease dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: The data used in this study were gathered from 188 patients with PD (107 men and 81 women) with ages ranging from 33 to 87 at the Department of Neurology in Cerrahpasa Faculty of Medicine, Istanbul University. The control group consists of 64 healthy individuals (23 men and 41 women) with ages varying between 41 and 82. During the data collection process, the microphone was set to 44.1 kHz, and following the physician’s examination, the sustained phonation of the vowel /a/ was collected from each subject with three repetitions.

In the first iteration, the script focused on evaluating various machine learning algorithms and identifying the model that produces the best overall metrics. The first iteration established the performance baseline for accuracy and processing time.

In iteration Take2, we examined the feature selection technique of attribute importance ranking by using the Gradient Boosting algorithm. By selecting only the most important attributes, we hoped to decrease the processing time and maintain a similar level of prediction accuracy compared to the first iteration.
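The post does not show the selection code itself. As a minimal sketch, assuming importance scores have already been obtained from a fitted gradient boosting model (for example, the `feature_importances_` attribute of a scikit-learn `GradientBoostingClassifier` in Python), ranking the attributes and keeping the top k could look like this. The function names, the attribute names, and the fixed-k cutoff are illustrative assumptions, not the project's actual code.

```python
from typing import Dict, List


def rank_attributes(importances: Dict[str, float]) -> List[str]:
    """Return attribute names sorted from most to least important.

    sorted() is stable even with reverse=True, so attributes with equal
    scores keep their original order.
    """
    return sorted(importances, key=importances.get, reverse=True)


def select_top_attributes(importances: Dict[str, float], k: int) -> List[str]:
    """Keep only the k highest-ranked attributes."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return rank_attributes(importances)[:k]


if __name__ == "__main__":
    # Hypothetical importance scores; in practice they would come from a
    # fitted gradient boosting model, one score per attribute.
    scores = {"attr_a": 0.02, "attr_b": 0.40, "attr_c": 0.25, "attr_d": 0.0}
    print(select_top_attributes(scores, 2))  # ['attr_b', 'attr_c']
```

The reduced attribute list would then be used to subset both the training and testing data before re-running the model comparison.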

ANALYSIS: In the first iteration, the baseline performance of the machine learning algorithms achieved an average accuracy of 77.84%. Two algorithms (Random Forest and Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Random Forest turned in the top overall result and achieved an accuracy metric of 88.24%. By using the optimized parameters, the Random Forest algorithm processed the testing dataset with an accuracy of 83.63%, which was just slightly below the prediction accuracy using the training data.

In iteration Take2, the baseline performance of the machine learning algorithms achieved an average accuracy of 82.14%. Two algorithms (Random Forest and Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Gradient Boosting turned in the top overall result and achieved an accuracy metric of 88.92%. By using the optimized parameters, the Gradient Boosting algorithm processed the testing dataset with an accuracy of 88.05%, which was just slightly below the prediction accuracy using the training data.

From the model-building perspective, the number of attributes decreased by 541, from 753 down to 212. The processing time went from 2 hours 16 minutes in the first iteration down to 27 minutes in Take2, which was a decrease of 80.1%.
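A quick check of the arithmetic above, computed in Python from the stated figures (the helper name is my own):

```python
def percent_decrease(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


if __name__ == "__main__":
    attrs_before, attrs_after = 753, 212
    minutes_before = 2 * 60 + 16  # 2 hours 16 minutes = 136 minutes
    minutes_after = 27
    print(attrs_before - attrs_after)  # 541
    print(round(percent_decrease(minutes_before, minutes_after), 1))  # 80.1
```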

CONCLUSION: For this iteration, using the Attribute Importance Ranking technique and the Gradient Boosting algorithm achieved the best overall modeling results. Using the feature selection technique further reduced the processing time while achieving even better prediction accuracy overall. For this dataset, Gradient Boosting should be considered for further modeling or production use.

Dataset Used: Parkinson’s Disease Classification Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Parkinson%27s+Disease+Classification

Sakar, C.O., Serbes, G., Gunduz, A., Tunc, H.C., Nizam, H., Sakar, B.E., Tutuncu, M., Aydin, T., Isenkul, M.E. and Apaydin, H., 2018. A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Applied Soft Computing, DOI: https://doi.org/10.1016/j.asoc.2018.10.022

The HTML formatted report can be found here on GitHub.

Web Scraping of AWS Documentation using BeautifulSoup

SUMMARY: The purpose of this project is to practice web scraping by extracting specific information from a website. Using the extracted information, the script further completes other tasks (downloading files in this case). The web scraping Python code leverages the BeautifulSoup module.

INTRODUCTION: On occasion, there is a need to download a batch of documents from web pages without clicking on the download links one at a time. This web scraping script will automatically traverse the necessary web pages and collect all links to PDF documents. The script will also download the PDF documents as part of the scraping process.
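The full script lives on GitHub. The sketch below illustrates only the link-collection step, and it swaps BeautifulSoup for Python's standard-library `html.parser` so it runs without third-party packages (with BeautifulSoup, the analogous call would be `soup.find_all("a", href=True)`). It assumes the page HTML has already been fetched, for example by Selenium, and it leaves out the downloading step. The class name, function name, and example URL are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collects absolute URLs of <a href="..."> links that point to PDF files."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pdf_links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Test only the path, so a query string such as ?v=2 does not hide ".pdf".
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.pdf_links:
            self.pdf_links.append(absolute)


def find_pdf_links(html: str, base_url: str) -> List[str]:
    """Return de-duplicated PDF links in the order they appear on the page."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.pdf_links


if __name__ == "__main__":
    # Hypothetical page snippet and base URL, for illustration only.
    page = '<a href="guide/user-guide.pdf">PDF</a> <a href="index.html">HTML</a>'
    print(find_pdf_links(page, "https://docs.example.com/service/"))
    # ['https://docs.example.com/service/guide/user-guide.pdf']
```

Collecting absolute URLs up front means the download step can fetch each link directly, without needing to know which page it came from.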

For this script to work, it requires the use of Selenium browser automation software and one of its WebDrivers (Firefox in this case).

Starting URLs: https://docs.aws.amazon.com/

The source code and JSON output can be found here on GitHub.

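Make Things Better

(From Seth Godin, one of the writers I like and respect most)

I have come to realize that, for some people, this is a controversial statement.

That is because it seems to imply two things:

(1) “Better” implies that what we have now is not perfect. Getting “better” requires change, and change is pretty scary. Whether something is “better” is in the eye of the beholder. “Better” is an assertion, one that takes not only the confidence to say it out loud but also the optimism to believe it is possible.

(2) “Make” means that it is up to us. Someone has to “make” something better, and that someone might be you. In fact, if you are not interested in making things better, then you are a member of the status quo, and that is a problem too.

I often see places in our culture that have a hard time accepting these ideas. Those in power push us to conform rather than to seek improvement, and a mindset of denial encourages us to complain rather than to make things better. Power grows when others become passive.

Those in power do not want to see you raise your sights, do not want to hear your dissatisfaction, and even less want to be forced to keep upgrading all of their systems. So power has taught us the cultural norms of acceptance, denial, and weariness.

But...

Everything in the world we have built (the water we drink, the food we eat, the places we live), if it is good, is good because someone, a generation or two ago, decided to make it better. If it is not good, or not good enough, only our actions can make it better.

We can look at the world around us and, if we try, we can see something that could be made better.

That something might be a podcast or a political campaign, an engineering insight or a more inclusive policy. It might involve finding and organizing others on a similar path. All of it takes real courage.

I want to restate my belief that each of us has the opportunity to make our case: to declare our vision, propose the change we seek, and work hard to make things better.

It is within our grasp right now.

Make things better by making better things.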

Binary Classification Model for Parkinson’s Disease Using Python Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Parkinson’s Disease dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: The data used in this study were gathered from 188 patients with PD (107 men and 81 women) with ages ranging from 33 to 87 at the Department of Neurology in Cerrahpasa Faculty of Medicine, Istanbul University. The control group consists of 64 healthy individuals (23 men and 41 women) with ages varying between 41 and 82. During the data collection process, the microphone was set to 44.1 kHz, and following the physician’s examination, the sustained phonation of the vowel /a/ was collected from each subject with three repetitions.

In the first iteration, the script focused on evaluating various machine learning algorithms and identifying the model that produces the best overall metrics. The first iteration established the performance baseline for accuracy and processing time.

In iteration Take2, we examined the feature selection technique of attribute importance ranking by using the Gradient Boosting algorithm. By selecting only the most important attributes, we hoped to decrease the processing time and maintain a similar level of prediction accuracy compared to the first iteration.
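How many ranked attributes to keep is a judgment call, and the post does not say how its cutoff was chosen. One common rule, assumed here purely for illustration, is to keep the smallest set of top-ranked attributes whose importance shares add up to a chosen fraction. The function name, the threshold values, and the scores are hypothetical.

```python
from typing import Dict, List


def select_by_cumulative_importance(importances: Dict[str, float],
                                    threshold: float) -> List[str]:
    """Keep the smallest set of top-ranked attributes whose share of the
    total importance reaches `threshold` (a fraction in (0, 1])."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    ranked = sorted(importances, key=importances.get, reverse=True)
    selected: List[str] = []
    running = 0.0
    for name in ranked:
        selected.append(name)
        running += importances[name] / total
        if running >= threshold:
            break  # enough of the total importance is covered
    return selected


if __name__ == "__main__":
    # Hypothetical scores that sum to 10 for easy reading.
    scores = {"attr_a": 5.0, "attr_b": 3.0, "attr_c": 1.5, "attr_d": 0.5}
    print(select_by_cumulative_importance(scores, 0.75))  # ['attr_a', 'attr_b']
```

Compared with a fixed k, a cumulative threshold adapts to how concentrated the importance scores are: a few dominant attributes lead to a short list, while evenly spread scores keep more.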

ANALYSIS: In the first iteration, the baseline performance of the machine learning algorithms achieved an average accuracy of 81.58%. Two algorithms (Extra Trees and Stochastic Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Extra Trees turned in the top overall result and achieved an accuracy metric of 88.09%. By using the optimized parameters, the Extra Trees algorithm processed the testing dataset with an accuracy of 87.22%, which was just slightly below the prediction accuracy using the training data.

In iteration Take2, the baseline performance of the machine learning algorithms achieved an average accuracy of 82.97%. Two algorithms (Extra Trees and Stochastic Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Extra Trees turned in the top overall result and achieved an accuracy metric of 90.17%. By using the optimized parameters, the Extra Trees algorithm processed the testing dataset with an accuracy of 89.42%, which was just slightly below the prediction accuracy using the training data.

From the model-building perspective, the number of attributes decreased by 585, from 753 down to 168. The processing time went from 18 minutes 16 seconds in the first iteration down to 11 minutes 33 seconds in Take2, which was a decrease of 36.8%.
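Because the times mix minutes and seconds, converting them to a common unit makes the percentage easy to reproduce. A small Python check using `datetime.timedelta` (the helper name is my own):

```python
from datetime import timedelta


def time_reduction_pct(before: timedelta, after: timedelta) -> float:
    """Percentage drop in processing time from `before` to `after`.

    Dividing one timedelta by another yields a plain float ratio.
    """
    return (before - after) / before * 100


if __name__ == "__main__":
    first = timedelta(minutes=18, seconds=16)  # 1,096 seconds
    take2 = timedelta(minutes=11, seconds=33)  # 693 seconds
    print(f"{time_reduction_pct(first, take2):.2f}")  # 36.77
```

This gives about 36.77%, or 36.8% when rounded to one decimal place.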

CONCLUSION: For this iteration, using the Attribute Importance Ranking technique and the Extra Trees algorithm achieved the best overall modeling results. Using the feature selection technique further reduced the processing time while achieving even better prediction accuracy overall. For this dataset, Extra Trees should be considered for further modeling or production use.

Dataset Used: Parkinson’s Disease Classification Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Parkinson%27s+Disease+Classification

Sakar, C.O., Serbes, G., Gunduz, A., Tunc, H.C., Nizam, H., Sakar, B.E., Tutuncu, M., Aydin, T., Isenkul, M.E. and Apaydin, H., 2018. A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Applied Soft Computing, DOI: https://doi.org/10.1016/j.asoc.2018.10.022

The HTML formatted report can be found here on GitHub.

Drucker on The Educated Person, Part 2

In his book, The Essential Drucker: The Best of Sixty Years of Peter Drucker’s Essential Writings on Management, Peter Drucker analyzed the ways that management practices and principles affect the performance of organizations, individuals, and society. The book covers the basic principles of management and gives professionals the tools to perform the tasks that the environment of tomorrow will require of them.

These are my takeaways from reading the book.

Drucker thought that knowledge workers would play a pivotal role in a post-capitalist era. In this chapter, Drucker offered his thought-provoking observations on what individuals and organizations should consider in the shift to a knowledge society.

As the knowledge society becomes the norm, the educated person will find herself exposed to many knowledge areas. However, Drucker doubted that we either need or will get what he called “polymaths,” people who are at home in many knowledge areas.

In fact, Drucker believed we would become even more specialized. But he also pointed out that the educated person in the knowledge society must have the ability to understand the various knowledges and disciplines. That understanding will often come down to asking the following, potentially hard, questions about each discipline:

“What is each one about?”

“What is it trying to do?”

“What are its central concerns and theories?”

“What major new insights has it produced?”

“What are its important areas of ignorance, its problems, its challenges?”

Just as important, Drucker believed that all knowledge needs to be sustained and renewed. Without such renewal effort, the knowledge will become sterile, or worse, become intellectually arrogant and unproductive.

Moreover, major new insights in one specialized knowledge area usually arise out of another, separate specialty. Without a purposeful effort to keep a knowledge area nourished and renewed, we risk reducing our opportunities to find new insights.

The specialists in each knowledge area, therefore, must take responsibility for making both themselves and their specialty understood. They must make others aware that their specialty is a serious, rigorous, and demanding discipline. This requires that the leaders in each knowledge area take on the hard work of defining what it is they do and marketing it accordingly.

In the knowledge society, there is no “queen of the knowledges,” according to Drucker. Instead, all knowledge areas are equally valuable and, in the words of the great medieval philosopher Saint Bonaventura, lead equally to the truth. But to create such paths to truth and knowledge, we, the knowledge specialists, must make this work part of practicing our discipline.

Changes are hard to predict, but Drucker believed one thing is certain. “The greatest change will be the change in knowledge—in its form and content; in its meaning; in its responsibility; and in what it means to be an educated person.”

Binary Classification Model for Parkinson’s Disease Using R

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Parkinson’s Disease dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: The data used in this study were gathered from 188 patients with PD (107 men and 81 women) with ages ranging from 33 to 87 at the Department of Neurology in Cerrahpasa Faculty of Medicine, Istanbul University. The control group consists of 64 healthy individuals (23 men and 41 women) with ages varying between 41 and 82. During the data collection process, the microphone was set to 44.1 kHz, and following the physician’s examination, the sustained phonation of the vowel /a/ was collected from each subject with three repetitions.

ANALYSIS: The baseline performance of the machine learning algorithms achieved an average accuracy of 77.84%. Two algorithms (Random Forest and Stochastic Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Random Forest turned in the top overall result and achieved an accuracy metric of 88.24%. By using the optimized parameters, the Random Forest algorithm processed the testing dataset with an accuracy of 83.63%, which was just slightly below the prediction accuracy using the training data.

CONCLUSION: For this iteration, the Random Forest algorithm achieved the best overall results using the training and testing datasets. For this dataset, Random Forest should be considered for further modeling or production use.

Dataset Used: Parkinson’s Disease Classification Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Parkinson%27s+Disease+Classification

Sakar, C.O., Serbes, G., Gunduz, A., Tunc, H.C., Nizam, H., Sakar, B.E., Tutuncu, M., Aydin, T., Isenkul, M.E. and Apaydin, H., 2018. A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Applied Soft Computing, DOI: https://doi.org/10.1016/j.asoc.2018.10.022

The HTML formatted report can be found here on GitHub.

Binary Classification Model for Parkinson’s Disease Using Python

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Parkinson’s Disease dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: The data used in this study were gathered from 188 patients with PD (107 men and 81 women) with ages ranging from 33 to 87 at the Department of Neurology in Cerrahpasa Faculty of Medicine, Istanbul University. The control group consists of 64 healthy individuals (23 men and 41 women) with ages varying between 41 and 82. During the data collection process, the microphone was set to 44.1 kHz, and following the physician’s examination, the sustained phonation of the vowel /a/ was collected from each subject with three repetitions.

ANALYSIS: The baseline performance of the machine learning algorithms achieved an average accuracy of 81.58%. Two algorithms (Extra Trees and Stochastic Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Extra Trees turned in the top overall result and achieved an accuracy metric of 88.09%. By using the optimized parameters, the Extra Trees algorithm processed the testing dataset with an accuracy of 87.22%, which was just slightly below the prediction accuracy using the training data.

CONCLUSION: For this iteration, the Extra Trees algorithm achieved the best overall results using the training and testing datasets. For this dataset, Extra Trees should be considered for further modeling or production use.

Dataset Used: Parkinson’s Disease Classification Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Parkinson%27s+Disease+Classification

Sakar, C.O., Serbes, G., Gunduz, A., Tunc, H.C., Nizam, H., Sakar, B.E., Tutuncu, M., Aydin, T., Isenkul, M.E. and Apaydin, H., 2018. A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Applied Soft Computing, DOI: https://doi.org/10.1016/j.asoc.2018.10.022

The HTML formatted report can be found here on GitHub.