Multi-Class Image Classification Model for Colorectal Cancer Histology Using TensorFlow Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Colorectal Cancer Histology dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.
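The sketch below makes the multi-class setup concrete. It shows how a network's raw output scores for one image could be turned into a single predicted class with a softmax followed by an argmax. It is plain Python rather than TensorFlow. The eight class names are assumed to match the tissue types in the Kather et al. dataset, and the input scores are made up; both are hypothetical.

```python
import math

# Hypothetical class labels, assumed to be the eight tissue types of the
# Kather et al. (2016) texture dataset; adjust to the actual label set.
CLASS_NAMES = ["TUMOR", "STROMA", "COMPLEX", "LYMPHO",
               "DEBRIS", "MUCOSA", "ADIPOSE", "EMPTY"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits):
    """Return the most likely class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


# Made-up scores for a single image, for illustration only.
scores = [0.2, 2.5, 0.1, -1.0, 0.3, 0.0, -0.5, 0.4]
label, prob = predict_class(scores)
print(label, round(prob, 3))  # STROMA has the highest score
```

In a Keras model, the same idea typically appears as a final Dense layer with a softmax activation, and the predicted class is the argmax of that layer's output.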

In iteration Take1, we constructed a few simple three-layer CNNs to model the dataset. We plan to use the best result as the baseline model for future modeling iterations.

In this Take2 iteration, we will construct a simple VGG convolutional network with six VGG blocks to model the dataset. We will compare the best result from the VGG network with the baseline model from iteration Take1.
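A VGG block is conventionally a stack of 3x3 convolutions followed by a 2x2 max-pooling layer that halves the feature map. The plain-Python sketch below only tracks the output shape after each of six such blocks. The 150x150 input size, the starting filter count of 32, and the filter doubling are all assumptions for illustration, not the notebook's settings. In TensorFlow, each block would be built from tf.keras.layers.Conv2D and tf.keras.layers.MaxPooling2D layers.

```python
# Hypothetical configuration: 150x150 RGB tiles (assumed tile size),
# six VGG-style blocks, filters starting at 32 and doubling per block.
INPUT_SIZE = 150
NUM_BLOCKS = 6
BASE_FILTERS = 32


def vgg_block_shapes(size, num_blocks, base_filters):
    """Track (height, width, channels) after each VGG block.

    Each block is assumed to be stacked 3x3 convolutions with 'same'
    padding (spatial size unchanged) followed by a 2x2 max-pool with
    stride 2, which halves height and width (rounding down).
    """
    shapes = []
    filters = base_filters
    for _ in range(num_blocks):
        size = size // 2          # effect of the 2x2, stride-2 max-pool
        shapes.append((size, size, filters))
        filters *= 2              # common VGG convention: double the filters
    return shapes


for i, shape in enumerate(vgg_block_shapes(INPUT_SIZE, NUM_BLOCKS, BASE_FILTERS), 1):
    print(f"after block {i}: {shape}")
```

Under these assumptions, the spatial size shrinks to 75, 37, 18, 9, 4, and 2. A seventh block would leave only a 1x1 map, so the input resolution limits how deep a VGG-style stack can go.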

INTRODUCTION: This dataset represents a collection of textures in histological images of human colorectal cancer. All images are RGB, 0.495 µm per pixel, digitized with an Aperio ScanScope (Aperio/Leica Biosystems) at 20x magnification. The histological samples contain fully anonymized images of formalin-fixed, paraffin-embedded human colorectal adenocarcinomas (primary tumors) from the dataset authors' pathology archive (Institute of Pathology, University Medical Center Mannheim, Heidelberg University, Mannheim, Germany).

ANALYSIS: In iteration Take1, the baseline model achieved an accuracy score of 98.25% on the training dataset after 15 epochs. After tuning the hyperparameters, the best model achieved an accuracy score of 84.00% on the validation dataset.

In this Take2 iteration, the baseline model achieved an accuracy score of 96.88% on the training dataset after 20 epochs. After tuning the hyperparameters, the best model achieved an accuracy score of 82.30% on the validation dataset.

CONCLUSION: In this iteration, the TensorFlow CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Colorectal Cancer Histology Dataset, Kather JN, Weis CA, Bianconi F, Melchers SM, Schad LR, Gaiser T, Marx A, Zollner F: Multi-class texture analysis in colorectal cancer histology (2016), Scientific Reports (in press)

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://zenodo.org/record/53169#.XGZemKwzbmG

One potential source of performance benchmarks: https://www.kaggle.com/kmader/colorectal-histology-mnist

The HTML-formatted report can be found on GitHub.

Annie Duke on Thinking in Bets, Part 5

In her book, Thinking in Bets: Making Smarter Decisions When You Don’t Have All the Facts, Annie Duke draws on examples from business, sports, politics, and poker to share tools anyone can use to embrace uncertainty and make better decisions.

These are some of my favorite concepts and takeaways from reading the book.

“If it weren’t for luck, I’d win every one”

Because of our strong desire to make sense of the world around us, we prefer to see strong causal relationships. Because chance is involved, the quality of our actions and the quality of our outcomes do not always correlate, and we are uncomfortable with our inability to explain everything that happens to us.

Over time, we develop a “self-serving bias.” Under this bias, we build a self-narrative in which we take credit for good results and blame poor results on luck, so they are not our fault. When we try to figure out why something happened, we look for a plausible reason that fits our wishes. Usually, we want an explanation that flatters us and puts us in a good light. When we know we made an unforced error, we look for ways to minimize our bad feelings.

The trouble with the “self-serving bias” is that it makes it very hard for us to learn from our experience. However, understanding why this pattern emerges is the first step to developing practical strategies to improve our ability to learn from our experience.

“People Watching”

We often apply the same black-and-white thinking of “self-serving bias” when we judge other people’s decisions. In those cases, we flip the script: we attribute others’ successes to luck and their bad outcomes to their decisions.

These systematic errors in the way we field our own outcomes and those of our peers come at a real cost. They keep us from learning from our own and others’ experiences. We also let ourselves off the hook too easily while needlessly punishing other people for whatever share of bad luck they may have experienced.

We need to be aware of these shortcomings in our thinking because they come not only at the cost of reaching our goals but also at the expense of empathy and compassion for others.

“Other people’s outcomes reflect on us”

The way we feel about ourselves comes from how we think we compare with others. This thought pattern is a pervasive habit that can impede learning. Luckily, we can change our habits through conscious effort.

By being aware of what makes us feel good about ourselves, we can move toward a more rational fielding of our outcomes and a more compassionate view of others. We can learn more effectively if we work toward an open-minded narrative by striving toward objectivity, accuracy, and truth-seeking.

Furthermore, we should practice giving others credit when it’s due, admitting when our decisions could have been better, and acknowledging that almost nothing is binary or black-and-white.

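On Sunk Costs

(From Seth Godin, a writer I respect)

Tomorrow is another opportunity.

There are thirty people over there waiting for you to help them connect, to lead them, or to make things better. But if you are still here defending a stuck project, you are too invested in it, and you won't be able to serve them.

The customers, partners, and students who need your voice or your product won't be able to benefit from you, because you are busy trying to dig yourself out of an old hole, a situation that is harder than ever to get through.

It's easy to focus on the problem in front of us and decide that this problem (and only this problem) is the one we must solve. But everything has a cost, and the opportunities we lose when we do this are real, even if you don't notice them.

Of course, we don't make a contribution by hopping from one thing to the next whenever things get difficult. But if we can't walk away from a project that monopolizes our energy, sooner or later we will sell ourselves short (and hurt the people we could serve).

What happened yesterday has already happened. It is a gift from the person you used to be to the person you are today. If you don't want to accept it, you don't have to.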

NLP Model for Movie Review Sentiment Analysis Using Python Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Movie Review Sentiment Analysis dataset is a binary classification situation where we attempt to predict one of the two possible outcomes.
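Because the task is binary, a network for it typically ends in a single output unit. That unit's score is squashed by a sigmoid and compared against a threshold. The plain-Python sketch below shows only that final step. The 0.5 threshold and the example scores are assumptions, not values from the notebook.

```python
import math


def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1 (overflow-safe)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # for negative z, avoid computing exp of a large positive number
    return e / (1.0 + e)


def predict_sentiment(score, threshold=0.5):
    """Map a single model output score to 'positive' or 'negative'."""
    return "positive" if sigmoid(score) >= threshold else "negative"


print(predict_sentiment(1.7))   # positive
print(predict_sentiment(-0.4))  # negative
```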

Additional Notes: This script is a replication, with some small modifications, of Dr. Jason Brownlee’s blog post, How to Prepare Movie Review Data for Sentiment Analysis. I plan to leverage Dr. Brownlee’s tutorial and build a TensorFlow-based text classification notebook template for future modeling of similar datasets.

In this Take1 iteration, we will construct the necessary code modules to handle the tasks of loading text, cleaning text, and vocabulary development.
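The three tasks can be sketched in plain Python as below. This is a minimal illustration, not the notebook's code. The abbreviated STOP_WORDS set, the single-character filter, and the min_occurrence=2 vocabulary cutoff are assumptions. A fuller stop-word list, such as NLTK's English list, would normally replace the hypothetical one here.

```python
import string
from collections import Counter
from pathlib import Path

# Hypothetical, abbreviated stop-word list used only for illustration.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "in", "this", "that"}


def load_doc(path):
    """Loading: read one review file into a single string."""
    return Path(path).read_text(encoding="utf-8", errors="ignore")


def clean_doc(text):
    """Cleaning: split on whitespace, strip punctuation, drop unhelpful tokens."""
    table = str.maketrans("", "", string.punctuation)
    tokens = [w.translate(table) for w in text.split()]
    tokens = [w.lower() for w in tokens if w.isalpha()]  # keep alphabetic tokens only
    return [w for w in tokens if w not in STOP_WORDS and len(w) > 1]


def build_vocab(docs, min_occurrence=2):
    """Vocabulary: count tokens across documents and keep the frequent ones."""
    counts = Counter()
    for text in docs:
        counts.update(clean_doc(text))
    return {w for w, c in counts.items() if c >= min_occurrence}


# Two made-up reviews standing in for files read with load_doc().
reviews = ["This film is a triumph!", "A dull, dull film; the plot drags."]
print(sorted(build_vocab(reviews)))  # ['dull', 'film']
```

Setting a minimum occurrence count keeps the vocabulary small by discarding words seen only once across the corpus.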

INTRODUCTION: The Movie Review Data is a collection of movie reviews retrieved from the imdb.com website in the early 2000s by Bo Pang and Lillian Lee. The reviews were collected and made available as part of their research on natural language processing. The dataset comprises 1,000 positive and 1,000 negative movie reviews drawn from an archive of the rec.arts.movies.reviews newsgroup hosted at IMDB. The authors refer to this dataset as the ‘polarity dataset.’

ANALYSIS: Deep learning modeling results will be forthcoming in future iterations.

CONCLUSION: In this Take1 iteration, we were able to construct the necessary code modules to handle the tasks of loading text, cleaning text, and vocabulary development.

Dataset Used: Movie Review Sentiment Analysis Dataset

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference: https://www.cs.cornell.edu/home/llee/papers/cutsent.pdf and http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz

One potential source of performance benchmarks: https://machinelearningmastery.com/prepare-movie-review-data-sentiment-analysis/

The HTML-formatted report can be found on GitHub.

Multi-Class Image Classification Model for Colorectal Cancer Histology Using TensorFlow Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Colorectal Cancer Histology dataset is a multi-class classification situation where we attempt to predict one of several (more than two) possible outcomes.

In this Take1 iteration, we will construct several simple three-layer CNNs to model the dataset. We will use the best network as the baseline model for future modeling iterations.

INTRODUCTION: This dataset represents a collection of textures in histological images of human colorectal cancer. All images are RGB, 0.495 µm per pixel, digitized with an Aperio ScanScope (Aperio/Leica Biosystems) at 20x magnification. The histological samples contain fully anonymized images of formalin-fixed, paraffin-embedded human colorectal adenocarcinomas (primary tumors) from the dataset authors' pathology archive (Institute of Pathology, University Medical Center Mannheim, Heidelberg University, Mannheim, Germany).

ANALYSIS: The baseline model achieved an accuracy score of 98.25% on the training dataset after 15 epochs. After tuning the hyperparameters, the best model achieved an accuracy score of 84.00% on the validation dataset.

CONCLUSION: In this iteration, the TensorFlow CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Colorectal Cancer Histology Dataset, Kather JN, Weis CA, Bianconi F, Melchers SM, Schad LR, Gaiser T, Marx A, Zollner F: Multi-class texture analysis in colorectal cancer histology (2016), Scientific Reports (in press)

Dataset ML Model: Multi-class image classification with numerical attributes

Dataset Reference: https://zenodo.org/record/53169#.XGZemKwzbmG

One potential source of performance benchmarks: https://www.kaggle.com/kmader/colorectal-histology-mnist

The HTML-formatted report can be found on GitHub.

Algorithmic Trading Model for Naïve Momentum Strategy Using Python

NOTE: This script is for learning purposes only and does not constitute a recommendation for buying or selling any stock mentioned in this script.

SUMMARY: This project aims to construct and test an algorithmic trading model and document the end-to-end steps using a template. We will test trading models with the naïve momentum strategy.

INTRODUCTION: This algorithmic trading model compares a simple naïve momentum strategy with a buy-and-hold approach. The strategy goes long (buys) the stock when the daily closing price rises from the previous day for a predefined number of consecutive days. Conversely, it exits the position when the daily closing price declines for the same number of consecutive days.
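The entry and exit rule can be sketched in plain Python as follows. The sketch makes three assumptions: trades execute at the close of the day the signal fires, an unchanged close resets both streaks, and any position still open at the end of the series is valued at the last close. The short price list is made up; the actual model used AAPL prices from Quandl.

```python
def momentum_trades(closes, n_days):
    """Return (entry_index, exit_index) pairs for a naive momentum rule.

    Enter long once the close has risen for n_days consecutive days;
    exit once it has fallen for n_days consecutive days. A position still
    open at the end is closed at the last available price (assumption).
    """
    trades = []
    entry = None
    ups = downs = 0
    for i in range(1, len(closes)):
        if closes[i] > closes[i - 1]:
            ups, downs = ups + 1, 0
        elif closes[i] < closes[i - 1]:
            ups, downs = 0, downs + 1
        else:
            ups = downs = 0  # unchanged close resets both streaks (assumption)
        if entry is None and ups >= n_days:
            entry = i
        elif entry is not None and downs >= n_days:
            trades.append((entry, i))
            entry = None
    if entry is not None:
        trades.append((entry, len(closes) - 1))
    return trades


def strategy_profit(closes, n_days):
    """Profit per share summed over all trades."""
    return sum(closes[x] - closes[e] for e, x in momentum_trades(closes, n_days))


def buy_and_hold_profit(closes):
    """Profit per share from buying at the first close and holding to the last."""
    return closes[-1] - closes[0]


# Made-up closing prices, for illustration only.
prices = [10, 11, 12, 13, 12, 11, 10, 11, 12, 13, 14]
print(momentum_trades(prices, 2))      # [(2, 5), (8, 10)]
print(strategy_profit(prices, 2))      # 1
print(buy_and_hold_profit(prices))     # 4
```

With n_days=2, the made-up series produces two trades. The first loses 1 per share and the second gains 2, for a net of 1 per share versus 4 for buy-and-hold. Waiting for confirmation means the strategy enters after part of the rise and exits after part of the fall.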

ANALYSIS: In this iteration, we analyzed the stock prices for Apple Inc. (AAPL) between January 1, 2020, and November 20, 2020. The trading model produced a profit of 23.22 dollars per share, while the buy-and-hold approach yielded a gain of 44.58 dollars per share.

CONCLUSION: For the stock of AAPL during the modeling time frame, the trading strategy did not produce a better return than the buy-and-hold approach. We should consider modeling this stock further by experimenting with more variations of the strategy.

Dataset ML Model: Time series analysis with numerical attributes

Dataset Used: Quandl

The HTML-formatted report can be found on GitHub.

Multi-Class Image Classification Deep Learning Template v1 using TensorFlow

As I practice and solve machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a set of project templates that I use to experiment with modeling ML problems using Python and TensorFlow.

Version 1 of the TensorFlow multi-class image classification template contains structures and features similar to those of the TensorFlow templates for tabular data. The image classification template is designed to take a deep learning modeling exercise from beginning to end.

You will find the Python templates on the Machine Learning Project Templates page.

Binary Image Classification Deep Learning Template v1 using TensorFlow

As I practice and solve machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a set of project templates that I use to experiment with modeling ML problems using Python and TensorFlow.

Version 1 of the TensorFlow binary image classification template contains structures and features similar to those of the TensorFlow templates for tabular data. The image classification template is designed to take a deep learning modeling exercise from beginning to end.

You will find the Python templates on the Machine Learning Project Templates page.