Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Colorectal Cancer Histology dataset presents a multi-class classification problem in which we attempt to predict one of several (more than two) possible outcomes.
In iteration Take1, we constructed a few simple three-layer CNNs to model the dataset. We plan to use the best result as the baseline model for future modeling iterations.
In this Take2 iteration, we will construct a simple VGG convolutional network with six VGG blocks to model the dataset. We will compare the best result from the VGG network with the baseline model from iteration Take1.
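The six-block VGG layout described above can be sketched roughly as follows. This is a minimal illustration and not the configuration used in this project: the 150x150 RGB input size, the eight output classes, the filter schedule, the dense head, and the optimizer are all assumptions made for the example, and the helper names (block_filters, spatial_size, build_vgg) are hypothetical.

```python
def block_filters(num_blocks=6, base_filters=16, max_filters=256):
    """Assumed filter schedule: double the filters per block, capped at max_filters."""
    return [min(base_filters * 2 ** i, max_filters) for i in range(num_blocks)]


def spatial_size(input_size=150, num_blocks=6):
    """Side length of the feature map after num_blocks 2x2 max-pooling layers."""
    size = input_size
    for _ in range(num_blocks):
        size //= 2  # 2x2 pooling with default 'valid' padding floors odd sizes
    return size


def build_vgg(input_shape=(150, 150, 3), num_classes=8, num_blocks=6):
    """Build a small VGG-style classifier (assumed hyperparameters throughout)."""
    # Imported lazily so the helpers above work without TensorFlow installed.
    import tensorflow as tf

    layers = tf.keras.layers
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=input_shape))
    for filters in block_filters(num_blocks):
        # One VGG block: two 3x3 convolutions followed by 2x2 max-pooling.
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model
```

With a 150-pixel input, six pooling steps shrink the feature map to 2x2 before the dense layers, so adding further blocks at this input size would leave almost no spatial information to pool.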
INTRODUCTION: This dataset represents a collection of textures in histological images of human colorectal cancer. All images are RGB, 0.495 µm per pixel, digitized with an Aperio ScanScope (Aperio/Leica Biosystems) at 20x magnification. The histological samples contain fully anonymized images of formalin-fixed paraffin-embedded human colorectal adenocarcinomas (primary tumors) from the pathology archive of the dataset authors (Institute of Pathology, University Medical Center Mannheim, Heidelberg University, Mannheim, Germany).
ANALYSIS: In iteration Take1, the baseline model achieved an accuracy of 98.25% on the training dataset after 15 epochs. After hyperparameter tuning, the best model achieved an accuracy of 84.00% on the validation dataset.
In this Take2 iteration, the baseline model achieved an accuracy of 96.88% on the training dataset after 20 epochs. After hyperparameter tuning, the best model achieved an accuracy of 82.30% on the validation dataset.
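To compare the two iterations side by side, the reported scores can be put through a quick calculation. The sketch below uses only the four accuracy figures quoted above; the dictionary layout and the function name accuracy_gap are illustrative choices, not part of the original notebooks.

```python
# Accuracy scores (percent) as reported for each iteration.
results = {
    "Take1": {"train": 98.25, "validation": 84.00},
    "Take2": {"train": 96.88, "validation": 82.30},
}


def accuracy_gap(scores):
    """Training accuracy minus validation accuracy, in percentage points."""
    return round(scores["train"] - scores["validation"], 2)


for name, scores in results.items():
    print(f"{name}: train-validation gap = {accuracy_gap(scores):.2f} points")

# Change in validation accuracy from Take1 to Take2, in percentage points.
validation_change = round(
    results["Take2"]["validation"] - results["Take1"]["validation"], 2
)
print(f"Validation change Take1 -> Take2: {validation_change:+.2f} points")
```

Both iterations show a gap of more than 14 points between training and validation accuracy, and the Take2 validation score is 1.70 points below the Take1 score.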
CONCLUSION: In this iteration, the TensorFlow CNN model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.
Dataset Used: Colorectal Cancer Histology Dataset, Kather JN, Weis CA, Bianconi F, Melchers SM, Schad LR, Gaiser T, Marx A, Zollner F: Multi-class texture analysis in colorectal cancer histology (2016), Scientific Reports (in press)
Dataset ML Model: Multi-class image classification with numerical attributes
Dataset Reference: https://zenodo.org/record/53169#.XGZemKwzbmG
One potential source of performance benchmarks: https://www.kaggle.com/kmader/colorectal-histology-mnist
The HTML-formatted report can be found here on GitHub.