Binary-Class Image Classification Model for Concrete Crack Images Using Python and TensorFlow Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Concrete Crack Images dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: In this study, the research team developed a computerized vision system to recognize whether there are cracks on concrete surfaces. The dataset contains concrete images with different surface finishes and illumination conditions. The photos were collected from various college campus buildings. The dataset is divided into two classes, negative and positive, and each type has 20,000 images with 227 x 227 pixels.

ANALYSIS: The ResNet50V2 model achieved an accuracy score of 99.83% after 5 epochs using a separate validation dataset. When we applied the model to the validation dataset, the model achieved an accuracy score of 99.75%.
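For readers who want to see how an accuracy score like the ones above is computed for a two-class problem, here is a minimal sketch. It is not the project's code: the function name, the 0.5 threshold, and the toy probabilities are assumptions made for illustration, and it assumes the model outputs one probability per image for the positive (cracked) class.

```python
def binary_accuracy(probabilities, labels, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label.

    probabilities: predicted probability of the positive class per sample
    labels: true class per sample, 0 (negative) or 1 (positive)
    threshold: assumed cut-off for calling a sample positive
    """
    if len(probabilities) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical toy values, not taken from the Concrete Crack dataset.
toy_probabilities = [0.9, 0.2, 0.6, 0.4]
toy_labels = [1, 0, 0, 0]
toy_score = binary_accuracy(toy_probabilities, toy_labels)
```

In this toy case, three of the four thresholded predictions match their labels.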

CONCLUSION: In this iteration, the TensorFlow ResNet50V2 CNN model appeared suitable for modeling this dataset.

Dataset ML Model: Binary-Class classification with numerical features

Dataset Used: Özgenel, Çağlar Fırat (2019), “Concrete Crack Images for Classification”, Mendeley Data, V2, doi: 10.17632/5y9wdsg2zt.2

Dataset Reference: https://data.mendeley.com/datasets/5y9wdsg2zt/2

One source of potential performance benchmarks: https://www.kaggle.com/datasets/arunrk7/surface-crack-detection

The HTML formatted report can be found here on GitHub.

Roz Zander and Ben Zander on The Art of Possibility, Part 12

In the book, The Art of Possibility: Transforming Professional and Personal Life, Rosamund Stone Zander and Benjamin Zander show us the 12 things we can do to go on a journey of possibility, rather than living a life full of hurdles and constraints of our own making.

These are some of my favorite concepts and takeaways from reading the book.

The Twelfth Practice: Telling the WE Story

In this chapter, Roz and Ben discuss using the WE story-telling approach to create possibilities. They offer the following observations and recommendations for us to think about:

Often, history is a record of conflict between an US and a THEM. What approach can we invent that will take us from an entrenched posture of hostility to one of enthusiasm and deep regard? We should consider WE as a new entity that can personify the “togetherness” of us and others.

The WE story defines a human being in a specific way. It says we are our central selves seeking to contribute, naturally engaged, and forever in dance with each other. The WE story points to a relationship rather than to individuals. It aims to communicate patterns, gestures, and movement rather than discrete objects and identities.

By telling the WE story, we become a conduit for this inclusive entity and always inquire into what is best for US. The practice of telling the WE story points to a kind of leadership based on the courage to speak on behalf of all people for the long line of human possibility.

The steps to the WE practice are as follows:

  1. Tell the WE story – the story of the unseen threads that connect us all, the story of possibility.
  2. Listen and look for the emerging entity.
  3. Ask the questions:
    1. “What do WE want to have happen here?”
    2. “What’s the best for US?” – all of each of us, and all of us.
    3. “What’s OUR next step?”

While visions go in and out of favor, the WE story remains. The transformation from the “I” to the WE enables us to intentionally dissolve the barriers that divide us, so we may reshape our surroundings and create possibility. The practice of WE draws on all the other practices.

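Not Impossible

(From a writer I respect, Seth Godin)

Some people build their work on the frontier of the impossible. The breakthrough code, the astonishing new magic tricks, and the concertos that take your breath away.

The idea is so remarkable that we are tempted to believe it is the work we should be doing too. Not every once in a while, but every day. To do what has never been done before, to create emotions that are truly scarce.

But the scarcity of this kind of work may be the evidence we need to realize that it is not for us, at least not today.

Today, we have the chance to lead, to connect, and to do work we are proud of. Work we can describe before we begin, and work we believe is worth doing.

That might be enough.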

Multi-Class Tabular Classification Model for Avila Bible Identification Using Python and TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Avila Bible Identification dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: The Avila dataset includes 800 images extracted from the “Avila Bible,” a giant Latin copy of the whole Bible produced during the XII century between Italy and Spain. The paleographic analysis of the manuscript has identified the presence of 12 transcribers; however, each transcriber did not transcribe the same number of pages. The prediction task is to associate each pattern to one of the 12 transcribers labeled as A, B, C, D, E, F, G, H, I, W, X, and Y. The research team normalized the data using the Z-normalization method and divided the dataset into two portions, training and test. The training set contains 10,430 samples, while the test set contains 10,437 samples.
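The Z-normalization mentioned above rescales each feature to zero mean and unit standard deviation. The sketch below shows the idea in plain Python for a single feature column. It is an illustration only: the function name and the sample values are hypothetical, and it assumes the population standard deviation, which may differ from what the dataset's authors used.

```python
import math


def z_normalize(values):
    """Rescale a list of numbers to zero mean and unit standard deviation.

    Assumption: uses the population standard deviation (divide by n).
    A constant column is mapped to all zeros to avoid division by zero.
    """
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical feature column, not taken from the Avila dataset.
toy_column = [2.0, 4.0, 6.0]
toy_normalized = z_normalize(toy_column)
```

After normalization, the values are centered on zero, so features measured on different scales contribute comparably to a model.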

ANALYSIS: The average performance of the preliminary TensorFlow models achieved an accuracy benchmark of 94.27%. When we processed the test dataset with the final model, the model achieved an accuracy score of 96.25%.

CONCLUSION: In this iteration, the TensorFlow model appeared to be suitable for modeling this dataset.

Dataset Used: Avila Bible Dataset

Dataset ML Model: Multi-Class classification with numerical features

Dataset Reference: https://archive-beta.ics.uci.edu/ml/datasets/avila

One source of potential performance benchmarks: https://www.sciencedirect.com/science/article/abs/pii/S0952197618300721

The HTML formatted report can be found here on GitHub.

Multi-Class Tabular Classification Model for Avila Bible Identification Using Python and XGBoost

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Avila Bible Identification dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: The Avila dataset includes 800 images extracted from the “Avila Bible,” a giant Latin copy of the whole Bible produced during the XII century between Italy and Spain. The paleographic analysis of the manuscript has identified the presence of 12 transcribers; however, each transcriber did not transcribe the same number of pages. The prediction task is to associate each pattern to one of the 12 transcribers labeled as A, B, C, D, E, F, G, H, I, W, X, and Y. The research team normalized the data using the Z-normalization method and divided the dataset into two portions, training and test. The training set contains 10,430 samples, while the test set contains 10,437 samples.

ANALYSIS: The performance of the preliminary XGBoost model achieved an accuracy benchmark of 86.67%. After a series of tuning trials, the final model processed the training dataset with an accuracy score of 99.79%. When we processed the test dataset with the final model, the model achieved an accuracy score of 99.81%.

CONCLUSION: In this iteration, the XGBoost model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Avila Bible Dataset

Dataset ML Model: Multi-Class classification with numerical features

Dataset Reference: https://archive-beta.ics.uci.edu/ml/datasets/avila

One source of potential performance benchmarks: https://www.sciencedirect.com/science/article/abs/pii/S0952197618300721

The HTML formatted report can be found here on GitHub.

Univariate Time Series Model for Water Utility Consumers Using TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a time series prediction model and document the end-to-end steps using a template. The Water Utility Consumers dataset is a univariate time series situation where we attempt to forecast future outcomes based on past data points.

INTRODUCTION: The problem is to forecast the monthly number of water utility consumers in London, United Kingdom. The dataset describes a time series of utility accounts over 11 years (1983-1994), and there are 216 observations. We used the first 80% of the observations for training and testing various models while holding back the remaining observations for validating the final model.

ANALYSIS: The baseline persistence model yielded an RMSE of 2828. The CNN model processed the same test data with an RMSE of 2395, which was better than the baseline model as expected. In an earlier ARIMA modeling experiment, the best ARIMA model with non-seasonal order of (0, 1, 1) and seasonal order of (0, 0, 1, 12) processed the validation data with an RMSE of 2260.
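To make the baseline concrete, the following sketch shows one common way to score a persistence model: each forecast is simply the previous observation, and the errors are summarized as RMSE. It is a simplified illustration, not the project's code. The function names, the walk-forward evaluation, and the toy series are assumptions, and the 80% split only mirrors the proportion described in the introduction.

```python
import math


def split_series(series, train_fraction=0.8):
    """Split a series chronologically into an initial part and a held-back part."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def persistence_rmse(history, actual):
    """RMSE of a persistence forecast evaluated step by step.

    The forecast for each step is the most recent observed value:
    the last point of `history` first, then each actual value in turn.
    """
    if not history or not actual:
        raise ValueError("history and actual must not be empty")
    previous = history[-1]
    squared_errors = []
    for value in actual:
        squared_errors.append((value - previous) ** 2)
        previous = value  # the observed value becomes the next forecast
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy series, not the water utility data.
toy_series = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19]
toy_history, toy_holdout = split_series(toy_series)
toy_rmse = persistence_rmse(toy_history, toy_holdout)
```

Any candidate model, such as the CNN or ARIMA models discussed above, is worth keeping only if its RMSE on the same data is lower than this naive baseline.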

CONCLUSION: For this dataset, the TensorFlow CNN model achieved an acceptable result, and we should consider using TensorFlow for further modeling.

Dataset Used: Number of water consumers in London, United Kingdom, Jan 1983 through April 1994.

Dataset ML Model: Time series forecast with numerical attribute.

Dataset Reference: Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhuoranyang./tsdl/.

The HTML formatted report can be found here on GitHub.

Multi-Class Tabular Classification Model for Avila Bible Identification Using Python and TensorFlow Decision Forests

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Avila Bible Identification dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: The Avila dataset includes 800 images extracted from the “Avila Bible,” a giant Latin copy of the whole Bible produced during the XII century between Italy and Spain. The paleographic analysis of the manuscript has identified the presence of 12 transcribers; however, each transcriber did not transcribe the same number of pages. The prediction task is to associate each pattern to one of the 12 transcribers labeled as A, B, C, D, E, F, G, H, I, W, X, and Y. The research team normalized the data using the Z-normalization method and divided the dataset into two portions, training and test. The training set contains 10,430 samples, while the test set contains 10,437 samples.

ANALYSIS: The performance of the preliminary Gradient Boosted Trees model achieved an accuracy benchmark of 99.99% on the training dataset. When we applied the finalized model to the test dataset, the model achieved an accuracy score of 99.87%.

CONCLUSION: In this iteration, the TensorFlow Decision Forests model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Avila Bible Dataset

Dataset ML Model: Multi-Class classification with numerical features

Dataset Reference: https://archive-beta.ics.uci.edu/ml/datasets/avila

One source of potential performance benchmarks: https://www.sciencedirect.com/science/article/abs/pii/S0952197618300721

The HTML formatted report can be found here on GitHub.

Multi-Class Tabular Classification Model for Avila Bible Identification Using Python and Scikit-Learn

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Avila Bible Identification dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: The Avila dataset includes 800 images extracted from the “Avila Bible,” a giant Latin copy of the whole Bible produced during the XII century between Italy and Spain. The paleographic analysis of the manuscript has identified the presence of 12 transcribers; however, each transcriber did not transcribe the same number of pages. The prediction task is to associate each pattern to one of the 12 transcribers labeled as A, B, C, D, E, F, G, H, I, W, X, and Y. The research team normalized the data using the Z-normalization method and divided the dataset into two portions, training and test. The training set contains 10,430 samples, while the test set contains 10,437 samples.

ANALYSIS: The average performance of the machine learning algorithms achieved an accuracy benchmark of 85.51% using the training dataset. Furthermore, we selected Bagging Classifier as the final model as it processed the training dataset with a final accuracy score of 98.53%. When we processed the test dataset with the final model, the model achieved an accuracy score of 99.20%.
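As a rough illustration of the bagging idea behind the final model, the sketch below fits several base learners on bootstrap resamples of the training data and takes a majority vote. It is not the project's code. The project used Scikit-Learn, whereas this pure-Python version swaps in a 1-nearest-neighbor base learner to stay self-contained, and the function names, parameters, and toy data are all assumptions.

```python
import random
from collections import Counter


def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    closest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[closest]


def bagging_predict(train_X, train_y, x, n_estimators=11, seed=0):
    """Predict the label of x by majority vote over bootstrap-trained learners.

    Each of the n_estimators learners sees a resample of the training set
    drawn with replacement (same size as the original), then votes.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(train_X)
    votes = []
    for _ in range(n_estimators):
        indices = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        sample_X = [train_X[i] for i in indices]
        sample_y = [train_y[i] for i in indices]
        votes.append(nearest_neighbor_predict(sample_X, sample_y, x))
    return Counter(votes).most_common(1)[0][0]


# Hypothetical two-class toy data, not taken from the Avila dataset.
toy_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
         (5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.3)]
toy_y = ["A", "A", "A", "A", "B", "B", "B", "B"]
label_near_origin = bagging_predict(toy_X, toy_y, (0.1, 0.1))
label_far_away = bagging_predict(toy_X, toy_y, (5.0, 5.0))
```

Averaging the votes of learners trained on different resamples tends to reduce variance compared with a single learner, which is the main motivation for bagging.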

CONCLUSION: In this iteration, the Bagging Classifier model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Avila Bible Dataset

Dataset ML Model: Multi-Class classification with numerical features

Dataset Reference: https://archive-beta.ics.uci.edu/ml/datasets/avila

One source of potential performance benchmarks: https://www.sciencedirect.com/science/article/abs/pii/S0952197618300721

The HTML formatted report can be found here on GitHub.