Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Planet: Understanding the Amazon from Space dataset is a multi-label classification problem: each image can carry several labels at once, so we attempt to predict the full set of labels that applies to each image rather than a single outcome.
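To make the multi-label setup concrete, the sketch below turns space-separated tag strings into multi-hot target vectors, one position per tag. The sample rows and the helper names build_tag_index and encode_tags are assumptions made for illustration and are not taken from the project scripts.

```python
# Minimal sketch: encode space-separated tags as multi-hot vectors.
# The sample rows and helper names are illustrative assumptions.

def build_tag_index(tag_strings):
    """Map every distinct tag to a fixed column position (sorted for stability)."""
    tags = sorted({tag for s in tag_strings for tag in s.split()})
    return {tag: i for i, tag in enumerate(tags)}


def encode_tags(tag_string, tag_index):
    """Return a list of 0/1 flags with a 1 for each tag present in tag_string."""
    vector = [0] * len(tag_index)
    for tag in tag_string.split():
        vector[tag_index[tag]] = 1
    return vector


# Hypothetical label rows; one image may carry several tags at once.
rows = ["clear primary", "haze primary water", "cloudy"]
tag_index = build_tag_index(rows)
encoded = [encode_tags(row, tag_index) for row in rows]
```

Because an image can hold more than one tag, a model trained on such targets would typically use one sigmoid output per tag rather than a single softmax over all tags.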
INTRODUCTION: Planet, designer and builder of the world’s largest constellation of Earth-imaging satellites, collaborated with its Brazilian partner SCCON to challenge Kaggle participants to label satellite image chips with atmospheric conditions and various classes of land cover/land use. The resulting models will help the global community better understand deforestation conditions and how to respond to them.
The purpose of this modeling exercise is to construct an end-to-end template for solving multi-label machine learning problems. The series of scripting exercises will replicate Dr. Jason Brownlee’s blog post on this topic to build a robust template for future similar problems.
From iteration Take1, we constructed the necessary script segments to download and pre-process the image files available on Kaggle’s website.
From iteration Take2, we constructed the necessary script segments to train the TensorFlow model and evaluated the model’s effectiveness.
From iteration Take3, we constructed the necessary script segments to load an unseen image and perform prediction on the image.
From iteration Take4, we constructed the necessary script segments to load a list of test images from Kaggle and perform prediction on these images.
From iteration Take5, we constructed a VGG-5 network with dropout parameters and performed prediction on Kaggle’s test images.
In this Take6 iteration, we will incorporate image augmentation with a VGG-5 network and perform prediction on Kaggle’s test images.
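As a rough illustration of what image augmentation means for the Take6 iteration, the sketch below applies random horizontal and vertical flips to a tiny image stored as nested lists. The choice of flips is an assumption, since this write-up does not state which transformations were used, and the function names are hypothetical. An actual training script would more likely rely on the augmentation utilities of TensorFlow/Keras.

```python
# Minimal sketch of flip-based augmentation on a nested-list "image".
# Assumption: flips are used here only as an example transformation.
import random


def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows (top to bottom)."""
    return image[::-1]


def random_flip(image, rng):
    """Apply each flip independently with probability 0.5."""
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    if rng.random() < 0.5:
        image = flip_vertical(image)
    return image


chip = [[1, 2], [3, 4]]          # hypothetical 2x2 single-channel chip
rng = random.Random(0)           # seeded so runs are repeatable
augmented = random_flip(chip, rng)
```

Flips keep the tag set of a satellite chip valid because land cover does not depend on orientation, which is why they are a common first choice for this kind of imagery.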
ANALYSIS: From iteration Take1, we successfully downloaded and pre-processed the image files from Kaggle.
From iteration Take2, the baseline model achieved an F-beta score of 0.8478 on the validation dataset after 20 epochs.
From iteration Take3, we successfully downloaded an image and made a prediction on the previously unseen photo.
From iteration Take4, we successfully processed the test images downloaded from Kaggle and made predictions on the previously unseen photos. We obtained an F-beta score of 0.6097 from our predictions.
From iteration Take5, we built a VGG-5 network with dropout parameters and tested it with the images downloaded from Kaggle. We obtained an F-beta score of 0.6463 from our predictions.
In this Take6 iteration, we successfully incorporated image augmentation with a VGG-5 network and tested it with the images downloaded from Kaggle. We obtained an F-beta score of 0.4528 from our predictions.
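For readers unfamiliar with the metric reported above, the sketch below computes an F-beta score averaged over samples for multi-hot label vectors. It assumes beta = 2, which matches the F2 metric used by the Kaggle competition; this write-up does not state the beta value, and the function name is hypothetical.

```python
# Minimal sketch of a sample-averaged F-beta score for multi-label data.
# Assumption: beta defaults to 2 (F2), which weights recall above precision.

def fbeta_score_samples(y_true, y_pred, beta=2.0):
    """Mean of per-sample F-beta scores; inputs are lists of 0/1 vectors."""
    beta_sq = beta ** 2
    scores = []
    for truth, pred in zip(y_true, y_pred):
        tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
        denom = (1 + beta_sq) * tp + beta_sq * fn + fp
        # A sample with no true and no predicted labels scores 0.0 here.
        scores.append((1 + beta_sq) * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# One true positive, one false positive, one false negative:
# F2 = 5 * 1 / (5 * 1 + 4 * 1 + 1) = 0.5
example = fbeta_score_samples([[1, 0, 1]], [[1, 1, 0]])
```

With beta = 2, a missed tag (false negative) costs four times as much in the denominator as a spurious tag (false positive), so a model that under-predicts tags is penalized more heavily.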
CONCLUSION: In this iteration, the TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.
Dataset Used: Planet: Understanding the Amazon from Space
Dataset ML Model: Multi-label classification with numerical attributes
Dataset Reference: https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data
One potential source of performance benchmarks: https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/leaderboard
The HTML-formatted report can be found on GitHub.