Binary-Class Model for Heart Disease Key Indicators Using Scikit-learn

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Heart Disease Key Indicators dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: This dataset comes from the CDC’s Behavioral Risk Factor Surveillance System (BRFSS) study, which conducts annual telephone surveys to gather data on the health status of U.S. residents. The original dataset consists of 401,958 rows and 279 columns. However, the Kaggle project owner selected some of the most relevant attributes from the dataset and cleaned it up for machine learning projects.

ANALYSIS: On average, the machine learning algorithms achieved a ROC-AUC benchmark of 86.24% on the training dataset. We selected Random Forest as the final model because it processed the training dataset with a final ROC-AUC score of 91.28%. When we processed the test dataset with the final model, it achieved a ROC-AUC score of 70.94%.
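For readers less familiar with the metric, here is a minimal sketch of what a ROC-AUC score measures: the chance that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The function name `roc_auc` and the sample labels and scores are hypothetical, made up for illustration, and this is not the code used in the project (which relied on Scikit-learn).

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = heart disease) and predicted probabilities.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = roc_auc(example_labels, example_scores)
print(f"ROC-AUC: {example_auc:.2%}")
```

A score of 50% means the model ranks cases no better than chance, and 100% means every positive case outranks every negative one. The pairwise loop is quadratic, so it is only meant to show the idea on small inputs.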

CONCLUSION: In this iteration, the Random Forest model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Personal Key Indicators of Heart Disease Dataset

Dataset ML Model: Binary classification with numerical and categorical features

Dataset Reference: https://www.kaggle.com/kamilpytlak/personal-key-indicators-of-heart-disease

One source of potential performance benchmarks: https://www.kaggle.com/kamilpytlak/personal-key-indicators-of-heart-disease/code

The HTML formatted report can be found here on GitHub.

Seth Godin on Survival Is Not Enough, Part 14

In his book, Survival Is Not Enough: Why Smart Companies Abandon Worry and Embrace Change, Seth Godin discusses how innovative organizations and individuals can apply prudent strategies in adapting and positioning themselves for the constant changes.

These are some of my favorite concepts and takeaways from reading the book.

The Important Questions (even more questions)

In this chapter, Seth discusses some of the crucial questions we should be asking ourselves as we set out to implement the concepts of evolving and zooming. He offers the following observations and recommendations for us to think about:

What are we measuring?

Whatever gets measured is what will get done. A fast feedback loop cannot work unless we are measuring something we can change. Being specific about our measurement is a crucial first step in evolving.

Have we institutionalized the process of sharing what we learn?

Learning that does not get passed on does not do us any good. For example, if our organization invests in farming and hunting, the effort is wasted unless we keep track and teach each other what is being learned.

What do we need to do to become the first choice?

Can we create a winning strategy in which our organization wins even if our employees are not the best? Can we formulate a winning plan even if our personal mDNA is not the best?

Are we investing in techniques that encourage fast memetic evolution?

  • Invest in exploring to find the memes most likely to give us success.
  • Invest resources taking care of the people who carry the best set of memes.
  • Create lots of memes and drop the ones that do not work.
  • Recognize that monogamy is ineffective.
  • Use fast feedback loops.
  • Keep overhead small by investing as little as possible in creating new memes.
  • Do not spend a lot of resources supporting the memes that do not make our organization more fit.
  • Swap memes with others.
  • Depend on recombination more than mutation.
  • Invest in the memes that are worth spreading.

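Similar to the Ostracod

(From a writer I respect, Seth Godin)

The ostracod went extinct. Over millions of years, with a good reason for every step, it evolved into what it became.

When we add up all those small steps, we end up with a creature that no longer fits its environment.

Organizations develop this way too. So do work practices, cultural systems, and "the way we do things around here."

I am sure that every step involved in what you do now had a good reason behind it twenty years ago, but your current competitors, the ones starting from scratch, are skipping most of the steps you now take.

Every day we get a new chance to start over. If you do not, someone else will.

[Update! I apologize to all the ostracod fans. Although some types have disappeared, it is not entirely extinct. This makes me happy, even if it makes the metaphor less apt.]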

Multi-Class Model for Fetal Health Classification Using AutoKeras

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Fetal Health Classification dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset contains 2,126 records of features extracted from Cardiotocography exams. Cardiotocograms are a simple and cost-accessible option to assess fetal health. The equipment works by sending ultrasound pulses and reading their response, thus highlighting fetal heart rate, fetal movements, uterine contractions, and more. Three expert obstetricians classified the outcomes into three classes: Normal, Suspect, and Pathological.

ANALYSIS: After a series of tuning trials, the best AutoKeras model processed the training dataset with an accuracy score of 93.30%. When we processed the test dataset with the final model, the model achieved an accuracy score of 88.08%.
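As a small illustration of the accuracy metric reported above, the sketch below computes overall accuracy and per-class recall for the three outcome classes. The label lists are hypothetical and the helper names `accuracy` and `per_class_recall` are my own. The per-class view is included on the assumption that a single accuracy figure can hide weak results on a smaller class.

```python
from collections import Counter

CLASSES = ("Normal", "Suspect", "Pathological")


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each class present in y_true, the fraction of its records predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in CLASSES if totals[c]}


# Hypothetical true and predicted labels for eight exams.
y_true = ["Normal", "Normal", "Normal", "Normal",
          "Suspect", "Suspect", "Pathological", "Pathological"]
y_pred = ["Normal", "Normal", "Normal", "Suspect",
          "Suspect", "Normal", "Pathological", "Pathological"]

overall = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
print(f"Accuracy: {overall:.2%}")
for name, value in recalls.items():
    print(f"  {name}: {value:.2%}")
```

In this made-up sample the overall accuracy looks reasonable while the Suspect class is right only half the time, which is the kind of gap a per-class breakdown makes visible.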

CONCLUSION: In this iteration, the AutoKeras model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Fetal Health Classification Dataset

Dataset ML Model: Multi-Class classification with numerical features

Dataset Reference: https://www.kaggle.com/andrewmvd/fetal-health-classification

One source of potential performance benchmarks: https://www.kaggle.com/andrewmvd/fetal-health-classification/code

The HTML formatted report can be found here on GitHub.

Multi-Class Model for Fetal Health Classification Using TensorFlow

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Fetal Health Classification dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset contains 2,126 records of features extracted from Cardiotocography exams. Cardiotocograms are a simple and cost-accessible option to assess fetal health. The equipment works by sending ultrasound pulses and reading their response, thus highlighting fetal heart rate, fetal movements, uterine contractions, and more. Three expert obstetricians classified the outcomes into three classes: Normal, Suspect, and Pathological.

ANALYSIS: The average performance of the preliminary TensorFlow models achieved an accuracy benchmark of 92.03%. When we processed the test dataset with the final model, the model achieved an accuracy score of 91.84%.

CONCLUSION: In this iteration, the TensorFlow model appeared to be a suitable choice for modeling this dataset.

Dataset Used: Fetal Health Classification Dataset

Dataset ML Model: Multi-Class classification with numerical features

Dataset Reference: https://www.kaggle.com/andrewmvd/fetal-health-classification

One source of potential performance benchmarks: https://www.kaggle.com/andrewmvd/fetal-health-classification/code

The HTML formatted report can be found here on GitHub.

Univariate Time Series CNN/LSTM Combo Modeling Template Using TensorFlow Version 1

As I practice solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that can be used to support time series analysis using the TensorFlow framework and Python.

Version 1 of the TensorFlow CNN and LSTM combo time series template replicates Dr. Brownlee’s blog post “Deep Learning Models for Univariate Time Series Forecasting”. I plan to build a script for modeling future projects by adapting the example workflow presented in the blog.
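One preparation step that univariate forecasting workflows of this kind commonly rely on is framing the series as supervised samples with a sliding window. The sketch below is my own minimal version under that assumption, not the template's actual code. The function name `split_sequence`, the window size, and the sample series are chosen for illustration only.

```python
def split_sequence(sequence, n_steps):
    """Turn a univariate series into (input window, next value) pairs.

    Each sample holds n_steps consecutive observations and the target is
    the observation that immediately follows the window.
    """
    samples, targets = [], []
    for start in range(len(sequence) - n_steps):
        end = start + n_steps
        samples.append(list(sequence[start:end]))
        targets.append(sequence[end])
    return samples, targets


# Hypothetical series; a window of three steps predicts the fourth.
series = [10, 20, 30, 40, 50, 60]
X, y = split_sequence(series, n_steps=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

Before feeding such samples to a CNN or LSTM layer, they would typically be reshaped into a three-dimensional array of samples, time steps, and features. That step is left out here to keep the example free of dependencies.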

You will find the Python time series template on the Analytics Project Templates page.

Multi-Class Model for Fetal Health Classification Using XGBoost

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Fetal Health Classification dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset contains 2,126 records of features extracted from Cardiotocography exams. Cardiotocograms are a simple and cost-accessible option to assess fetal health. The equipment works by sending ultrasound pulses and reading their response, thus highlighting fetal heart rate, fetal movements, uterine contractions, and more. Three expert obstetricians classified the outcomes into three classes: Normal, Suspect, and Pathological.

ANALYSIS: The performance of the preliminary XGBoost model achieved an accuracy benchmark of 94.68%. After a series of tuning trials, the final model processed the training dataset with an accuracy score of 95.24%. When we processed the test dataset with the final model, the model achieved an accuracy score of 94.04%.

CONCLUSION: In this iteration, XGBoost appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Fetal Health Classification Dataset

Dataset ML Model: Multi-Class classification with numerical features

Dataset Reference: https://www.kaggle.com/andrewmvd/fetal-health-classification

One source of potential performance benchmarks: https://www.kaggle.com/andrewmvd/fetal-health-classification/code

The HTML formatted report can be found here on GitHub.

Multi-Class Model for Fetal Health Classification Using TF Decision Forests

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Fetal Health Classification dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: This dataset contains 2,126 records of features extracted from Cardiotocography exams. Cardiotocograms are a simple and cost-accessible option to assess fetal health. The equipment works by sending ultrasound pulses and reading their response, thus highlighting fetal heart rate, fetal movements, uterine contractions, and more. Three expert obstetricians classified the outcomes into three classes: Normal, Suspect, and Pathological.

ANALYSIS: The performance of the preliminary Gradient Boosted Trees model achieved an accuracy benchmark of 99.50% on the training dataset. When we applied the finalized model to the test dataset, the model achieved an accuracy score of 92.79%.

CONCLUSION: In this iteration, the TensorFlow Decision Forests model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Fetal Health Classification Dataset

Dataset ML Model: Multi-Class classification with numerical features

Dataset Reference: https://www.kaggle.com/andrewmvd/fetal-health-classification

One source of potential performance benchmarks: https://www.kaggle.com/andrewmvd/fetal-health-classification/code

The HTML formatted report can be found here on GitHub.