
Dr. Brandon explains Decision Trees to Jon

November 8, 2017 - 12:00 am


Dr. Brandon: Hello and welcome to the third episode of ‘Date with Data Science’. Today we talk about decision trees in machine learning.

Jon: Decisions are hard enough to make. Now you want me to grow a decision tree? Next, you’ll say there are decision jungles too!

Dr. Brandon: It might come as a surprise to you, Jon, but decision trees can help you make decisions more easily.

Imagine you are in a restaurant and you are handed a menu. A decision tree can help you decide whether to have a burger, pizza, fries, or a pie, for instance. And yes, there are decision jungles, but they are called random forests. We will talk about them another time.

Jon: You know, Bran, I have never been very good at making decisions. But with food, it is easy. It’s ALWAYS all you can have.

Dr. Brandon: Well, my mistake. Let’s take another example. Suppose you go to the doctor with stomach complaints after your binge at the restaurant. A decision tree can help your doctor decide whether you have a problem and then choose a treatment option based on your symptoms.

Jon: Really!? Tell me more.

Dr. Brandon: Alright. The following excerpt introduces decision trees from the book Apache Spark 2.x Machine Learning Cookbook by Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, and Shuen Mei. To learn how to implement them in Spark, read this article.

Decision trees are one of the oldest and most widely used machine learning methods in commerce. What makes them popular is not only their ability to handle more complex partitioning and segmentation (they are more flexible than linear models) but also their ability to explain how a solution was reached and why an outcome is predicted or classified as a particular class/label.

A quick way to think about the decision tree algorithm is as a smart partitioning algorithm that tries to minimize a loss function (for example, L2 or least squares) as it partitions the feature ranges, arriving at a segmented space whose boundaries are the decision boundaries that best fit the data. The algorithm becomes more sophisticated when it samples the data and tries combinations of features to assemble a more complex ensemble model, in which each learner (built on a partial sample or a feature combination) gets to vote toward the final outcome.
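To make the "smart partitioning" idea concrete, here is a minimal sketch in plain Python (not the Spark MLlib implementation the book covers) of the core step: scanning one numeric feature for the threshold whose split minimizes the L2 (least squares) loss, that is, the summed squared error of each side around its own mean. The names `sse` and `best_split` and the toy data are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared errors around the mean: the L2 loss of predicting one constant."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: List[float], y: List[float]) -> Tuple[Optional[float], float]:
    """Return (threshold, loss) for the split x <= threshold with the lowest total SSE.

    The threshold is None if every x value is identical (no split is possible).
    """
    pairs = sorted(zip(x, y))  # sort by feature value so each split is a prefix/suffix
    best_threshold: Optional[float] = None
    best_loss = float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        loss = sse(left) + sse(right)
        if loss < best_loss:
            best_threshold, best_loss = threshold, loss
    return best_threshold, best_loss


# Toy data: the target jumps between x = 3 and x = 10.
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
threshold, loss = best_split(x, y)
print(threshold)        # 6.5
print(round(loss, 4))   # 0.0933
```

A full tree would repeat this search over every feature and then recurse into each side; the ensemble variants described above would additionally run it on sampled rows and feature subsets.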

The following figure depicts a simplified version in which a simple binary tree (stumping) is trained to classify the data into segments belonging to two different colors (for example, healthy patient/sick patient). The figure depicts a simple algorithm that just splits the x/y feature space in half every time it establishes a decision boundary (hence classifying), while minimizing the number of errors (for example, an L2 least squares measure).
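The single split at the heart of this picture can be sketched as a decision stump in plain Python. This is an illustration under stated assumptions, not the book's code: two numeric features (the x/y axes), two class labels (0 and 1, standing in for healthy/sick), and plain misclassification count as the error measure instead of an L2 measure. `Stump`, `fit_stump`, and `predict_stump` are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Tuple

Point = Tuple[float, float]  # (x, y) feature values


class Stump(NamedTuple):
    feature: int       # 0 for x, 1 for y
    threshold: float   # points with value <= threshold go left
    left_label: int
    right_label: int
    errors: int        # misclassified training points


def fit_stump(points: List[Point], labels: List[int]) -> Optional[Stump]:
    """Try every feature, midpoint threshold, and label assignment; keep the fewest errors."""
    best: Optional[Stump] = None
    for feature in (0, 1):
        values = sorted(set(p[feature] for p in points))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for left_label in (0, 1):
                right_label = 1 - left_label
                errors = sum(
                    (left_label if p[feature] <= threshold else right_label) != label
                    for p, label in zip(points, labels)
                )
                if best is None or errors < best.errors:
                    best = Stump(feature, threshold, left_label, right_label, errors)
    return best


def predict_stump(stump: Stump, point: Point) -> int:
    """Classify a point by which side of the single decision boundary it falls on."""
    return stump.left_label if point[stump.feature] <= stump.threshold else stump.right_label


points = [(1, 5), (2, 6), (3, 4), (6, 5), (7, 6), (8, 4)]
labels = [0, 0, 0, 1, 1, 1]
stump = fit_stump(points, labels)
print(stump)  # Stump(feature=0, threshold=4.5, left_label=0, right_label=1, errors=0)
print(predict_stump(stump, (7.5, 5)))  # 1
```

Exhaustive search is fine for a toy example; production libraries typically bin or pre-sort feature values so they do not rescan the data for every candidate threshold.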

The following figure provides a corresponding tree so we can visualize the algorithm (in this case, a simple divide and conquer) against the proposed segmentation space. What makes decision tree algorithms popular is their ability to present their classification results in a language that can easily be communicated to a business user without much math.
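To show how a fitted tree turns into rules a business user can read, the following sketch grows a small tree recursively and prints it as nested if/else statements. It is a hypothetical pure-Python illustration, not Spark MLlib's API. Splits are chosen by weighted Gini impurity (a common criterion for classification trees, swapped in here for simplicity). Recursion stops at a pure node or at `max_depth`. The feature names `x` and `y` and the toy healthy/sick data are assumed.

```python
from collections import Counter


def majority(labels):
    """Most frequent label: the prediction stored at a leaf."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity: 0.0 for a pure node, higher when classes are mixed."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build(points, labels, depth=0, max_depth=2):
    """Return a leaf label, or a tuple (feature, threshold, left_subtree, right_subtree)."""
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)
    best = None  # (weighted impurity, feature, threshold)
    for f in range(len(points[0])):
        values = sorted(set(p[f] for p in points))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for p, lab in zip(points, labels) if p[f] <= t]
            right = [lab for p, lab in zip(points, labels) if p[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # all points identical: nothing to split on
        return majority(labels)
    _, f, t = best
    li = [i for i, p in enumerate(points) if p[f] <= t]
    ri = [i for i, p in enumerate(points) if p[f] > t]
    return (f, t,
            build([points[i] for i in li], [labels[i] for i in li], depth + 1, max_depth),
            build([points[i] for i in ri], [labels[i] for i in ri], depth + 1, max_depth))


def to_rules(node, names, indent=""):
    """Render the tree as nested if/else lines a non-specialist can follow."""
    if not isinstance(node, tuple):
        return [f"{indent}predict {node}"]
    f, t, left, right = node
    return ([f"{indent}if {names[f]} <= {t}:"] + to_rules(left, names, indent + "    ")
            + [f"{indent}else:"] + to_rules(right, names, indent + "    "))


points = [(1, 1), (2, 2), (1, 3), (2, 8), (6, 1), (7, 2), (6, 3), (7, 9)]
labels = ["healthy"] * 4 + ["sick", "sick", "sick", "healthy"]
tree = build(points, labels)
print("\n".join(to_rules(tree, ("x", "y"))))
```

On this toy data the printed rules read:

```
if x <= 4.0:
    predict healthy
else:
    if y <= 6.0:
        predict sick
    else:
        predict healthy
```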

If you liked the above excerpt, be sure to check out the book it comes from, Apache Spark 2.x Machine Learning Cookbook, to learn how to implement deep learning using Spark and many more useful techniques for implementing machine learning solutions with the MLlib library in Apache Spark 2.0.
