Classification: Decision Trees

EO4GEO Lecture

Classification: Decision Trees 2022-01-05T17:14:21+00:00

Product Description

Decision Trees (DT) are non-parametric classifiers that have gained popularity in different domains because their structure is explicit and easily interpretable.

A DT is built by recursively splitting each tree node according to a statistical criterion: the Gini impurity measure or information gain for classification scenarios, or variance for prediction problems (i.e. regression).
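The two classification criteria named above can be sketched in a few lines of plain Python. This is a minimal illustration, not material from the lecture: the function names (`gini`, `entropy`, `split_gain`) and the toy "water"/"forest" labels are assumptions made for the example. The gain of a candidate split is the impurity of the parent node minus the size-weighted impurity of its two children; with entropy as the impurity, this quantity is the information gain.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy (in bits) of the class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_gain(parent, left, right, impurity):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

# Hypothetical toy node: 4 "water" and 4 "forest" samples,
# split by some candidate variable into two children.
parent = ["water"] * 4 + ["forest"] * 4
left = ["water"] * 3 + ["forest"] * 1
right = ["water"] * 1 + ["forest"] * 3

print(round(split_gain(parent, left, right, gini), 3))     # Gini decrease
print(round(split_gain(parent, left, right, entropy), 3))  # information gain
```

When building a tree, this gain would be computed for every candidate variable and threshold, and the split with the largest gain would be kept.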

In this lecture, we introduce the main steps required to build a DT. Two procedures for selecting the best variables for splitting the tree nodes, Gini impurity and information gain, are discussed using a practical example. The second part of the lecture explains the concepts of overfitting and underfitting and introduces two solutions for building an optimal DT, i.e. a DT that does not overfit: (1) stop growing the DT early, before it overfits, and (2) prune the tree, i.e. reduce its size.
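Solution (1), stopping early, can be illustrated with a small recursive tree builder. The sketch below is an assumption-laden toy, not the lecture's implementation: the `max_depth` and `min_samples` stopping rules, the helper names, and the single made-up feature with "bare"/"veg" labels (including one deliberately noisy sample at 0.3) are all hypothetical. Pruning, solution (2), is not shown.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels, max_depth, min_samples=2, depth=0):
    """Grow a tree; a leaf is a class label, an inner node a 4-tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    # Early stopping: do not split beyond max_depth or below min_samples.
    if depth >= max_depth or len(labels) < min_samples:
        return majority
    split = best_split(rows, labels)
    if split is None:  # node is pure or no split improves impurity
        return majority
    f, t = split
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build([r for r, _ in lo], [y for _, y in lo], max_depth, min_samples, depth + 1),
            build([r for r, _ in hi], [y for _, y in hi], max_depth, min_samples, depth + 1))

def predict(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def n_leaves(node):
    return 1 if not isinstance(node, tuple) else n_leaves(node[2]) + n_leaves(node[3])

# Made-up training data: one feature, one noisy "veg" label at 0.3.
rows = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
labels = ["bare", "bare", "veg", "bare", "veg", "veg", "veg", "veg"]

shallow = build(rows, labels, max_depth=1)   # stopped early
deep = build(rows, labels, max_depth=10)     # grown until every leaf is pure

print(n_leaves(shallow), n_leaves(deep))
print(predict(shallow, [0.3]), predict(deep, [0.3]))
```

The fully grown tree adds extra leaves only to reproduce the single noisy sample, which is the kind of behaviour early stopping is meant to prevent. Setting the stopping rules too strictly would lead to the opposite problem, underfitting.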

The lecture ends by listing the main advantages and disadvantages of DTs.

Learning outcomes

  • Explain how information gain and Gini impurity are calculated.

  • Present the main advantages and disadvantages of the decision tree classifier.

  • Describe the concept of over-fitting and under-fitting.

  • Define the main solutions that can be applied to prevent decision trees from over-fitting.

Ownership

Designed and developed by: Mariana Belgiu, University of Twente.

License: Creative Commons Attribution-ShareAlike.

Language

English

Creation date

2020-06-20

Access

Find below a direct link to the HTML presentation.

access the presentation

Find below a link to the GitHub repository where you can download the presentation.

download the presentation