Informed Decisions: The Information Theory of Decision Trees





Decisions, decisions

There are multiple ways to construct a decision tree. The C4.5 algorithm, an improved version of Ross Quinlan's ID3 algorithm, is widely used in the community. A beginner might apply scikit-learn's decision tree classifier to the Titanic dataset; it lets you choose between the default criterion (Gini impurity) and entropy for constructing splits along the decision tree.
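As a minimal sketch of that criterion switch: scikit-learn's `DecisionTreeClassifier` takes a `criterion` argument of `"gini"` or `"entropy"`. The Titanic dataset is not bundled with scikit-learn, so this example substitutes the built-in iris dataset purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris stands in for Titanic here; the criterion switch works the same way.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    tree.fit(X_train, y_train)
    print(criterion, tree.score(X_test, y_test))
```

On a well-separated dataset like iris the two criteria usually produce trees of similar accuracy; the differences show up in tree shape and training cost.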

Why this talk?

We aim to demystify scikit-learn's decision tree classifier by broadly discussing the different approaches to decision trees: the CART approach, which splits on Gini impurity, and the ID3 approach, which splits on entropy.
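To ground the two impurity measures, here is a small illustrative sketch (the `entropy` and `gini` helper names are ours, not from any library) computing both by hand with NumPy:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a class-probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # by convention, 0 * log(0) contributes 0
    return -np.sum(p * np.log2(p))

def gini(p):
    """Gini impurity of a class-probability vector."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

# A 50/50 node is maximally impure under both measures:
print(entropy([0.5, 0.5]))  # 1.0 (bit)
print(gini([0.5, 0.5]))     # 0.5
# A pure node scores zero under both:
print(entropy([1.0]), gini([1.0]))
```

Both measures peak at a uniform class distribution and vanish on pure nodes, which is why swapping one for the other often changes a tree only slightly.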

We'll talk about the metrics that make a decision tree suitable for a broad use case, as well as those that make improvements visible to the user: tree size, speed, boosting capability, memory usage, and winnowing.

Lastly, we will briefly discuss the usability of online-learning (incremental) trees, such as those based on the VFDT (Very Fast Decision Tree) algorithm, returning to the previously discussed metrics of 'goodness' (boosting, pruning, speed), and introduce the concept of dataset shift in decision trees.

Key Takeaways:

  • The Information Theory of Decision Trees
  • CART-based splitting functions for decision trees -- how they measure up against ID3 and its successors
  • Metrics for evaluating decision trees
  • Methods of getting better decision trees
  • A brief overview of online learning trees

An outline of the talk:

  • A primer on what information theory and decision theory are (1 min)
  • A brief overview of important information theory concepts: entropy, information gain, cross-entropy, as well as introducing the concept of Gini Impurity (3 min)
  • Computing the entropy and information gain of the data before the splits using the pyitlib information-theory library, while contrasting ID3 with CART (using Gini impurity) via the scikit-learn decision tree classifier (10 min)
  • Comparing C4.5 and C5.0 to scikit-learn's Decision Tree Regressor (3 min)
  • What makes a tree beautiful: pruning, boosting, winnowing, and speed (5 min)
  • Introducing dataset shift, and an incremental approach to trees with VFDT (Very Fast Decision Tree) (3 min)
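The "entropy and information gain before the splits" step in the outline can be sketched in a few lines. This is an illustrative helper of our own (not pyitlib's API): information gain is the parent node's entropy minus the count-weighted entropy of its children.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (bits) from a vector of class counts."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return -np.sum(p * np.log2(p))

def information_gain(parent_counts, child_counts):
    """Entropy reduction achieved by splitting a node into child nodes."""
    n = sum(np.sum(c) for c in child_counts)
    weighted_child_entropy = sum(np.sum(c) / n * entropy(c) for c in child_counts)
    return entropy(parent_counts) - weighted_child_entropy

# Hypothetical split of a node with 10 positives and 10 negatives
# into two pure children:
print(information_gain([10, 10], [[10, 0], [0, 10]]))  # 1.0 bit
# An uninformative split recovers nothing:
print(information_gain([10, 10], [[5, 5], [5, 5]]))    # 0.0
```

ID3 greedily picks the attribute with the highest information gain at each node; CART does the analogous computation with Gini impurity in place of entropy.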


Prerequisites:

  • Understanding probability (especially conditional probability)
  • Understanding the basic concepts of entropy, information gain, and Gini impurity
  • Basics of how decision trees work, and their uses

Video URL:

Speaker Info:

Purva Bhalekar and Ruchi Pendse are fourth-year Electronics and Telecommunications Engineering students at Cummins College of Engineering for Women, Pune. Interested in telecommunications and information theory, they are working on a final-year project on rotation trees, taking an information-theoretic approach to PCA and decision trees.

Previously, Ruchi has worked on sentiment analysis and on quantum mechanics, developing an understanding of quantum ontology and quantum information science, which she writes about on her personal blog.

Purva is interested in web development and machine learning and is looking forward to working as an ML engineer.

Speaker Links:

Connect with us on LinkedIn:

Purva's LinkedIn

Ruchi's LinkedIn




Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Beginner
Last Updated: