The online calculator below parses a set of training examples and builds a decision tree, using Information Gain as the split criterion. If you are unsure what this is all about, read the short explanation of decision trees below the calculator.
Note: Training examples should be entered as a CSV list with a semicolon as the separator. The first row is treated as a header row: the attribute (feature) labels come first, followed by the class label. All other rows are examples. The default data in this calculator is the well-known "Play Tennis" example for decision trees.
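To make the input format concrete, here is a minimal Python sketch of how such a semicolon-separated list could be parsed. It is not the calculator's own code; the function name `parse_examples`, the column order, and the three sample rows are assumptions chosen for illustration.

```python
import csv
import io

# Hypothetical input in the format described above: semicolon-separated,
# header row first, class label in the last column.
RAW = """Outlook;Temperature;Humidity;Wind;Play
Sunny;Hot;High;False;No
Overcast;Hot;High;False;Yes
Rain;Mild;High;True;No
"""


def parse_examples(text):
    """Return (feature_names, class_name, examples) from semicolon-separated text.

    Each example is a pair: (dict of attribute -> value, class label).
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    rows = [row for row in reader if row]  # skip blank lines
    header, data = rows[0], rows[1:]
    feature_names, class_name = header[:-1], header[-1]
    examples = [(dict(zip(feature_names, row[:-1])), row[-1]) for row in data]
    return feature_names, class_name, examples


feature_names, class_name, examples = parse_examples(RAW)
```

Splitting the header into attribute labels and a trailing class label mirrors the rule stated in the note: everything except the last column is a feature.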
A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules. [1]
Let's look at the calculator's default data.
Attributes to be analyzed are:
- Outlook: Sunny/Overcast/Rain
- Humidity: High/Normal
- Wind: True/False
- Temperature: Hot/Mild/Cool
Class label is:
- Play: Yes/No
So, by analyzing the attributes one by one, the algorithm should effectively answer the question: "Should we play tennis?" To do so in as few steps as possible, we need to choose the best decision attribute at each step: the one that gives us the most information. The attribute that gives the most information over the full training set is used as the first split. The process then repeats on each resulting subset until no further split is needed (all the remaining samples are homogeneous, so the class label is determined) or there are no more attributes to split on.
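The choice of the first split can be sketched in a few lines of Python. This is an illustration, not the calculator's implementation: the rows are the commonly cited 14-example version of the "Play Tennis" data, which is assumed here to match the calculator's default, and the names `entropy` and `information_gain` are illustrative helpers.

```python
from collections import Counter
from math import log2

# Assumption: the commonly cited 14-row "Play Tennis" data, written with the
# attribute values listed above. The last column is the class label.
COLUMNS = ["Outlook", "Temperature", "Humidity", "Wind", "Play"]
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rain", "Mild", "High", "False", "Yes"),
    ("Rain", "Cool", "Normal", "False", "Yes"),
    ("Rain", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rain", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rain", "Mild", "High", "True", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the whole set minus the weighted entropy after the split."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


# Gain of every attribute (all columns except the class label).
gains = {COLUMNS[i]: information_gain(ROWS, i) for i in range(len(COLUMNS) - 1)}
best = max(gains, key=gains.get)
```

Under these assumptions the gains come out at roughly 0.247 for Outlook, 0.152 for Humidity, 0.048 for Wind and 0.029 for Temperature, so `best` is "Outlook". Repeating the same computation on each subset produced by a split gives the rest of the tree.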
The generated decision tree first splits on "Outlook". If the answer is "Sunny", it then checks the "Humidity" attribute: if the answer is "High", it is "No" for "Play"; if the answer is "Normal", it is "Yes". If the "Outlook" is "Overcast", it is "Yes" to "Play" immediately. If the "Outlook" is "Rain", it checks the "Wind" attribute: "True" gives "No" and "False" gives "Yes". Note that this decision tree does not need to check the "Temperature" feature at all!
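The tree described above can be written down as a small nested structure and walked from root to leaf. The sketch below is one possible representation, not the calculator's internal one. The outcomes under the "Wind" test ("True" gives "No", "False" gives "Yes") are an assumption taken from the commonly cited version of the "Play Tennis" data, and the value names follow the attribute list given earlier.

```python
# Inner nodes are {"attribute": name, "branches": {value: subtree}};
# leaves are plain class-label strings.
TREE = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {
            "attribute": "Humidity",
            "branches": {"High": "No", "Normal": "Yes"},
        },
        "Overcast": "Yes",
        "Rain": {
            "attribute": "Wind",
            # Assumed outcomes, see the note above.
            "branches": {"True": "No", "False": "Yes"},
        },
    },
}


def classify(tree, example):
    """Follow the branches matching the example until a leaf label is reached."""
    node = tree
    while isinstance(node, dict):
        node = node["branches"][example[node["attribute"]]]
    return node


day = {"Outlook": "Sunny", "Humidity": "High", "Wind": "False", "Temperature": "Hot"}
decision = classify(TREE, day)
```

Here `decision` is "No", and the "Temperature" value in `day` is never read, which is exactly the observation made in the text.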
Different metrics can serve as the split criterion, for example entropy (via Information Gain or Gain Ratio), the Gini index, or classification error. This particular calculator uses Information Gain.
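For comparison, the three impurity measures mentioned above can be computed from the class counts at a node. The following Python sketch uses the textbook definitions; the function names are illustrative, and the 9 "Yes" / 5 "No" split is an assumption based on the commonly cited "Play Tennis" data.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits: -sum(p * log2(p)) over non-empty classes."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def gini(counts):
    """Gini index: 1 - sum(p^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def classification_error(counts):
    """Classification error: 1 - max(p)."""
    total = sum(counts)
    return 1 - max(counts) / total


# Assumed class counts for the full training set: 9 "Yes" and 5 "No".
root_counts = [9, 5]
root_entropy = entropy(root_counts)
root_gini = gini(root_counts)
root_error = classification_error(root_counts)
```

For these counts the values are about 0.940 (entropy), 0.459 (Gini index) and 0.357 (classification error). All three are zero for a homogeneous node and largest when the classes are evenly mixed. Information Gain is the drop in entropy achieved by a split.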
You might wonder why we need a decision tree if we can just provide the decision for each combination of attributes. Of course you can, but even for this small example the total number of combinations is 3 × 2 × 2 × 3 = 36. On the other hand, we have used only a subset of the combinations (14 examples) to train our algorithm (by building a decision tree), and now it can classify all the other combinations without our help. That's the point of machine learning. Of course, there are many caveats, such as non-robustness, overfitting, and bias. For more information, you may want to check the "Decision tree learning" article on Wikipedia.
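The count of 36 can be checked by enumerating every combination of attribute values. This short Python sketch uses the value lists given earlier; the variable names are illustrative.

```python
from itertools import product

# Attribute values as listed in the text.
ATTRIBUTES = {
    "Outlook": ["Sunny", "Overcast", "Rain"],
    "Humidity": ["High", "Normal"],
    "Wind": ["True", "False"],
    "Temperature": ["Hot", "Mild", "Cool"],
}

# One dict per possible day: the Cartesian product of all value lists.
combinations = [
    dict(zip(ATTRIBUTES, values)) for values in product(*ATTRIBUTES.values())
]
total = len(combinations)  # 3 * 2 * 2 * 3
```

With 14 training examples, at most 14 of these 36 combinations were seen during training; the tree supplies a decision for the rest.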