I'm trying to implement the decision tree algorithm based on the pseudo.
However, I don't understand why the first node should be outlook.
Shouldn't the gini index of Outlook be 1-(5/14)^2-(5/14)^2-(4/14)^2 = 0.663265306, and gini index of Humidity be =1-(4/14)^2-(6/14)^2-(4/14)^2 = 0.653061224?
As the gini index represents the impurity of the attribute, it is more reasonable to choose the attribute with the lower gini index.
Is my way of finding a gini index is wrong or there is something else I should know?
Data
Rainy Hot High FALSE No
Rainy Hot High TRUE No
Overcast Hot High FALSE Yes
Sunny Mild High FALSE Yes
Sunny Cool Normal FALSE Yes
Sunny Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Rainy Mild High FALSE No
Rainy Cool Normal FALSE Yes
Sunny Mild Normal FALSE Yes
Rainy Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Sunny Mild High TRUE No