I'm reading Machine Learning in Action and am going through the decision tree chapter. I understand that decision trees are built by splitting the data set in a way that structures the branches and leaves, so the most informative splits sit near the top of the tree and limit how many decisions you need to go through.
The book shows a function that computes the Shannon entropy of a data set:
from math import log

def calcShannonEnt(dataSet):
    numEntries = len(dataSet)
    labelCounts = {}
    for featVec in dataSet:  # count the unique labels and their occurrences
        currentLabel = featVec[-1]
        if currentLabel not in labelCounts:
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        shannonEnt -= prob * log(prob, 2)  # log base 2
    return shannonEnt
The input data set is an array of arrays, where each inner array represents a record of potentially classifiable features:
dataSet = [[1, 1, 'yes'],
           [1, 1, 'yes'],
           [1, 0, 'no'],
           [0, 1, 'no'],
           [0, 1, 'no']]
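For reference, here's a quick sanity check of mine (not from the book) of what the function returns on this data set:

# 2 'yes' and 3 'no' labels out of 5 records:
# H = -(2/5)*log2(2/5) - (3/5)*log2(3/5)
print(calcShannonEnt(dataSet))  # ~0.9709505944546686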
What I don't get is why the Shannon entropy function in this book only ever looks at the last element of each feature array. It looks like it's only calculating the entropy for the "yes" or "no" items, and not the entropy of any of the other features.

It doesn't make sense to me because the entropy for this data set
dataSet = [[1, 1, 'yes'],
           [1, 'asdfasdf', 'yes'],
           [1900, 0, 'no'],
           [0, 1, 'no'],
           ['ddd', 1, 'no']]
is the same as the entropy above, even though it contains much more diverse data.
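To be concrete about why (this is my own working, not from the book): the function only ever reads featVec[-1], so both data sets collapse to the same label counts and therefore the same entropy.

from math import log

labelCounts = {'yes': 2, 'no': 3}  # identical for both data sets
numEntries = 5
entropy = -sum(float(n) / numEntries * log(float(n) / numEntries, 2)
               for n in labelCounts.values())
print(entropy)  # ~0.9710 for both data sets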
Shouldn't the other feature elements be counted as well in order to give the total entropy of the data set, or am I misunderstanding what the entropy calculation is supposed to do?
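To make what I mean concrete, here is a sketch of what I would have expected instead: an entropy computed over every column, not just the label column. The helper name columnEntropy is my own invention, not from the book:

from math import log

def columnEntropy(dataSet, col):
    # entropy of a single column, treating each distinct value as a class
    counts = {}
    for featVec in dataSet:
        counts[featVec[col]] = counts.get(featVec[col], 0) + 1
    total = len(dataSet)
    return -sum(float(n) / total * log(float(n) / total, 2)
                for n in counts.values())

# assumes one of the dataSet lists above is defined;
# prints the entropy of each feature column plus the label column
for col in range(len(dataSet[0])):
    print(col, columnEntropy(dataSet, col))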
If anyone is curious, the full source for the book (which is where this code came from) is here, under the Chapter03 folder.