
I am following the book "Machine Learning: Hands-On for Developers and Technical Professionals" to create a decision tree with WEKA. Although I followed the same process as shown in the book, I am not getting the same decision tree. I am using the C4.5 (J48) algorithm.

Data (arff file)

@relation ladygaga

@attribute placement {end_rack, cd_spec, std_rack}
@attribute prominence numeric
@attribute pricing numeric
@attribute eye_level {TRUE, FALSE}
@attribute customer_purchase {yes, no}

@data
end_rack,85,85,FALSE,yes
end_rack,80,90,TRUE,yes
cd_spec,83,86,FALSE,no
std_rack,70,96,FALSE,no
std_rack,68,80,FALSE,no
std_rack,65,70,TRUE,yes
cd_spec,64,65,TRUE,yes
end_rack,72,95,FALSE,yes
end_rack,69,70,FALSE,no
std_rack,75,80,FALSE,no
end_rack,75,70,TRUE,no
cd_spec,72,90,TRUE,no
cd_spec,81,75,FALSE,yes
std_rack,71,91,TRUE,yes

Expected Output (image: the decision tree from the book)

My Output (image: the decision tree I get)

What am I doing wrong?
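
For reference, here is roughly how the tree is built, shown as a minimal sketch against the WEKA Java API rather than the Explorer GUI (the file name ladygaga.arff is an assumption; -C 0.25 and -M 2 are the J48 defaults, so the printed tree should match what the Explorer shows with default settings):

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BuildTree {
    public static void main(String[] args) throws Exception {
        // Load the ARFF file (file name is an assumption -- adjust the path)
        Instances data = new DataSource("ladygaga.arff").getDataSet();
        // The last attribute (customer_purchase) is the class
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();
        // Default J48 settings: confidence factor 0.25, minimum 2 instances per leaf
        tree.setOptions(new String[] {"-C", "0.25", "-M", "2"});
        tree.buildClassifier(data);

        // Prints the textual tree, same as the Explorer's Classify tab
        System.out.println(tree);
    }
}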


1 Answer


It is a problem with the book's data (keeping the answer here so that it can help other readers of the book).

The book expects only one negative case in the end_rack category (look for the (5,1) count on the end_rack leaf in the author's tree diagram). In the data provided in the book, and even on the book's website, there are actually two negative end_rack cases (5,2). After changing one of those negative cases to a positive one (the end_rack,69,70,FALSE row below), I got the same decision tree as the book.
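
You can confirm the mismatch by counting the end_rack instances labelled no in the data as shipped with the book (a minimal sketch with the WEKA Java API; the file name ladygaga.arff is an assumption):

import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CountEndRackCases {
    public static void main(String[] args) throws Exception {
        // Load the book's original data (file name is an assumption)
        Instances data = new DataSource("ladygaga.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        int endRack = 0;
        int endRackNo = 0;
        for (int i = 0; i < data.numInstances(); i++) {
            Instance inst = data.instance(i);
            if ("end_rack".equals(inst.stringValue(data.attribute("placement")))) {
                endRack++;
                if ("no".equals(inst.stringValue(data.classAttribute()))) {
                    endRackNo++;
                }
            }
        }
        // The book's tree shows (5,1); the shipped data gives 5 end_rack cases, 2 of them "no"
        System.out.println("end_rack instances: " + endRack + ", labelled no: " + endRackNo);
    }
}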

Here is the corrected data (arff file):

@relation ladygaga

@attribute placement {end_rack, cd_spec, std_rack}
@attribute prominence numeric
@attribute pricing numeric
@attribute eye_level {TRUE, FALSE}
@attribute customer_purchase {yes, no}

@data
end_rack,85,85,FALSE,yes
end_rack,80,90,TRUE,yes
cd_spec,83,86,FALSE,no
std_rack,70,96,FALSE,no
std_rack,68,80,FALSE,no
std_rack,65,70,TRUE,yes
cd_spec,64,65,TRUE,yes
end_rack,72,95,FALSE,yes
end_rack,69,70,FALSE,yes
std_rack,75,80,FALSE,no
end_rack,75,70,TRUE,no
cd_spec,72,90,TRUE,no
cd_spec,81,75,FALSE,yes
std_rack,71,91,TRUE,yes

Correct Output (image: decision tree from the corrected data)