I am trying to use Weka to analyze some data. I have a dataset with 3 variables and 1000+ instances.

The dataset describes movie remakes and records:

  • how similar the remake is to the original (0.0-1.0)
  • the difference in years between the movie and the remake
  • whether they were made by the same studio (yes or no)

I am trying to build a decision tree to analyze the data. Using J48 (because that's all I have ever used), I only get one leaf. I'm assuming I'm doing something wrong. Any help is appreciated.

Here is a snippet from the data set:

Similarity  YearDifference  STUDIO TYPE
    0.5         36              No
    0.5         9               No
    0.85        18              No
    0.4         10              No
    0.5         15              No
    0.7         6               No
    0.8         11              No
    0.8         0               Yes
    ...

If you are interested, the data can be downloaded as a CSV here: http://s000.tinyupload.com/?file_id=77863432352576044943

Always include all relevant information in your post; the sites you link to can go down or become unavailable, and future visitors won't know what you meant. That being said: How are you training the J48? Command line or Java code? What options did you use? How do you know it's only one leaf? Help us to help you ;) - Sentry
Sorry about that. I'll make sure to include a snippet like you did in future posts. I'm not sure exactly what you mean by training the J48. In Weka, all I was taught to do is apply the classifier to the dataset. I know it's only one leaf because when I visualize the tree it shows only one leaf, and the analysis output also reports only one leaf. Hope that helps! - user2980869

1 Answer

Your data set is not balanced: there are almost 5 times as many "No" as "Yes" values for the class attribute. With that ratio, always predicting "No" is already correct for most instances, which is why J48 produces a tree that is really just one leaf classifying everything as "No". You can do one of these things:

  1. Resample your data set so that it has an equal number of "No" and "Yes" instances.
  2. Try a better classification algorithm, e.g. Random Forest (it is located a few entries below J48 in the Weka Explorer GUI).
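
To illustrate option 1, here is a minimal sketch of random undersampling in plain Python. It is not Weka code, and the helper name `undersample` is made up for this example. It uses the eight rows from the question's snippet, shows how accurate a single "always No" leaf already is on them, and then drops rows from the larger class until both classes are the same size. Inside Weka, a supervised instance filter such as SpreadSubsample should achieve something similar, but check the filter's options in your version before relying on that.

```python
import random
from collections import Counter


def undersample(rows, label_index, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_index], []).append(row)

    # Keep only as many rows per class as the rarest class has.
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))

    rng.shuffle(balanced)
    return balanced


# The eight rows shown in the question: (Similarity, YearDifference, same studio)
rows = [
    (0.5, 36, "No"),
    (0.5, 9, "No"),
    (0.85, 18, "No"),
    (0.4, 10, "No"),
    (0.5, 15, "No"),
    (0.7, 6, "No"),
    (0.8, 11, "No"),
    (0.8, 0, "Yes"),
]

counts_before = Counter(row[2] for row in rows)
# Accuracy of a one-leaf "tree" that always predicts the majority class.
majority_baseline = max(counts_before.values()) / len(rows)

balanced = undersample(rows, label_index=2)
counts_after = Counter(row[2] for row in balanced)

print("before:", dict(counts_before), "majority baseline:", majority_baseline)
print("after: ", dict(counts_after))
```

Undersampling throws data away, so on the full data set you would be left with roughly twice the number of "Yes" instances. Whether that is enough for J48 to find useful splits is something you would have to try.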