2
votes

I am using the Weka tool to try to generate a set of classification rules from a dataset. The dataset is currently a .txt file of the form:

webpage attr1 attr2 attr3.....attrn type
try.html  1     2    3.....

(with each word separated by a tab)

How do I convert this to a suitable input file for Weka? I tried converting it to CSV and then to ARFF format, but it doesn't work and keeps giving me one of two errors: "header stream is invalid" or "attribute names are not unique".

1
I have the same problem. Did you solve it? - Marcin Erbel

1 Answer

2
votes

An ARFF file has the following format:

@RELATION aNameForTheRelation

@ATTRIBUTE attr_0 TYPE
@ATTRIBUTE attr_1 TYPE
% ... (this is a comment)
@ATTRIBUTE attr_N TYPE

@DATA
sample_0_attr_0_v,sample_0_attr_1_v,...,sample_0_attr_N_v
sample_1_attr_0_v,sample_1_attr_1_v,...,sample_1_attr_N_v
% ...
sample_M_attr_0_v,sample_M_attr_1_v,...,sample_M_attr_N_v

An ARFF file is essentially a CSV file with a header. Did you try writing the ARFF header manually and appending the CSV data below it? The automated conversion tool may have failed to detect proper attribute names for the resulting ARFF file.
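
If writing the header by hand is tedious, a short script can generate it. The following is only a minimal sketch in Python, not a Weka tool: the function name `tsv_to_arff`, the default relation name `webpages`, and the typing rules are all assumptions made for illustration. It assumes the first line holds the column names, that the last column is the class and should be nominal, that columns whose values all parse as numbers are NUMERIC, and that anything else is STRING. Repeated column names get a numeric suffix, since duplicate names are one likely cause of the "attribute names are not unique" error.

```python
import csv
import io


def quote(value):
    """Quote an ARFF token if it contains characters that need protection."""
    if value == "":
        return "?"  # ARFF marker for a missing value
    if any(ch in value for ch in " \t,'\"%{}"):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return value


def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def unique_names(names):
    """Add a numeric suffix to repeated names so every attribute is distinct."""
    used = set()
    result = []
    for name in names:
        base = name.strip() or "attr"
        candidate, n = base, 1
        while candidate in used:
            candidate = f"{base}_{n}"
            n += 1
        used.add(candidate)
        result.append(candidate)
    return result


def tsv_to_arff(text, relation="webpages"):
    """Convert tab-separated text (header line first) to an ARFF string.

    Assumption: the last column is the class attribute and is nominal.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    names = unique_names(rows[0])
    # Pad short rows with missing values and trim overly long ones.
    data = [(row + [""] * len(names))[:len(names)] for row in rows[1:]]

    lines = [f"@RELATION {quote(relation)}", ""]
    last = len(names) - 1
    for i, name in enumerate(names):
        values = [row[i] for row in data if row[i] != ""]
        if i == last:
            kind = "{" + ",".join(quote(v) for v in sorted(set(values))) + "}"
        elif values and all(is_number(v) for v in values):
            kind = "NUMERIC"
        else:
            kind = "STRING"
        lines.append(f"@ATTRIBUTE {quote(name)} {kind}")

    lines += ["", "@DATA"]
    lines += [",".join(quote(v) for v in row) for row in data]
    return "\n".join(lines) + "\n"
```

To use it, read the .txt file into a string, pass it to `tsv_to_arff`, and save the result with an .arff extension. Note that some Weka classifiers do not accept STRING attributes, so an identifier column such as `webpage` may need to be removed or filtered before training.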