0
votes

I am using Weka to classify a data set. The .arff data file is shown below. The problem I am facing is that many classifiers, such as NaiveBayes, won't accept string attributes. The two string attributes are important features for classification. I tried converting them to nominal type using a filter, but they were not converted. How should I go about this, given the dataset I have?

@RELATION transaction

@ATTRIBUTE transactionType  {'CC Credit',Trans,Exp,Dep,Check}
@ATTRIBUTE number numeric
@ATTRIBUTE posting {Yes,No}
@ATTRIBUTE String1 string
@ATTRIBUTE String2 string


@ATTRIBUTE amount real
@ATTRIBUTE class {1,2}


@DATA
'CC Credit',?,Yes,'XYZ Bank','ONLINE PYMT Aug',-1582100.38,1   
Trans,?,Yes,?,'ACH DEBIT XYZ CREDIT CRD-EPAY',-59219.40,2   
Exp,?,Yes,'First Nolastname','ACH DEBIT First Nolastname-RECEIVER',-176011.56,2   
2
How did you generate the dataset/ARFF file? - Alerra
I manually created it using sample files as reference. - Anshul Tripathi

2 Answers

3
votes

You do not say what interface you are using. I assume that you are using the GUI.

On the "Preprocess" tab, under "Filters" select

filters -> unsupervised -> attribute

and scroll down to find StringToNominal. By default, it converts only the last attribute, so you will want to change its attribute range to cover all of your string attributes (attributes 4 and 5 in your file).

Screenshot of GUI

Just in case:

If you are using R and RWeka, you can get this filter by running

library(RWeka)
Str2Nom = make_Weka_filter("weka/filters/unsupervised/attribute/StringToNominal")
# R selects the attributes to convert: here the two string attributes, 4 and 5
Str2Nom(transactionType ~ ., data=Transaction, control=Weka_control(R=4:5))
0
votes

G5W's answer should work, but if you are constructing the ARFF file yourself then another option is to define these attributes as nominal ones in the ARFF file, in the same way that you already have done for the transaction type and posting attributes.

To manually construct the list of nominal values which goes between the { and } in the @ATTRIBUTE line, you could for example use the Data > Remove Duplicates function in Excel.
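If you would rather script that step than use Excel, here is a minimal Python sketch of the same idea. It assumes the @DATA rows are simple comma-separated lines with single-quoted strings, as in the question; the helper names (`arff_quote`, `read_column`, `nominal_declaration`) are my own and are not part of Weka. It does not handle every ARFF quoting rule (backslash escapes inside quoted values, for example), and it treats every `?` as a missing value, so check the output before pasting it into your file.

```python
import csv

def arff_quote(value):
    """Single-quote a value if it contains characters that need quoting in ARFF."""
    if value == "" or any(c in value for c in " ,{}%'\"\t"):
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return value

def read_column(data_lines, index):
    """Return the values of one column (0-based index) from @DATA rows."""
    reader = csv.reader(data_lines, quotechar="'", skipinitialspace=True)
    return [row[index].strip() for row in reader if row]

def nominal_declaration(name, values):
    """Build an @ATTRIBUTE line listing each distinct non-missing value once."""
    distinct = []
    seen = set()
    for v in values:
        if v == "?" or v in seen:  # skip missing values and duplicates
            continue
        seen.add(v)
        distinct.append(v)
    labels = ",".join(arff_quote(v) for v in distinct)
    return "@ATTRIBUTE {} {{{}}}".format(name, labels)

# The three data rows from the question.
rows = [
    "'CC Credit',?,Yes,'XYZ Bank','ONLINE PYMT Aug',-1582100.38,1",
    "Trans,?,Yes,?,'ACH DEBIT XYZ CREDIT CRD-EPAY',-59219.40,2",
    "Exp,?,Yes,'First Nolastname','ACH DEBIT First Nolastname-RECEIVER',-176011.56,2",
]

# String1 is the 4th attribute, so its 0-based column index is 3.
print(nominal_declaration("String1", read_column(rows, 3)))
# prints: @ATTRIBUTE String1 {'XYZ Bank','First Nolastname'}
```

The printed line would replace the `@ATTRIBUTE String1 string` declaration; run it again with index 4 for String2. Values are kept in first-seen order, which makes the result easy to compare against the data by eye.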