Loading Text dataset into python weka wrapper

Question

I've installed python-weka-wrapper on Windows 7 and tried running the following sample code:

import weka.core.jvm as jvm
jvm.start()

data_dir = "E:/Files/Fourth/"

from weka.core.converters import Loader
loader = Loader("weka.core.converters.TextDirectoryLoader")
datasets = [
  data_dir + "File 1",
  data_dir + "File 2",
  data_dir + "File 3",
  data_dir + "File 4",
  data_dir + "File 5"

 ]
data = loader.load_file(datasets)
data.delete_last_attribute()
print(data)

and I receive the following error:

Traceback (most recent call last):
  File "C:/Python27/weekaa.py", line 16, in <module>
    data = loader.load_file(datasets)
  File "C:\Python27\lib\site-packages\weka\core\converters.py", line 67, in load_file
    self.enforce_type(self.jobject, "weka.core.converters.FileSourcedConverter")
  File "C:\Python27\lib\site-packages\weka\core\classes.py", line 155, in enforce_type
    raise TypeError("Object does not implement or subclass " + intf_or_class + "!")
TypeError: Object does not implement or subclass weka.core.converters.FileSourcedConverter!

I tried the solution from a previously asked question (adding weka.jar or the python-weka-wrapper jar to the classpath), but it didn't work. The error does not appear when loading .arff files.

Is there a solution to load text files?

Note: each of the "File N" entries in the dataset is a folder containing a set of text documents (for later clustering).

ANjell · Accepted Answer · 2015-03-18T21:01:17

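TextDirectoryLoader could not be used with the then-released versions of python-weka-wrapper, because it works differently from all the other loaders: it reads a whole directory rather than a single file and does not implement weka.core.converters.FileSourcedConverter, which is exactly what Loader.load_file() checks for in the traceback above (load_file() also expects a single file name, not a list). After an update (https://groups.google.com/forum/#!topic/python-weka-wrapper/hgfFMnEIKZg), a dedicated TextDirectoryLoader class was added to python-weka-wrapper, and it can be used as follows: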

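import weka.core.jvm as jvm
from weka.core.converters import TextDirectoryLoader

jvm.start()  # the JVM must be running before any Weka object is created
text_dir = "/the/directory/you/want/to/load"  # each subdirectory is one class
loader = TextDirectoryLoader(options=["-dir", text_dir, "-F", "-charset", "UTF-8"])
data = loader.load()
print(unicode(data))  # Python 2; on Python 3 use print(data)
jvm.stop()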
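To make the expected input clearer, here is a small pure-Python sketch of the directory layout the loader works on. It is only an illustration, assuming that (as with Weka's TextDirectoryLoader) every subdirectory of -dir becomes a class label and every file inside it becomes one instance; it is not the Weka implementation, and read_text_directory is a hypothetical helper name.

```python
import io
import os


def read_text_directory(text_dir, charset="utf-8"):
    """Illustrative sketch (not Weka's code): one record per file,
    using the name of the file's subdirectory as its class label."""
    instances = []
    for label in sorted(os.listdir(text_dir)):
        class_dir = os.path.join(text_dir, label)
        if not os.path.isdir(class_dir):
            continue  # loose files directly in text_dir are skipped in this sketch
        for name in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, name)
            if os.path.isfile(path):
                with io.open(path, encoding=charset) as handle:
                    instances.append({
                        "text": handle.read(),  # document content
                        "filename": name,       # roughly what -F keeps
                        "class": label,         # subdirectory name
                    })
    return instances


# Hypothetical call for the question's layout (E:/Files/Fourth/File 1/..., File 2/..., ...):
# instances = read_text_directory("E:/Files/Fourth/")
```

Applied to the question, this suggests pointing -dir at the parent folder E:/Files/Fourth/ (which holds "File 1" to "File 5") instead of passing the five folders to load_file(). Each folder name then presumably ends up as the class attribute, which delete_last_attribute() can drop before clustering, as in the question's code.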

Make sure you have the updated python-weka-wrapper package. You can download it from

http://github.com/fracpete/python-weka-wrapper

and install it from source with: python setup.py install
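A quick, hedged way to check whether the installed package already ships the new class is to test whether weka.core.converters exposes a TextDirectoryLoader name. The helper below is hypothetical and only checks that the name exists, not that it behaves as described above.

```python
import importlib


def module_has_attribute(module_name, attribute):
    """Return True if module_name can be imported and defines attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # package (or one of its dependencies) is not installed
    return hasattr(module, attribute)


# Prints True only if the installed python-weka-wrapper provides the class:
print(module_has_attribute("weka.core.converters", "TextDirectoryLoader"))
```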
