Modify Weka stemmer for Persian text

Question

I want to use Weka for text classification of Persian text, but I have a problem.

The tokenizer, stoplist, and stemmer for Persian are different from those for English, so I need to use my own. Weka's interface offers a way to use my own stoplist, but I see no way to change the stemmer or the tokenizer.

Is there any way to change them without modifying Weka's source code?

I am new to Java and do not know how to modify the Weka source code.

MSepehr MSepehr · Accepted Answer · 2014-01-11T20:48:54

I found my answer. I could not do this without modifying Weka's source code, so I was forced to modify it. I had a lot of trouble because I am new to Java, so here are brief steps for modifying Weka's code, to help others:

1. Set the Java environment variables as described here: http://www.ntu.edu.sg/home/ehchua/programming/howto/Environment_Variables.html
2. Install Ant, which can be downloaded here: http://ant.apache.org/bindownload.cgi
3. Watch this video to see how to modify Weka's code: http://www.youtube.com/watch?v=buCpG7uV_v4
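Whichever way the stemmer ends up inside Weka, the stemming logic itself can be written and tested as plain Java first. Weka's own stemmers implement the `weka.core.stemmers.Stemmer` interface, which centers on a `stem(String)` method, so the sketch below keeps that method shape but deliberately does not depend on Weka. The class name `PersianStemmerSketch`, the suffix list, and the minimum stem length are all assumptions made for illustration. This is a toy suffix stripper, not a complete Persian stemmer.

```java
/** Toy suffix-stripping stemmer for Persian. Illustrative only, not linguistically complete. */
final class PersianStemmerSketch {

    // Zero-width non-joiner, often written between a Persian stem and its suffix.
    private static final String ZWNJ = "\u200C";

    // Hypothetical suffix list (an assumption, far from exhaustive).
    // Longer suffixes come first so that "tarin" is tried before "tar".
    private static final String[] SUFFIXES = {
        "\u062A\u0631\u06CC\u0646", // "tarin" (superlative)
        "\u0647\u0627\u06CC",       // "ha-ye" (plural plus ezafe)
        "\u0647\u0627",             // "ha" (plural)
        "\u062A\u0631"              // "tar" (comparative)
    };

    // Assumed lower bound so that short words are not reduced to fragments.
    private static final int MIN_STEM_LENGTH = 3;

    private PersianStemmerSketch() {
    }

    /** Returns the word with one known suffix removed, or the word unchanged. */
    static String stem(String word) {
        for (String suffix : SUFFIXES) {
            if (!word.endsWith(suffix)) {
                continue;
            }
            String candidate = word.substring(0, word.length() - suffix.length());
            // Drop a ZWNJ left dangling at the end of the stem.
            if (candidate.endsWith(ZWNJ)) {
                candidate = candidate.substring(0, candidate.length() - 1);
            }
            if (candidate.length() >= MIN_STEM_LENGTH) {
                return candidate;
            }
        }
        return word;
    }
}
```

The suffixes are written as Unicode escapes so the file compiles the same way regardless of the source encoding. With these rules, `PersianStemmerSketch.stem` maps کتاب‌ها and کتابها to کتاب, and leaves a word such as دختر unchanged because stripping "تر" would leave fewer than three characters.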
