3
votes

I'm trying to index some old documents for searching -- 16th, 17th, 18th century.

Modern stemmers don't seem to handle the antiquated word endings: worketh, liveth, walketh.

Are there stemmers that specialize in the English from the time of Shakespeare and the King James Bible? I'm currently using solr.PorterStemFilterFactory.

1
Do you know of any dictionary files available online that work with that older English? -- cheffe
The DOA looks good, but it is a work in progress (as of 2015). -- cheffe
@cheffe I do not, but suppose I did. Is there a way to make a stemmer out of a dictionary file with Solr/Lucene? -- Eric Wilson
Yes. For Solr, use the HunspellStemFilterFactory; if you are using Lucene directly, use the HunspellStemFilter itself. -- cheffe

1 Answer

1
votes

It looks like the rule changes needed to handle those archaic endings are minimal.

So it might be possible to copy and modify the PorterStemmer class and its related factories and filters.

Or it might be possible to add those specific rules as a regular-expression filter that runs before the Porter stemmer.
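
As a rough illustration of the regular-expression idea, here is a minimal Java sketch that rewrites a trailing "-eth" to "-es" so that a later stemmer sees something closer to a modern third-person form. The class name `ArchaicSuffixNormalizer`, the three-letter minimum, and the choice of "-es" as the replacement are all assumptions made for this example; none of it comes from Solr or Lucene. In Solr, a pattern like this could probably be given to solr.PatternReplaceFilterFactory (through its `pattern` and `replacement` attributes), placed ahead of solr.PorterStemFilterFactory in the analyzer chain.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Solr or Lucene.
final class ArchaicSuffixNormalizer {

    // Assumption: at least three letters followed by "eth" marks an archaic
    // third-person verb ("worketh"). The minimum length keeps "teeth" intact,
    // but the pattern still matches non-verbs such as "twentieth".
    private static final Pattern ETH = Pattern.compile("^(\\p{L}{3,})eth$");

    // Rewrites a trailing "eth" to "es"; tokens that do not match are returned unchanged.
    static String normalize(String token) {
        return ETH.matcher(token).replaceFirst("$1es");
    }

    public static void main(String[] args) {
        // Print each sample token next to its normalized form.
        for (String t : new String[] {"worketh", "liveth", "walketh", "teeth"}) {
            System.out.println(t + " -> " + normalize(t));
        }
    }
}
```

A single suffix rule like this will produce false positives (ordinals, names such as "Elizabeth"), so an exception list or a dictionary-based approach may still be needed, and the resulting stems should be checked against the actual corpus.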