I have a question about the Porter stemming algorithm. I searched the internet but couldn't find an explanation of the difference between understemming and overstemming.

Also, does the Porter algorithm tend to understem or overstem?

Any ideas?

Thanks in advance.

1 Answer

Overstemming happens when the stemmer cuts off too much of the word. Words with different meanings then collapse to the same stem, which leads to spurious matches between them.

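Understemming is the opposite: too little is cut off, so related forms of the same word end up with different stems and fail to match. For example, a stemmer that doesn't cut off anything inherently understems.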
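To make the two error types concrete, here is a minimal sketch with two toy suffix strippers. This is not the Porter algorithm. The helper `strip_suffix` and the suffix lists `AGGRESSIVE` and `CONSERVATIVE` are made up for illustration, and were chosen only to provoke each kind of error.

```python
def strip_suffix(word, suffixes, min_stem=3):
    """Remove the longest matching suffix, keeping at least min_stem letters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


# Hypothetical suffix lists (assumptions for this demo, not taken from Porter).
AGGRESSIVE = ["ity", "ing", "al", "ed", "e", "s"]
CONSERVATIVE = ["s"]


def aggressive_stem(word):
    return strip_suffix(word, AGGRESSIVE)


def conservative_stem(word):
    return strip_suffix(word, CONSERVATIVE)


# Overstemming: words with distinct meanings collapse to a single stem.
unrelated = ["universal", "university", "universe"]
print({w: aggressive_stem(w) for w in unrelated})

# Understemming: forms of the same word keep different stems.
related = ["connect", "connected", "connecting"]
print({w: conservative_stem(w) for w in related})
```

The aggressive stripper maps all three words in the first list to "univers", so a search for one would match the others. The conservative stripper leaves the second list unchanged, so "connected" would not match "connect".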

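The Porter stemmer, I suspect, makes both kinds of errors from time to time on English. Commonly cited examples: it reduces "universal", "university" and "universe" all to "univers" (overstemming), while "alumnus" and "alumni" come out as the different stems "alumnu" and "alumni" (understemming). Note that implementations for other languages might behave very differently (I'm thinking of Snowball, which has user-supplied algorithms for a bunch of languages). They may even differ in their linguistic definition of a stem.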