1
votes

I am performing some basic search functions using ElasticSearch and Tire but the basic configuration of the snowball stemming analyzer has me stumped. I'm pretty much following the code example from the GitHub page: https://github.com/karmi/tire

Here's a Ruby sample file (Ruby 1.9.3, Tire 1.8.25):

require 'tire'

Tire.index 'videos' do
  delete
  create :mappings => {
    :video => {
      :properties => {
        :code        => { :type => 'string' },
        :description => { :type => 'string', :analyzer => 'snowball' }
      }
    }
  }
end

videos = [
    { :code => '1', :description => "some fight video" },
    { :code => '2', :description => "a fighting video" }
]

Tire.index 'videos' do
  import videos
  refresh
end

s = Tire.search 'videos' do
  query do
    string 'description:fight'
  end
end

s.results.each do |document|
  puts "* #{document.code} - #{document.description}"
end

I would have expected this to yield both records in the matches because fight and fighting have the same stem. However, it only returns the first record:

* 1 - some fight video

This would indicate that the default analyzer is being used rather than the one I'm configuring.

I am aware that the field name has to be passed explicitly in the query string, per this question (ElasticSearch mapping doesn't work). I have successfully run the code from that question, so my ElasticSearch installation seems fine.

What do I need to change for Tire to return both records for this query (i.e., how do I get stemming working here)?

2 Answers

0
votes

"I would have expected this to yield both records in the matches because fight and fighting have the same stem. However, it only returns the first record:"

Right. 'fight' stems to 'fight', so the query returns only the result that contains "fight". A query for 'fighting' will do exactly the same thing, unless you set up your search index to match otherwise.

If you want it to behave the way you describe, you'll probably want to make your default index use an edge ngram analyzer, so that "fight" will also match "fighting" and return it. This also has what I think is the desirable effect of matching both 'fight' and 'fighting' if you query for "fighting" too.
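To illustrate the idea, here is a minimal pure-Ruby sketch of what edge n-grams look like. It does not call ElasticSearch or Tire; edge_ngrams is a hypothetical helper, and the minimum and maximum lengths are assumed values chosen only for the example.

```ruby
# Toy edge n-gram generator: returns every prefix of `term` whose length
# lies between `min` and `max` (capped at the term's own length).
# This only imitates the idea of an edge n-gram filter; it is not
# ElasticSearch's implementation.
def edge_ngrams(term, min = 1, max = 10)
  upper = [term.length, max].min
  (min..upper).map { |n| term[0, n] }
end

# Prefixes that would be indexed for "fighting" with min 3, max 10.
indexed_terms = edge_ngrams('fighting', 3, 10)
puts indexed_terms.inspect

# "fight" is one of those prefixes, which is why a query for "fight"
# could match a document containing "fighting".
puts indexed_terms.include?('fight')
```

Because every prefix is stored, this approach matches more loosely than stemming does: in this sketch "fig" would match "fighting" as well.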

0
votes

Well, it turned out that it was a pretty simple error on my part. I neglected to include the "type" in the hash defining the videos. Replacing

videos = [
    { :code => '1', :description => "some fight video" },
    { :code => '2', :description => "a fighting video" }
]

with

videos = [
    { :type => 'video', :code => '1', :description => "some fight video" },
    { :type => 'video', :code => '2', :description => "a fighting video" }
]

fixed the problem.
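
If the list of documents is built elsewhere, the type can also be added in one pass before importing. This is only a sketch: with_type is a hypothetical helper name, not part of Tire.

```ruby
# Hypothetical helper: returns copies of the documents with a :type key
# added. A :type already present on a document is left untouched.
def with_type(docs, type)
  docs.map { |doc| { :type => type }.merge(doc) }
end

videos = [
  { :code => '1', :description => "some fight video" },
  { :code => '2', :description => "a fighting video" }
]

typed_videos = with_type(videos, 'video')
puts typed_videos.inspect
```

The typed_videos array can then be passed to import in place of videos.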

The effect of the change was to apply the snowball analyzer to the description field when the records are indexed. Before the change, the records had no explicit type, so presumably they were not indexed under the video type and its mapping never applied to them. The snowball analyzer was applied only to the search query, so only the query was stemmed. If I entered "description:fighting" in the query, it still matched the first record ("some fight video") rather than "a fighting video". This tipped me off that the records weren't being analyzed correctly.
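
The behaviour described above can be reproduced with a small pure-Ruby simulation. This is only a sketch: toy_stem_analyze is a crude stand-in that strips a trailing "ing" and is not the real Snowball algorithm, and all helper names here are hypothetical. Nothing in it talks to ElasticSearch.

```ruby
# Crude stand-in for a stemming analyzer: lowercase, split on whitespace,
# and strip a trailing "ing". NOT the real Snowball stemmer.
def toy_stem_analyze(text)
  text.downcase.split.map { |term| term.sub(/ing\z/, '') }
end

# Stand-in for an analyzer without stemming: lowercase and split only.
def plain_analyze(text)
  text.downcase.split
end

# Returns the codes of the documents whose analyzed description contains
# every analyzed query term. The analyzers are passed as method names.
def matching_codes(docs, query, index_analyzer, query_analyzer)
  query_terms = send(query_analyzer, query)
  hits = docs.select do |doc|
    (query_terms - send(index_analyzer, doc[:description])).empty?
  end
  hits.map { |doc| doc[:code] }
end

videos = [
  { :code => '1', :description => "some fight video" },
  { :code => '2', :description => "a fighting video" }
]

# Mapping not applied: records indexed without stemming, query stemmed.
puts matching_codes(videos, 'fighting', :plain_analyze, :toy_stem_analyze).inspect

# Mapping applied: records and query both stemmed.
puts matching_codes(videos, 'fighting', :toy_stem_analyze, :toy_stem_analyze).inspect
```

In the first case the query term becomes "fight" while the second record still holds the unstemmed "fighting", so only record 1 matches. In the second case both records are reduced to "fight" and both match.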