
So I've been trying to add nGram matching to my ElasticSearch index, but I'm running into the following problem.

A standard query_string query only returns exact matches, while a text (match) query on the specific "test" field returns the nGram matches as expected.

I set up nGram filters and analyzers for my fields based on these examples (1)(2). The mapping code is below:

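# Assumed context: the index name "users" in the queries below suggests a
# User model that includes Tire's model integration.
class User < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  # Index time: keyword tokenizer + lowercase + nGram ("substring") filter.
  # Search time: keyword tokenizer + lowercase only, so the whole query text
  # is looked up as a single lowercased token among the indexed grams.
  tire.settings :number_of_shards   => 1,
                :number_of_replicas => 1,
                :analysis => {
                  :analyzer => {
                    "str_search_analyzer" => {
                      "tokenizer" => "keyword",
                      "filter"    => ["lowercase"]
                    },
                    "str_index_analyzer" => {
                      "tokenizer" => "keyword",
                      "filter"    => ["lowercase", "substring"]
                    }
                  },
                  :filter => {
                    :substring => {
                      "type"     => "nGram",
                      "min_gram" => 1,
                      "max_gram" => 10
                    }
                  }
                } do
    mapping do
      indexes :test, :type            => 'string',
                     :search_analyzer => :str_search_analyzer,
                     :index_analyzer  => :str_index_analyzer
    end
  end

  def to_indexed_json
    # Index a known word plus a random 10-letter suffix for testing
    {
      :test => "pizza" + (0...10).map { ('a'..'z').to_a[rand(26)] }.join
    }.to_json
  end
end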
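To see why the "test" field matches partial words, here is a rough, stdlib-only Ruby sketch of the two analyzers configured above. It only approximates Elasticsearch's keyword tokenizer, lowercase filter and nGram filter (it never talks to Elasticsearch), the helper names index_tokens and search_tokens are hypothetical, and a fixed string stands in for the random suffix:

```ruby
# Rough approximation of the analysis chain above; not Elasticsearch code.
MIN_GRAM = 1
MAX_GRAM = 10

# Hypothetical stand-in for "str_index_analyzer":
# keyword tokenizer -> lowercase -> nGram ("substring") filter
def index_tokens(text, min_gram: MIN_GRAM, max_gram: MAX_GRAM)
  token = text.downcase # keyword tokenizer keeps the whole input as one token
  grams = []
  (0...token.length).each do |start|
    (min_gram..max_gram).each do |len|
      break if start + len > token.length
      grams << token[start, len]
    end
  end
  grams.uniq
end

# Hypothetical stand-in for "str_search_analyzer":
# keyword tokenizer -> lowercase (no nGram filter)
def search_tokens(text)
  [text.downcase]
end

indexed = index_tokens("pizzaqwertyuiop")
query   = search_tokens("Pizz")

puts query.all? { |t| indexed.include?(t) } # prints true
puts indexed.include?("pizzaqwertyuiop")    # prints false (15 chars > max_gram)
```

Note that with max_gram set to 10, the full 15-character value is not itself one of the grams, so a query longer than 10 characters would not match through this field.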

ElasticSearch Queries

The query that produces results:

curl -X GET "http://localhost:9200/users/_search?pretty=true" -d '{"query":{"text":{"test":{"query":"piz"}}}}'

The query that produces NO results:

curl -X GET "http://localhost:9200/users/_search?pretty=true" -d '{"query":{"query_string":{"query":"pizz"}}}'

Is there any way to get a general query_string search to look through all indexed fields and match nGrams, rather than having to run a text/match query on a specific field?


Answer


This is the expected behaviour. By default, the "query_string" query is executed on the "_all" field. Because that field is analyzed with the StandardAnalyzer, the tokens indexed for it differ from the ones for the "test" field (which you configured to use the nGram analyzer): "_all" holds the whole word, not its nGrams, so a partial term like "pizz" has nothing to match.
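A similarly simplified sketch of the "_all" side. It assumes the standard analyzer can be approximated as "lowercase, then split on non-alphanumeric characters"; the real StandardAnalyzer uses Unicode word-boundary rules (and, in some older versions, an English stop-word filter), but the result is the same for this input. The helper name all_field_tokens is hypothetical:

```ruby
# Hypothetical, simplified stand-in for the standard analyzer on "_all".
def all_field_tokens(text)
  text.downcase.split(/[^[:alnum:]]+/).reject(&:empty?)
end

doc_tokens = all_field_tokens("pizzaqwertyuiop")
p doc_tokens # prints ["pizzaqwertyuiop"]

# query_string analyzes "pizz" with the same analyzer and looks up each term.
query_terms = all_field_tokens("pizz")
p query_terms.all? { |t| doc_tokens.include?(t) } # prints false
```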

You can change this behaviour in several ways:

  1. Change the mappings in the index settings and configure the nGram analyzer for the "_all" field
  2. Send an "_analyzer" field as part of the document (it will be picked up and used for all fields that don't have an explicit analyzer configured for them)
  3. Specify which fields you'd like the "query_string" query to be executed on, using the "fields" attribute

Of the three options above, #3 is the one I'd recommend most: explicitly specifying the fields gives you much more control over how your data is indexed and queried.
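For option #3, the request body only needs a "fields" list inside the query_string query. A minimal Ruby sketch that builds it (the helper name query_string_body is hypothetical; "test" is the field from the question's mapping, and you would add any other fields you want searched):

```ruby
require 'json'

# Hypothetical helper: build a query_string body restricted to explicit fields.
def query_string_body(text, fields)
  {
    query: {
      query_string: {
        query:  text,
        fields: fields
      }
    }
  }
end

body = query_string_body("pizz", ["test"])
puts JSON.generate(body)
# prints {"query":{"query_string":{"query":"pizz","fields":["test"]}}}
```

The printed JSON can be sent with the same curl command used in the question. When fields are listed explicitly, query_string should analyze the query text with each field's own search analyzer (str_search_analyzer for "test"), which is what lets "pizz" hit the indexed nGrams.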