I've read a lot, and it seems that EdgeNGrams are a good way to implement an autocomplete feature for search applications. I've already configured the EdgeNGrams in the settings for my index:

PUT /bigtestindex
{
  "settings":{
    "analysis":{
      "analyzer":{
        "autocomplete":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":[ "standard", "stop", "kstem", "ngram" ] 
        }
      },
      "filter":{
        "edgengram":{
          "type":"ngram",
          "min_gram":2,
          "max_gram":15
        }
      },
      "highlight": {
      "pre_tags" : ["<em>"],
      "post_tags" : ["</em>"],
        "fields": {
          "title.autocomplete": {
            "number_of_fragments": 1,
            "fragment_size": 250
          }
        } 
      }
    }
  }
}

So, given that I have the EdgeNGram filter configured in my settings, how do I use it in the search query?

What I have so far is a match query with highlight:

GET /bigtestindex/doc/_search
{
  "query": {
    "match": {
      "content": {
        "query": "thing and another thing",
        "operator": "and"
      }
    }
  },
  "highlight": {
    "pre_tags" : ["<em>"],
    "post_tags" : ["</em>"],
    "field": {
      "_source.content": {
        "number_of_fragments": 1,
        "fragment_size": 250
      }
    }
  }
}

How would I add autocomplete to the search query using EdgeNGrams configured in the settings for the index?

UPDATE: For the mapping, would it be ideal to do something like this:

"title": {
        "type": "string",
        "index_analyzer": "autocomplete",
        "search_analyzer": "standard"
      },

Or do I need to use the multi_field type:

"title": {
        "type": "multi_field",
        "fields": {
          "title": {
            "type": "string"
          },
          "autocomplete": {
            "analyzer": "autocomplete",
            "type": "string",
            "index": "not_analyzed"
          }
        }
     },

I'm using ES 1.4.1 and want to use the title field for autocomplete purposes.

1 Answer

Short answer: you need to reference the analyzer in a field mapping, as in:

PUT /test_index
{
   "settings": {
      "analysis": {
         "analyzer": {
            "autocomplete": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "standard",
                  "stop",
                  "kstem",
                  "ngram"
               ]
            }
         },
         "filter": {
            "edgengram": {
               "type": "ngram",
               "min_gram": 2,
               "max_gram": 15
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "properties": {
            "field1": {
               "type": "string",
               "index_analyzer": "autocomplete",
               "search_analyzer": "standard"
            }
         }
      }
   }
}

For a bit more discussion, see:

http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams

and

http://blog.qbox.io/an-introduction-to-ngrams-in-elasticsearch

Also, I don't think you want the "highlight" section in your index definition; that belongs in the query.

EDIT: Upon trying out your code, I found a couple of problems with it. One was the highlight issue I already mentioned. Another is that you named your filter "edgengram" but gave it type "ngram" rather than "edgeNGram", and then referenced the filter "ngram" in your analyzer. That uses the default ngram filter, not the one you defined, which probably doesn't give you what you want. (Hint: you can use term vectors to figure out what your analyzer is doing to your documents; you probably want to turn them off in production, though.)
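
To make the ngram vs. edgeNGram difference concrete, here is a rough Python sketch of what each filter emits for a single token. This is not Elasticsearch code: the function names ngrams and edge_ngrams are made up for illustration, the order of the output is not meant to match Elasticsearch, and the 1/2 bounds in the last line reflect what I understand the default ngram filter settings to be.

```python
def ngrams(token, min_gram, max_gram):
    """All substrings of token with length in [min_gram, max_gram]."""
    return [
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    ]


def edge_ngrams(token, min_gram, max_gram):
    """Only the prefixes of token with length in [min_gram, max_gram]."""
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


# Prefixes only, which is what autocomplete wants.
print(edge_ngrams("world", 2, 15))

# Assumed default ngram settings: short fragments from anywhere in the word.
print(ngrams("world", 1, 2))
```

With the edge variant only prefixes such as "wo" and "wor" are indexed, whereas the plain ngram variant also produces fragments from the middle of the word such as "or" and "rl", so a query could match in places you would not expect for autocomplete.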

So what you actually want is probably something like this:

PUT /test_index
{
   "settings": {
      "analysis": {
         "analyzer": {
            "autocomplete": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "standard",
                  "stop",
                  "kstem",
                  "edgengram_filter"
               ]
            }
         },
         "filter": {
            "edgengram_filter": {
               "type": "edgeNGram",
               "min_gram": 2,
               "max_gram": 15
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "properties": {
            "content": {
               "type": "string",
               "index_analyzer": "autocomplete",
               "search_analyzer": "standard"
            }
         }
      }
   }
}

When I indexed these two docs:

POST test_index/doc/_bulk
{"index":{"_id":1}}
{"content":"hello world"}
{"index":{"_id":2}}
{"content":"goodbye world"}

And ran this query (there was an error in your "highlight" block as well; it should have said "fields" rather than "field"):

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "content": {
            "query": "good wor",
            "operator": "and"
         }
      }
   },
   "highlight": {
      "pre_tags": [
         "<em>"
      ],
      "post_tags": [
         "</em>"
      ],
      "fields": {
         "content": {
            "number_of_fragments": 1,
            "fragment_size": 250
         }
      }
   }
}

I get back this response, which seems to be what you're looking for, if I understand you correctly:

{
   "took": 5,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.2712221,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.2712221,
            "_source": {
               "content": "goodbye world"
            },
            "highlight": {
               "content": [
                  "<em>goodbye</em> <em>world</em>"
               ]
            }
         }
      ]
   }
}
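
If it helps to see why only the second document comes back, here is a simplified Python model of the matching. It is a sketch under assumptions, not how Elasticsearch is implemented: it splits on whitespace instead of using the standard tokenizer, it ignores the stop and kstem filters, and the helper names index_terms and matches are invented for this example.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Prefixes of token, mirroring the edgengram_filter settings."""
    longest = min(max_gram, len(token))
    return {token[:size] for size in range(min_gram, longest + 1)}


def index_terms(text):
    """Index-time side: lowercase, split into words, expand each to prefixes."""
    terms = set()
    for word in text.lower().split():
        terms |= edge_ngrams(word)
    return terms


def matches(query, text):
    """Search-time side: plain query terms, all required (operator "and")."""
    indexed = index_terms(text)
    return all(term in indexed for term in query.lower().split())


docs = {1: "hello world", 2: "goodbye world"}
hits = [doc_id for doc_id, text in docs.items() if matches("good wor", text)]
print(hits)  # [2]
```

Both documents contain the prefix "wor", but only "goodbye world" also has "good" among its indexed prefixes, so with the "and" operator it is the only hit. The query terms are left unexpanded here, which is the role the "standard" search_analyzer plays in the mapping.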

Here is some code I used to test it out:

http://sense.qbox.io/gist/3092992993e0328f7c4ee80e768dd508a0bc053f