I have an Elasticsearch search implementation working for a web app, but I am stuck on one last detail. I want to be able to filter certain fields alphabetically: if I query 'd', it should bring back every document whose value for that field begins with 'd'. At the moment this is what I have:

$elasticaQueryString = new Elastica_Query_QueryString();
$elasticaQueryString->setDefaultField('Name');
$elasticaQueryString->setQuery('d'.'*');

It works for fields that contain only one word, e.g. 'Dan'. But if there is more than one word, it matches on each word separately, so both 'Dan Ryan' and 'Ryan Dan' come back. I have also tried a wildcard query and a prefix query, but they give similar results.

Do I need to create a custom analyser or is there some other way around this problem?

1 Answer

I would tackle this at the mapping level first. A keyword tokenizer turns the entire field value into a single token, and adding a lowercase filter lowercases that token, which makes matching on the field case-insensitive:

"analysis": {
   "analyzer": {
      "analyzer_firstletter": {
         "tokenizer": "keyword",
         "filter": "lowercase"
      }
   }
}
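
To make the effect of that analyzer concrete, here is a rough pure-Python imitation of the two tokenization strategies. It is only a sketch: `standard_like_tokens`, `keyword_lowercase_tokens` and `prefix_matches` are hypothetical helpers written for this illustration, not Elasticsearch or Elastica APIs, and the real analyzers handle far more cases than a regex split.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for a word-splitting analyzer: one lowercased token per word."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def keyword_lowercase_tokens(text):
    """Rough stand-in for keyword tokenizer + lowercase filter: the whole value is one token."""
    return [text.lower()]


def prefix_matches(values, prefix, analyze):
    """Return the values for which at least one indexed token starts with the prefix."""
    prefix = prefix.lower()
    return [v for v in values if any(tok.startswith(prefix) for tok in analyze(v))]


names = ["Dan", "Dan Ryan", "Ryan Dan", "river dog", "drive"]

# Word-level tokens: every value has some word starting with "d", so all of them match.
print(prefix_matches(names, "d", standard_like_tokens))

# Single-token values: only values that themselves start with "d" match.
print(prefix_matches(names, "d", keyword_lowercase_tokens))
```

With word-level tokens, 'Ryan Dan' and 'river dog' match because one of their words starts with 'd'. With a single lowercased token per value, the prefix is compared against the start of the whole value.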

After inserting some data, this is what the index holds:

$ curl -XGET localhost:9200/test2/tweet/_search -d '{
   "query": {
      "match_all" :{}
    }
  }' | grep title

    "title" : "river dog"
    "title" : "data"
    "title" : "drive"
    "title" : "drunk"
    "title" : "dzone"

Note the entry "river dog": it contains a word that starts with 'd' but does not itself start with 'd', so it is exactly what you want to avoid matching. Now, if we use a match_phrase_prefix query, you'll only match the entries that start with 'd':

$ curl -XGET localhost:9200/test2/tweet/_search -d '{
   "query": {
      "match_phrase_prefix": {
         "title": {
            "query": "d",
            "max_expansions": 5
         }
      }
   }
  }' | grep title

   "title" : "drive"
   "title" : "drunk"
   "title" : "dzone"
   "title" : "data"

This isn't Elastica-specific, but it should be fairly easy to translate into the equivalent Elastica calls. The important parts are the keyword + lowercase analyzer and the match_phrase_prefix query.
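
If it helps to see the two request bodies side by side, the sketch below builds them as plain Python dictionaries and prints them as JSON, without talking to a cluster. The field name `title`, the analyzer name and `max_expansions` come from the examples above. The `mappings` section that attaches the analyzer to the field is my assumption, since it is not shown above, and its exact shape depends on the Elasticsearch version: older releases nest `properties` under the type name (`tweet` here) and use `string` rather than `text`. `build_index_body` and `build_prefix_query` are hypothetical helper names.

```python
import json

FIELD = "title"                     # field name used in the examples above
ANALYZER = "analyzer_firstletter"   # analyzer name used in the examples above


def build_index_body(field=FIELD, analyzer=ANALYZER, field_type="text"):
    """Index settings plus an assumed mapping that applies the analyzer to the field."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer: {"tokenizer": "keyword", "filter": ["lowercase"]}
                }
            }
        },
        # Assumption: typeless mapping layout; adjust for your Elasticsearch version.
        "mappings": {
            "properties": {field: {"type": field_type, "analyzer": analyzer}}
        },
    }


def build_prefix_query(prefix, field=FIELD, max_expansions=5):
    """Search body for a match_phrase_prefix query on the given field."""
    return {
        "query": {
            "match_phrase_prefix": {
                field: {"query": prefix, "max_expansions": max_expansions}
            }
        }
    }


print(json.dumps(build_index_body(), indent=2))
print(json.dumps(build_prefix_query("d"), indent=2))
```

The printed JSON can be sent with curl as above, or the same nested structure can be reproduced as arrays in whatever client you use.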

As a side note, wildcard queries are super slow and best avoided where possible :)