4
votes

ElasticSearch Version: 0.90.2

Here's the problem: I want to find documents in the index such that:

  1. all query tokens are matched, across multiple fields
  2. each field's own analyzer is used

So if there are 4 documents:

{ "_id" : 1, "name" : "Joe Doe",     "mark" : "1", "message" : "Message First" }
{ "_id" : 2, "name" : "Ann",         "mark" : "3", "message" : "Yesterday Joe Doe got 1 for the message First"}
{ "_id" : 3, "name" : "Joe Doe",     "mark" : "2", "message" : "Message Second" }
{ "_id" : 4, "name" : "Dan Spencer", "mark" : "2", "message" : "Message Third" }

If the query is "Joe First 1", it should find ids 1 and 2. That is, it should find documents that contain all the tokens from the search query, no matter which fields they are in (all tokens may be in one field, or each token may be in its own field).
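
To pin down the matching rule, here is a minimal Python sketch of the semantics I'm after. It is not Elasticsearch code: the `tokens` helper (plain lowercase plus whitespace split) and `matches_all` are made-up names that only stand in for real analyzers and queries.

```python
# The four sample documents from the question.
DOCS = [
    {"_id": 1, "name": "Joe Doe", "mark": "1", "message": "Message First"},
    {"_id": 2, "name": "Ann", "mark": "3",
     "message": "Yesterday Joe Doe got 1 for the message First"},
    {"_id": 3, "name": "Joe Doe", "mark": "2", "message": "Message Second"},
    {"_id": 4, "name": "Dan Spencer", "mark": "2", "message": "Message Third"},
]
FIELDS = ("name", "mark", "message")


def tokens(text):
    # Hypothetical stand-in for an analyzer: lowercase, split on whitespace.
    return text.lower().split()


def matches_all(doc, query):
    # A document matches when every query token occurs in at least one field.
    doc_tokens = {t for field in FIELDS for t in tokens(doc[field])}
    return all(t in doc_tokens for t in tokens(query))


hits = [d["_id"] for d in DOCS if matches_all(d, "Joe First 1")]
print(hits)  # [1, 2]
```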

One solution would be to use the Elasticsearch "_all" field: it merges all the fields I need (name, mark, message) into one, which I can then query with something like

"match": {
  "_all": {
    "query": "Joe First 1",
    "operator": "and"
  }
}

But this way I can specify an analyzer for the "_all" field only, and I need the "name" and "message" fields to have different sets of tokenizers/token filters (say, name uses a phonetic analyzer and message uses a stemming token filter).

Is there a way to do this?


3 Answers

2
votes

Thanks to the guys at the elasticsearch group, here's the solution... pretty simple, needless to say :)

All I needed to do was use the query_string query http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query/ with default_operator = AND, and it does the trick:

{
  "query": {
    "query_string": {
      "fields": [
        "name",
        "mark",
        "message"
      ],
      "query": "Joe First 1",
      "default_operator": "AND"
    }
  }
}
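
As far as I understand it, with several fields the query_string query analyzes each query term with each field's own analyzer, ORs the per-field clauses for that term, and with default_operator AND requires every term to match somewhere. Below is a toy Python simulation of that idea, not the real thing: the `lowercase` and `crude_stem` analyzers and the `search` helper are invented for illustration and are far simpler than Lucene's.

```python
def lowercase(text):
    # Toy analyzer: lowercase and split on whitespace.
    return text.lower().split()


def crude_stem(text):
    # Toy "stemming" analyzer: additionally strip a trailing "s" from longer words.
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t
            for t in text.lower().split()]


# Assumed per-field analyzers (hypothetical, stand-ins for the real mapping).
ANALYZERS = {"name": lowercase, "mark": lowercase, "message": crude_stem}

DOCS = [
    {"_id": 1, "name": "Joe Doe", "mark": "1", "message": "Message First"},
    {"_id": 2, "name": "Ann", "mark": "3",
     "message": "Yesterday Joe Doe got 1 for the message First"},
    {"_id": 3, "name": "Joe Doe", "mark": "2", "message": "Message Second"},
    {"_id": 4, "name": "Dan Spencer", "mark": "2", "message": "Message Third"},
]


def term_matches(doc, term):
    # OR across fields: the term is analyzed with each field's own analyzer.
    for field, analyze in ANALYZERS.items():
        query_tokens = analyze(term)
        if query_tokens and set(query_tokens) <= set(analyze(doc[field])):
            return True
    return False


def search(query):
    # AND across terms: every whitespace-separated term must match some field.
    return [d["_id"] for d in DOCS
            if all(term_matches(d, term) for term in query.split())]


print(search("Joe First 1"))     # [1, 2]
print(search("Joe Messages 1"))  # [1, 2] ("messages" only matches via the message analyzer)
```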
0
votes

I think using a multi match query makes sense here. Something like:

"multi_match": {
    "query": "Joe First 1",
    "operator": "and",
    "fields": [ "name", "message", "mark"]
}
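
One caveat worth checking on your version: as far as I know, multi_match applies the operator inside each field, so "and" asks for all tokens in a single field, not spread across fields. A small Python sketch of that per-field reading, using a made-up `tokens` helper (lowercase plus whitespace split) in place of real analyzers:

```python
DOCS = [
    {"_id": 1, "name": "Joe Doe", "mark": "1", "message": "Message First"},
    {"_id": 2, "name": "Ann", "mark": "3",
     "message": "Yesterday Joe Doe got 1 for the message First"},
    {"_id": 3, "name": "Joe Doe", "mark": "2", "message": "Message Second"},
    {"_id": 4, "name": "Dan Spencer", "mark": "2", "message": "Message Third"},
]
FIELDS = ("name", "message", "mark")


def tokens(text):
    # Hypothetical stand-in for an analyzer.
    return text.lower().split()


def per_field_and(doc, query):
    # Matches only if some single field contains every query token.
    wanted = set(tokens(query))
    return any(wanted <= set(tokens(doc[field])) for field in FIELDS)


hits = [d["_id"] for d in DOCS if per_field_and(d, "Joe First 1")]
print(hits)  # [2]
```

Under this reading document 1 is missed, because its tokens are spread over name, mark and message.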
0
votes

As you say, you can set the analyzer (or search_analyzer/index_analyzer) to be used on the _all field. It seems to me that should indeed be your first step to achieve the query results you're looking for.

From http://jontai.me/blog/2012/10/lucene-scoring-and-elasticsearch-_all-field/, we have this tasty quote:

... the _all field copies the text from the other fields and analyzes them again; it doesn’t copy the pre-analyzed tokens. You can set a separate analyzer for the _all field.

I interpret this to mean that you should set your _all analyzer(s) as well as the individual field analyzer(s). The _all field won't reuse the tokens already produced by the individual fields' analyzers; it will grab the original field contents and analyze them with its own analyzer.