I'm currently using Elasticsearch with several types of queries, among them the match_phrase query. The index I'm querying uses the english analyzer for text messages. When I search for a phrase I expect exact results, but if my search term contains an English word such as "remove", it also matches words like "removed", "removing", and "removes".

How do I prevent this with my phrase matching? Is there a better option than match_phrase for queries like this?

Is this possible without changing the analyzer? Below is my query (structured so it can do other things):

query: {
    fields: ['_id', 'ownerId'],
    from: 0,
    size: 20,
    query: {
        filtered: {
            filter: {
                and: [group ids]
            },
            query: {
                bool: {
                    must: {
                        match_phrase: {
                            text: "remove"
                        }
                    }
                }
            }
        }
    }
}

And here is my index mapping:

[MappingTypes.MESSAGE]: {
    properties: {
      text: {
        type: 'string',
        index: 'analyzed',
        analyzer: 'english',
        term_vector: 'with_positions_offsets'
      },
      ownerId: {
        type: 'string',
        index: 'not_analyzed',
        store: true
      },
      groupId: {
        type: 'string',
        index: 'not_analyzed',
        store: true
      },
      itemId: {
        type: 'string',
        index: 'not_analyzed',
        store: true
      },
      createdAt: {
        type: 'date'
      },
      editedAt: {
        type: 'date'
      },
      type: {
        type: 'string',
        index: 'not_analyzed'
      }
    }
  }
Why can't you just drop the english analyzer in that case? - ChintanShah25
Is it possible to control which analyzer is used from the query alone? I tried things like setting 'analyzer' to keyword, but it just fails. On another note, I'm using ES 1.5. - Ramzi C.
I've managed to use the keyword analyzer, but do I need to have my data indexed with keyword for this to actually work? As it stands, I'm getting no results. - Ramzi C.
You would have to reindex your data. Also, the keyword analyzer won't match "Remove" with "remove" (it is case sensitive). If you could edit the question with your exact requirements, it would be easier to suggest the right solution. - ChintanShah25
I did so. The only real requirement I have is exact matching, where I don't get back words that mean the same thing. Is there no way to adjust fuzziness so that the english analyzer stops returning words that mean the same thing? - Ramzi C.

1 Answer

You can use multi-fields to index the same field in different ways (one for exact matches, one for partial matches, etc.).

You can get rid of stemming with the standard analyzer, which is also the default analyzer. You could create your index with the following mapping, where the text.standard sub-field sets no analyzer and therefore uses the default:

POST test_index
{
  "mappings": {
    "test_type": {
      "properties": {
        "text": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "english",
          "term_vector": "with_positions_offsets",
          "fields": {
            "standard": {
              "type": "string"
            }
          }
        },
        "ownerId": {
          "type": "string",
          "index": "not_analyzed",
          "store": true
        },
        "groupId": {
          "type": "string",
          "index": "not_analyzed",
          "store": true
        },
        "itemId": {
          "type": "string",
          "index": "not_analyzed",
          "store": true
        },
        "createdAt": {
          "type": "date"
        },
        "editedAt": {
          "type": "date"
        },
        "type": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

After that, whenever you want an exact match, query text.standard; when you want stemming (to also match "removed" and "removes"), go back to querying text.
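
As a minimal sketch, a phrase search against the sub-field could look like the request below. It assumes the test_index and test_type names from the mapping above and shows only the match_phrase part; the filtered wrapper, fields, from and size from the question would stay as they are, with only the field name changed from text to text.standard.

```
POST test_index/test_type/_search
{
  "query": {
    "match_phrase": {
      "text.standard": "remove"
    }
  }
}
```

Because the standard analyzer lowercases but does not stem, this should still match "Remove" while no longer matching "removed", "removing" or "removes".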

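You could also update the current mapping instead, but you would have to reindex your data in both cases, because existing documents are not indexed into the new sub-field until they are reindexed: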

PUT test_index/_mapping/test_type
{
  "properties": {
    "text": {
      "type": "string",
      "index": "analyzed",
      "analyzer": "english",
      "term_vector": "with_positions_offsets",
      "fields": {
        "standard": {
          "type": "string"
        }
      }
    }
  }
}
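
One way to check what each field actually indexes is the analyze API. The sketch below assumes the ES 1.x form of that API, where the field is passed as a query parameter and the text to analyze is sent as the plain request body; depending on your version, the field path may need adjusting.

```
GET test_index/_analyze?field=text
removed

GET test_index/_analyze?field=text.standard
removed
```

The first request should return a stemmed token (the same stem that "remove" produces), while the second should return "removed" unchanged, which is why phrase queries on text.standard stop matching the other word forms.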

Does this help?