
So in the DB I have this entry:

Mark-Whalberg

When searching for the term

Mark-Whalberg

I get no match.

Why? Is the minus sign a special character? As I understand it, it means "exclude"?

The query is this:

{"query_string": {"query": 'Mark-Whalberg', "default_operator": "AND"}}

Searching for anything else, such as:

Mark
Whalberg
hlb
Mark Whalberg

returns a match.

Is this stored as two different pieces? How can I get a match when including the minus sign in the search term?

--------------EDIT--------------

This is the current query:

var fields = [
    "field1",
    "field2",
];

var query = {"query_string": {"query": '*Mark-Whalberg*', "default_operator": "AND", "fields": fields}};
If this is ES 5 and you are using the default mappings, just try a term query on the .keyword subfield: "term": {"field_name.keyword": "Mark-Whalberg"} - Andrei Stefan
Otherwise you need a .keyword subfield :-) to preserve the dash and the original upper/lower case. - Andrei Stefan
Do you mean like this: {"query_string": {"query": 'Mark-Whalberg', "default_operator": "AND","term": {"field_name.keyword": "Mark-Whalberg"}}} I get an error in the query. I need to keep query_string because there are different ways to search. - oderfla
What do you mean by several fields? You need to be more explicit in your question. It's hard to help you when bits of information are not provided (your index mapping, sample data, desired output). - Andrei Stefan
Well, you are missing the mapping ;-), but never mind. I see @Mickaël-B is briefly explaining how analyzers work in ES. - Andrei Stefan
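A term clause cannot be nested inside query_string (query_string has no "term" parameter, which would explain the error in the comment above). One way to keep query_string and still allow an exact match is to put both clauses side by side in a bool query. This is a sketch that only builds the request body. "field1"/"field2" come from the question, and "field1.keyword" assumes the default ES 5 dynamic mapping created a keyword subfield for field1.

```javascript
// Sketch: free-text query_string OR an exact term match on a .keyword subfield.
// Assumption: exactField (e.g. "field1.keyword") exists and is not analyzed.
function buildSearchBody(userInput, fields, exactField) {
  return {
    query: {
      bool: {
        should: [
          // Existing free-text behaviour, unchanged.
          { query_string: { query: userInput, default_operator: "AND", fields: fields } },
          // Exact, case-sensitive match that keeps the dash.
          { term: { [exactField]: userInput } },
        ],
        minimum_should_match: 1, // at least one of the two clauses must match
      },
    },
  };
}

const body = buildSearchBody("Mark-Whalberg", ["field1", "field2"], "field1.keyword");
console.log(JSON.stringify(body, null, 2));
```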

2 Answers


You have an analyzer configuration issue.

Let me explain. When you defined your index in Elasticsearch, you didn't specify an analyzer for the field, so the Standard Analyzer is applied.

According to the documentation:

Standard Analyzer

The standard analyzer is the default analyzer which is used if none is specified. It provides grammar based tokenization (based on the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29) and works well for most languages.

Also, to answer your question:

Why? Is minus a special character what I understand? It symbolizes "exclude"?

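For the Standard Analyzer, yes. It doesn't mean "exclude" here, but it is a special character that is dropped during analysis: the text is split at the hyphen, so "Mark-Whalberg" is indexed as the two terms mark and whalberg.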
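Separately from analysis, "-" is also a reserved character in query_string syntax (a leading "-" is the NOT operator). A minimal sketch of escaping user input before building a query_string query follows. The character list is taken from the Elasticsearch query_string documentation, which also says < and > cannot be escaped, so this helper simply drops them (a design choice, not an Elasticsearch API). Escaping only changes how the query is parsed: on a text field the analyzer still splits "Mark\-Whalberg" into mark and whalberg.

```javascript
// Sketch: escape query_string reserved characters in user input.
// Reserved: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /
function escapeQueryString(input) {
  return input
    .replace(/[<>]/g, "") // these cannot be escaped, so remove them
    .replace(/[+\-=&|!(){}[\]^"~*?:\\\/]/g, "\\$&"); // prefix each reserved char with a backslash
}

console.log(escapeQueryString("Mark-Whalberg")); // Mark\-Whalberg
console.log(escapeQueryString("-Whalberg"));     // \-Whalberg
```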

From the documentation:

Why doesn’t the term query match my document?

[...] There are many ways to analyze text: the default standard analyzer drops most punctuation, breaks up text into individual words, and lower cases them. For instance, the standard analyzer would turn the string “Quick Brown Fox!” into the terms [quick, brown, fox]. [...]

Example:

If you have the following text:

"The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."

Then the Standard Analyzer will produce :

[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]
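To see roughly what happens to a hyphen, here is a rough approximation of the standard analyzer. It is an assumption-laden sketch: the real tokenizer implements Unicode UAX #29 word segmentation, while this regex only keeps runs of letters and digits (with inner apostrophes) and lowercases them. To check the real output, the _analyze API can be called with the body {"analyzer": "standard", "text": "Mark-Whalberg"}.

```javascript
// Rough approximation of the standard analyzer, for illustration only.
// Keeps runs of letters/digits (plus inner apostrophes, as in "dog's") and lowercases them.
function approxStandardAnalyze(text) {
  const wordPattern = /[\p{L}\p{N}]+(?:['’][\p{L}\p{N}]+)*/gu;
  const tokens = text.match(wordPattern) || [];
  return tokens.map((token) => token.toLowerCase());
}

console.log(approxStandardAnalyze("Mark-Whalberg")); // [ 'mark', 'whalberg' ]
console.log(approxStandardAnalyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."));
```

For the documentation example this produces the same eleven terms as listed above, but inputs such as decimal numbers or e-mail addresses would be tokenized differently by the real analyzer.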

If you don't want to rely on the analyzer, you have two solutions:

  • You can use a match query.
  • You can tell Elasticsearch not to analyze the field when you create your index, for example by mapping it with the keyword type (or "index": "not_analyzed" on ES 2.x).
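For the second option, a sketch of an index-creation body that keeps both an analyzed copy and an exact copy of the field, written as a JS object. Assumptions: Elasticsearch 7+ typeless mappings (ES 5/6 need an extra mapping-type level under "mappings") and a hypothetical field called "name".

```javascript
// Sketch of a create-index body with a not-analyzed keyword subfield.
// "name" is a hypothetical field; the layout assumes ES 7+ typeless mappings.
const createIndexBody = {
  mappings: {
    properties: {
      name: {
        type: "text", // analyzed: "Mark-Whalberg" -> mark, whalberg
        fields: {
          keyword: { type: "keyword" }, // not analyzed: kept as "Mark-Whalberg"
        },
      },
    },
  },
};

// Exact, case-sensitive lookup against the keyword subfield.
const exactQuery = { query: { term: { "name.keyword": "Mark-Whalberg" } } };

console.log(JSON.stringify(createIndexBody));
console.log(JSON.stringify(exactQuery));
```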

I hope this will help you.


I was stuck on the same question, and the answer from @Mickael was perfect for understanding what is going on (I really recommend reading the linked documentation).

I solved this by setting an operator on the query:

GET http://localhost:9200/creative/_search

{  
  "query": {
    "match": {
      "keyword_id": {
        "query": "fake-keyword-uuid-3",
        "operator": "AND"
      }
    }
  }
}
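Why the operator helps: a match query analyzes the input, and with the default OR any one of the resulting terms is enough. The sketch below is a rough, hypothetical model of that logic (regex tokenization standing in for the standard analyzer, and document texts made up for illustration), not how Elasticsearch executes the query.

```javascript
// Rough model of a match query: analyze the input, then require ANY (OR)
// or ALL (AND) of its terms to appear among the document's terms.
function approxStandardAnalyze(text) {
  return (text.match(/[\p{L}\p{N}]+/gu) || []).map((t) => t.toLowerCase());
}

function matchQuery(input, docText, operator = "OR") {
  const docTerms = new Set(approxStandardAnalyze(docText));
  const queryTerms = approxStandardAnalyze(input); // fake, keyword, uuid, 3
  const present = (term) => docTerms.has(term);
  return operator.toUpperCase() === "AND" ? queryTerms.every(present) : queryTerms.some(present);
}

// Hypothetical documents.
const docs = ["fake-keyword-uuid-3", "fake-keyword-uuid-4", "other-uuid-3"];
for (const op of ["OR", "AND"]) {
  const hits = docs.filter((d) => matchQuery("fake-keyword-uuid-3", d, op));
  console.log(op, hits);
}
// OR  -> all three documents share at least one term
// AND -> only "fake-keyword-uuid-3"
```

Note that AND does not check term order, so a value like "3-uuid-keyword-fake" would also match; a match_phrase query or an exact term query on a keyword field is stricter.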

To better understand how this query matches and scores documents, try adding "explain": true and analyzing the results:

GET http://localhost:9200/creative/_search

{  
  "explain": true,
  "query": {
    "match": {
      "keyword_id": {
        "query": "fake-keyword-uuid-3",
        "operator": "AND"
      }
    }
  }
}