
When we pass a query containing special characters, Elasticsearch splits the text on them. For example, if we pass "test-test" in the query, how can we make Elasticsearch treat it as a single word instead of splitting it up?

Analyzer used on the field we are searching:

"text_search_filter": {
        "type":     "edge_ngram",
        "min_gram": 1,
        "max_gram": 15
     },
     "standard_stop_filter": {
       "type":       "stop",
       "stopwords":  "_english_"
     }
   },

   "analyzer": {

     "text_search_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
           "lowercase",
           "asciifolding",
           "text_search_filter"
        ]
     }

}
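
For context, these fragments would typically sit inside the index settings roughly like this (a sketch; the surrounding keys are assumed from the standard Elasticsearch analysis layout, not shown in the original snippet):

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "text_search_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 15
        },
        "standard_stop_filter": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "text_search_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "text_search_filter"
          ]
        }
      }
    }
  }
}
```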

And the query used for search:

"query": {
    "multi_match": {
      "query": "test-test",
      "type": "cross_fields",
      "fields": [
        "FIELD_NAME"
      ],

    }
  }


The _analyze output for the query text is below (the single quotes from the curl command were apparently analyzed as part of the input, which is why every token starts with an apostrophe):
{
"tokens": [
    {
        "token": "'",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'t",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'te",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'tes",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test-",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test-t",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test-te",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test-tes",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test-test",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test-test'",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    }
]
}

What is your use case? What is your mapping? Because there are different ways to achieve this. – ChintanShah25
Updated with the analyzer used and the query for search. We can see that a token "test-test" has been created by the analyzer. – code_blue
What is the output of curl -XGET 'localhost:9200/your_index_name/_analyze?analyzer=text_search_analyzer' -d 'test-test'? – ChintanShah25
Do you see a token "test-test" in the output? – ChintanShah25
Are you sure all the fields you are searching against have text_search_analyzer applied? Because "test-test" is one of the tokens, so it should match. You are not using a different search_analyzer, right? – ChintanShah25

1 Answer


In my code I catch all words which contain "-" and add quotes around them.

Example: joe-doe -> "joe-doe"

Java code for this:

    import java.util.Arrays;
    import java.util.stream.Collectors;

    static String placeWordsWithDashInQuote(String value) {
        return Arrays.stream(value.split("\\s"))
            .filter(v -> !v.isEmpty())
            .map(v -> v.contains("-") && !v.startsWith("\"") ? "\"" + v + "\"" : v)
            .collect(Collectors.joining(" "));
    }
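
Wrapped in a class for a self-contained run (the class name DashQuoter is just for illustration), the helper behaves like this:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class DashQuoter {

    // Wrap any whitespace-separated word containing "-" in double quotes,
    // unless it is already quoted; other words pass through unchanged.
    static String placeWordsWithDashInQuote(String value) {
        return Arrays.stream(value.split("\\s"))
            .filter(v -> !v.isEmpty())
            .map(v -> v.contains("-") && !v.startsWith("\"") ? "\"" + v + "\"" : v)
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // "joe-doe" gets quoted so query_string treats it as a phrase;
        // "smith" is left alone.
        System.out.println(placeWordsWithDashInQuote("joe-doe smith"));
        // prints: "joe-doe" smith
    }
}
```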

And after this, the example query looks like:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": [
              "lastName",
              "firstName"
            ],
            "query": "\"joe-doe\"",
            "default_operator": "AND"
          }
        }
      ]
    }
  },
  "sort": [],
  "from": 0,
  "size": 10
}