Requirement: Search with special characters in a text field.

My solution so far: use a wildcard query with a custom analyzer. I want to use wildcards because they seem the easiest way to do partial searches in a long string with multiple search keys. See the ES query below.

I have an index called "invoices" whose documents have a field like this:

"searchString" : "I000010-1 000010 3901 North Saginaw Road add 2 Midland MI 48640 US MS Dhoni MSD-Company  MSD (777) 777-7777 (333) 333-3333 [email protected] msd-company msdhoni Dhoni, MS (3241480)"

Note: This field acts as the deprecated _all field in ES.

Index Mapping for this field:

"searchString": {"type": "text","analyzer": "multi_level_analyzer"},

Analyzer settings:

PUT invoices
{
  "settings": {
    "analysis": {
      "analyzer": {
        "multi_level_analyzer": {
          "type": "custom", 
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}
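To see which terms this analyzer actually writes to the index, the sample text can be run through the _analyze API. This is a minimal sketch, assuming the index was created with the settings above; the text is a shortened fragment of the searchString value.

```json
GET invoices/_analyze
{
  "analyzer": "multi_level_analyzer",
  "text": "MSD-Company (777) 777-7777 Dhoni, MS"
}
```

Because the whitespace tokenizer splits only on whitespace, the terms should come back as msd-company, (777), 777-7777, dhoni, (with the trailing comma) and ms. These are the strings a term-level query such as wildcard is compared against.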

My query looks something like this:

GET invoices/_search
{
    "query": {
        "bool": {
            "must": [{
                    "wildcard": {
                        "searchString": {
                            "value": "msd-company*",
                            "boost": 1.0
                        }
                    }
                },
                {
                    "wildcard": {
                        "searchString": {
                            "value": "Saginaw*",
                            "boost": 1.0
                        }
                    }
                }
            ]
        }
    }
}

My question: Earlier, when I was not using a custom analyzer, the above query worked, but I was not able to search for words with special characters like "msd-company".
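A text field without an explicit analyzer uses the standard analyzer, and its effect on hyphenated text can be checked with the _analyze API. A small sketch, using a fragment of the sample value:

```json
GET _analyze
{
  "analyzer": "standard",
  "text": "MSD-Company msd-company"
}
```

This should return the terms msd, company, msd, company. The hyphen is treated as a word boundary, so no single indexed term contains "msd-company", which would explain why a wildcard pattern that includes the hyphen finds nothing with the default analyzer.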

After attaching the custom analyzer (multi_level_analyzer), the above query no longer returns any results. When I changed the wildcard query to put an asterisk before the search key as well, it started working, and I don't understand why. (I referred to this answer.)

I want to know the impact of using "*msd-company*" instead of "msd-company*" in the wildcard query on the text field. How can I keep using the wildcard query "msd-company*" with the custom analyzer?

I am open to suggestions for any other approach to this problem.

1 Answer

I solved my problem by changing the mapping of this field to:

"searchString": {"type": "text","analyzer": "multi_level_analyzer", "search_analyzer": "standard"},
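For context, this is roughly how that mapping line fits together with the analyzer settings from the question in a single index-creation request. It is only a sketch: it assumes Elasticsearch 7 or later (typeless mappings) and that the index is being created from scratch rather than updated in place.

```json
PUT invoices
{
  "settings": {
    "analysis": {
      "analyzer": {
        "multi_level_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": ["html_strip"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "searchString": {
        "type": "text",
        "analyzer": "multi_level_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
```

With this mapping, multi_level_analyzer is still used at index time, so terms such as msd-company stay whole in the index, while standard is the analyzer applied on the search side.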

But since wildcard queries are expensive, I would still like to know if there exists a better solution to satisfy my search use case.