Elasticsearch Mapping

PUT testindex
{
  "settings": {
    "analysis": {
      "filter": {},
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": []
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase"]
        },
        "hiphen_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "catch_all": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "store": true,
              "ignore_above": 256
            },
            "raw": {
              "type": "text",
              "store": true,
              "analyzer": "hiphen_analyzer",
              "search_analyzer": "whitespace"
            },
            "ngrams": {
              "type": "text",
              "store": true,
              "analyzer": "my_analyzer"
            }
          }
        },
        "hostname": {
          "type": "text",
          "copy_to": "catch_all"
        }
      }
    }
  }
}

Documents

POST testindex/test
{
  "hostname": "server-testing-01"
}

POST testindex/test
{
  "hostname": "Dell Poweredge 111"
}

I have server hostnames such as "server-testing-01", "server-testing-02", "Dell Poweredge Server".

I created a mapping in Elasticsearch with a single "text" field called "hostname", which is copied to a "catch_all" field via copy_to.

For now there is only one field, "hostname", but other fields will also be copied to the catch_all field.

There is a global search box which helps customers search these hostnames and other data.

  1. When searching for "test", the results should include "server-testing-01" and "server-testing-02". When searching for "power", the results should include "Dell Poweredge Server". When searching for "edge", the results should also include "Dell Poweredge Server".
  2. When searching for the exact value "server-testing-01", the result set should contain only that one document.

edit: I have tried a custom ngram analyzer, which gives the right results for some partial searches but not all.

Can somebody explain how to achieve partial search as well as exact search in Elasticsearch?

1 Answer

The easiest way to achieve the second point, since you've already solved the first, is to wrap your existing query in a bool query: put the existing query and a new term query in a should clause with minimum_should_match set to 1. That covers the exact match as well. If you need a working example, please provide your mapping, one or two sample documents, and your query as it stands at the moment.
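To make the bool/should idea concrete, here is a minimal Python sketch that builds such a request body as a plain dict. It is only an illustration: the helper name `with_exact_match` is made up, the default exact-match field `catch_all.keyword` is taken from the mapping in the question, and the "existing query" passed in the usage note is a stand-in, since the actual query was not posted.

```python
from typing import Any, Dict


def with_exact_match(existing_query: Dict[str, Any],
                     text: str,
                     exact_field: str = "catch_all.keyword") -> Dict[str, Any]:
    """Wrap an existing query clause and a term query in a bool/should.

    `existing_query` is whatever clause already handles partial matching.
    `exact_field` is assumed to be a keyword (not analyzed) field.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    existing_query,
                    # Term queries are not analyzed, so `text` must equal the
                    # stored keyword value exactly (including case).
                    {"term": {exact_field: {"value": text}}},
                ],
                # A document must match at least one of the two clauses.
                "minimum_should_match": 1,
            }
        }
    }
```

For example, `with_exact_match({"match": {"catch_all.ngrams": "server-testing-01"}}, "server-testing-01")` produces a body that can be sent to `GET testindex/_search`. The match clause there is hypothetical; substitute the query you are actually using.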

Your use case is very broad. You could add every possible analyzer and still miss things. I believe that you don't really need all these analyzers or any complex query. The approach below is very straightforward (although it needs caution regarding performance).

PUT testindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_lowercase": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "hostname": {
          "type": "text",
          "analyzer": "keyword_lowercase"
        }
      }
    }
  }
}

GET testindex/_search
{
  "query": {
    "wildcard": {
      "hostname": {
        "value": "*test*"
      }
    }
  }
}

GET testindex/_search
{
  "query": {
    "wildcard": {
      "hostname": {
        "value": "*dell*power*"
      }
    }
  }
}

GET testindex/_search
{
  "query": {
    "wildcard": {
      "hostname": {
        "value": "*edge*"
      }
    }
  }
}

In general you could use edge n-grams, but that wouldn't cover the "edge" example, since edge n-grams are anchored to the beginning of each token. You could use ngrams, but a max_gram of 3 is not enough and there would be cases that you would miss. With this approach you cover almost everything. What you need to do at the application level is, for a given input: 1. lowercase it, 2. wrap it in wildcards.

Examples:

  • Dell -> *dell*
  • SERVER -> *server*
  • DELL POWER -> *dell power*

Be careful though: you will still miss some cases. Example:

  • server testing -> *server testing*

The above won't work, because the indexed hostname is "server-testing-01", which contains hyphens rather than spaces. If you need it to work, you can replace every whitespace with a wildcard, so the above becomes:

  • server testing -> *server*testing* (which will work)
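The application-side steps described above (lowercase, wrap in wildcards, put a wildcard on every whitespace) could look roughly like this in Python. The function names are invented for the example, and returning a bare `*` for empty input is just one possible choice. Note that any `*` or `?` the user types is passed through and will act as a wildcard.

```python
from typing import Any, Dict


def to_wildcard_pattern(user_input: str) -> str:
    """Lowercase the input and turn it into a wildcard pattern.

    Runs of whitespace are replaced by a single '*', and the whole
    pattern is wrapped in '*' on both sides.
    """
    words = user_input.lower().split()
    if not words:
        # Assumption: empty input matches everything.
        return "*"
    return "*" + "*".join(words) + "*"


def wildcard_query(user_input: str, field: str = "hostname") -> Dict[str, Any]:
    """Build a wildcard query body for the given field."""
    return {
        "query": {
            "wildcard": {
                field: {"value": to_wildcard_pattern(user_input)}
            }
        }
    }
```

With this helper, "Dell" becomes `*dell*`, and "server testing" becomes `*server*testing*`, which matches the lowercased keyword token `server-testing-01`.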

This approach will keep your index smaller, but you will pay a price at search time, depending on the size of your data and the volume of requests. You can give it a try though.

In general the wildcard query is somewhat nuclear (a leading wildcard is expensive), so tread with care. Another approach would be to increase the max_gram for your ngrams, but this will grow your index considerably. I don't really know your case, so just see for yourself.
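To illustrate the n-gram trade-offs mentioned in this answer, here is a simplified Python model of character n-grams and edge n-grams. It is not Elasticsearch's tokenizer implementation, only a sketch of the idea, and the function names are made up. It shows why edge n-grams never produce "edge" from "poweredge", why a max of 3 cannot produce it either, and how the number of grams grows as the maximum is raised.

```python
from typing import List


def ngrams(text: str, min_gram: int, max_gram: int) -> List[str]:
    """All character n-grams of the lowercased text, for each length in range."""
    text = text.lower()
    return [
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    ]


def edge_ngrams(text: str, min_gram: int, max_gram: int) -> List[str]:
    """Only the n-grams anchored at the start of the lowercased text."""
    text = text.lower()
    return [text[:n] for n in range(min_gram, min(max_gram, len(text)) + 1)]


# Edge n-grams are prefixes only, so a mid-word term is never produced.
print("edge" in edge_ngrams("poweredge", 1, 9))

# Trigrams alone cannot contain a 4-character term.
print("edge" in ngrams("poweredge", 3, 3))

# Raising the maximum length makes the term available ...
print("edge" in ngrams("poweredge", 3, 4))

# ... at the cost of many more grams per value.
print(len(ngrams("server-testing-01", 3, 3)), len(ngrams("server-testing-01", 3, 8)))
```

The last line compares the gram count for a single hostname at max 3 and at max 8, which gives a feel for how quickly the index can grow.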