
I have some ID values (combinations of numbers and text) in my Elasticsearch index, and in my program users might input special characters in the search keyword. I want to know whether there is a way to make Elasticsearch do an exact search while also removing special characters from the search keyword.

I already use a custom analyzer to split the search keyword on certain special characters, and I use a match query to search the data, but I still get no results.

  1. data
{
  "_index": "testdata",
  "_type": "_doc",
  "_id": "11112222",
  "_source": {
    "testid": "1MK444750"
  }
}
  2. custom analyzer
"analysis" : {
  "analyzer" : {
    "testidanalyzer" : {
      "pattern" : """([^\w\d]+|_)""",
      "type" : "pattern"
    }
  }
}
  3. mapping
{
  "article" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "testid" : {
            "type" : "text",
            "analyzer" : "testidanalyzer"
          }
        }
      }
    }
  }
}

Here's my Elasticsearch query:

GET /testdata/_search
{
  "query": {
    "match": {
      // "testid": "1MK_444-750" // no result
      "testid": "1MK444750"
    }
  }
}

The analyzer successfully separated my keyword, but I just can't match anything:

POST /testdata/_analyze
{
    "analyzer": "testidanalyzer",
    "text": "1MK_444-750"
}

{
  "tokens" : [
    {
      "token" : "1mk",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "444",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "750",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "word",
      "position" : 2
    }
  ]
}

Please help, thanks in advance!

1 Answer


First off, you should probably model the testid field as keyword rather than text, since it's a more appropriate data type for identifiers. For reference, the reason your current query matches nothing is that the indexed value 1MK444750 contains none of the separators your pattern splits on, so it is indexed as the single token 1mk444750, while the query 1MK_444-750 is analyzed into 1mk, 444 and 750, none of which match that token.

You want some characters (_ and -) to be effectively ignored at search time. You can achieve this by giving your field a normalizer, which tells Elasticsearch how to preprocess data for this field prior to indexing and searching. Specifically, you can declare a mapping char filter in your normalizer that replaces these characters with an empty string. Unlike an analyzer, a normalizer emits a single token, which is exactly what you want for exact matching on an ID.

This is how all these changes would fit into your mapping:

PUT /testdata
{
  "settings": {
    "analysis": {
      "char_filter": {
        "mycharfilter": {
          "type": "mapping",
          "mappings": [
            "_ => ",
            "- => "
          ]
        }        
      },
      "normalizer": {
        "mynormalizer": {
          "type": "custom",
          "char_filter": [
            "mycharfilter"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "testid" : {
          "type" : "keyword",
          "normalizer" : "mynormalizer"
        }
      }
    }
  }
}
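
Note that the normalizer is applied at index time as well, so documents indexed before this mapping change won't pick it up; you need to (re)index them. For example, using the sample document from the question:

PUT /testdata/_doc/11112222
{
  "testid": "1MK444750"
}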

The following searches would then produce the same results:

GET /testdata/_search
{
  "query": {
    "match": {
      "testid": "1MK444750"
    }
  }
}

GET /testdata/_search
{
  "query": {
    "match": {
      "testid": "1MK_444-750"
    }
  }
}
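
If you want to verify what the normalizer does to an input string, you can pass it to the _analyze API on the index. Both 1MK444750 and 1MK_444-750 should come back as the single token 1MK444750 (case is preserved, since this normalizer only uses a char filter and no lowercase filter):

GET /testdata/_analyze
{
  "normalizer": "mynormalizer",
  "text": "1MK_444-750"
}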