4
votes

I'm trying to do a wildcard query with spaces. It easily matches the words on term basis but not on field basis.

I've read the documentation which says that I need to have the field as not_analyzed but with this type set, it returns nothing.

This is the mapping with which it works on term basis:

{
  "denshop" : {
    "mappings" : {
      "products" : {
        "properties" : {
          "code" : {
            "type" : "string"
          },
          "id" : {
            "type" : "long"
          },
          "name" : {
            "type" : "string"
          },
          "price" : {
            "type" : "long"
          },
          "url" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

This is the mapping with which the exact same query returns nothing:

{
  "denshop" : {
    "mappings" : {
      "products" : {
        "properties" : {
          "code" : {
            "type" : "string"
          },
          "id" : {
            "type" : "long"
          },
          "name" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "price" : {
            "type" : "long"
          },
          "url" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

The query is here:

curl -XPOST http://127.0.0.1:9200/denshop/products/_search?pretty -d '{"query":{"wildcard":{"name":"*test*"}}}'

Response with the not_analyzed property:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Response without not_analyzed:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [ {
    ...

EDIT: Adding requested info

Here is the list of documents:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "denshop",
      "_type" : "products",
      "_id" : "3L1",
      "_score" : 1.0,
      "_source" : {
        "id" : 3,
        "name" : "Testovací produkt 2",
        "code" : "",
        "price" : 500,
        "url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt-2/"
      }
    }, {
      "_index" : "denshop",
      "_type" : "products",
      "_id" : "4L1",
      "_score" : 1.0,
      "_source" : {
        "id" : 4,
        "name" : "Testovací produkt 3",
        "code" : "",
        "price" : 666,
        "url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt-3/"
      }
    }, {
      "_index" : "denshop",
      "_type" : "products",
      "_id" : "2L1",
      "_score" : 1.0,
      "_source" : {
        "id" : 2,
        "name" : "Testovací produkt",
        "code" : "",
        "price" : 500,
        "url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt/"
      }
    }, {
      "_index" : "denshop",
      "_type" : "products",
      "_id" : "5L1",
      "_score" : 1.0,
      "_source" : {
        "id" : 5,
        "name" : "Testovací produkt 4",
        "code" : "",
        "price" : 666,
        "url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt-4/"
      }
    }, {
      "_index" : "denshop",
      "_type" : "products",
      "_id" : "6L1",
      "_score" : 1.0,
      "_source" : {
        "id" : 6,
        "name" : "Testovací produkt 5",
        "code" : "",
        "price" : 666,
        "url" : "http://www.denshop.lh/tricka-tilka-tuniky/testovaci-produkt-5/"
      }
    } ]
  }
}

Without the not_analyzed it returns with this:

curl -XPOST http://127.0.0.1:9200/denshop/products/_search?pretty -d '{"query":{"wildcard":{"name":"*testovací*"}}}'

But not with this (notice the space before asterisk):

curl -XPOST http://127.0.0.1:9200/denshop/products/_search?pretty -d '{"query":{"wildcard":{"name":"*testovací *"}}}'

When I add the not_analyzed to mapping, it returns no hits no matter what I put in the wildcard query.

1
What documents doesn't match and it should? Please, give an example.Andrei Stefan
Updated the question with requested data.Rikudou_Sennin
Your documents have uppercase letters and with not_analyzed they will be indexed like that. When you search for testovaci (meaning lowercase letters) of course it will not match the uppercase Testovaci.Andrei Stefan
Thanks! Is it possible to let it match that field case insensitive? Or I can have only one of the two features?Rikudou_Sennin

1 Answers

5
votes

Add a custom analyzer that should lowercase the text. Then in your search query, before passing the text to it have it lowercased in your client application.

To, also, keep the original analysis chain, I've added a sub-field to your name field that will use the custom analyzer.

PUT /denshop
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "products": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "lowercase": {
              "type": "string",
              "analyzer": "keyword_lowercase"
            }
          }
        }
      }
    }
  }
}

And the query will work on the sub-field:

GET /denshop/products/_search
{
  "query": {
    "wildcard": {
      "name.lowercase": "*testovací *"
    }
  }
}