0
votes

I am trying to provide the search to end user with type as they go which is is more like sqlserver. I was able to implement ES query for the given sql scenario:

select * from table where name like '%pete%' and type != 'xyz and type!='abc'

But the ES query doesnt work for this sql query

select * from table where name like '%peter tom%' and type != 'xyz and type!='abc'

In my elastic search alongwith the wildcard query i also need to perform some boolean filtered query

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            {
              "query": {
                "wildcard": {
                  "name": { "value": "*pete*" }
                }
              }
            }
          ],
          "must_not": [
            {
              "match": { "type": "xyz" }
            },
            {
              "match": { "type": "abc" }
            }
          ]
        }
      }
    }
  }
}

The above elastic query with wildcard search works fine and gets me all the documents that matches pete and are not of type xyz and abc .But when i try perform the wildcard with 2 seprate words seprated by space then the same query returns me empty as shown below.For example

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            {
              "query": {
                "wildcard": {
                  "name": { "value": "*peter tom*" }
                }
              }
            }
          ],
          "must_not": [
            {
              "match": { "type": "xyz" }
            },
            {
              "match": { "type": "abc" }
            }
          ]
        }
      }
    }
  }
}

My mapping is as follows :

{
  "properties": {
    "name": {
      "type": "string"
    },
    "type": {
      "type": "string"
    }
  }
}

What query should i use in order to make wild card search possible for words seprated by spaces

2
You problem is that you dont understand how ES indexing data. Look to this elastic.co/guide/en/elasticsearch/guide/current/…. Also look to ngGram elastic.co/guide/en/elasticsearch/reference/current/…Vova Bilyachat
So problem is that when you index text "hello world" in ES it will become ["hello", "world"].Vova Bilyachat
i understand how documents are stored in ES but what does ES provide to allow users to perform sql like search for words seprated by spaces is my questionamrit
Do you need like or or you need to find similar? so for instance if user search for "piter tom" is that ok if you show also "peter tom"?Vova Bilyachat
i would like a LIKE operation on both side . i.e like '%peter to%' as we do in sqlamrit

2 Answers

2
votes

The most efficient solution involves leveraging an ngram tokenizer in order to tokenize portions of your name field. For instance, if you have a name like peter tomson, the ngram tokenizer will tokenize and index it like this:

  • pe
  • pet
  • pete
  • peter
  • peter t
  • peter to
  • peter tom
  • peter toms
  • peter tomso
  • eter tomson
  • ter tomson
  • er tomson
  • r tomson
  • tomson
  • tomson
  • omson
  • mson
  • son
  • on

So, when this has been indexed, searching for any of those tokens will retrieve your document with peter thomson in it.

Let's create the index:

PUT likequery
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ngram_analyzer": {
          "tokenizer": "my_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "15"
        }
      }
    }
  },
  "mappings": {
    "typename": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "search": {
              "type": "string",
              "analyzer": "my_ngram_analyzer"
            }
          }
        },
        "type": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

You'll then be able to search like this with a simple and very efficient term query:

POST likequery/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "name.search": "peter tom"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "type": "xyz"
          }
        },
        {
          "match": {
            "type": "abc"
          }
        }
      ]
    }
  }
}
1
votes

Well my solution is not perfect and I am not sure about performance. So you should try it on your own risk :)

This is es 5 version

PUT likequery
{
  "mappings": {
    "typename": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        },
        "type": {
          "type": "string"
        }
      }
    }
  }
}

in ES 2.1 change "type": "keyword" to "type": "string", "index": "not_analyzed"

PUT likequery/typename/1
{
  "name": "peter tomson"
}

PUT likequery/typename/2
{
  "name": "igor tkachenko"
}

PUT likequery/typename/3
{
  "name": "taras shevchenko"
}

Query is case sensetive

POST likequery/_search
{
  "query": {
    "regexp": {
      "name.raw": ".*taras shev.*"
    }
  }
}

Response

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "likequery",
        "_type": "typename",
        "_id": "3",
        "_score": 1,
        "fields": {
          "raw": [
            "taras shevchenko"
          ]
        }
      }
    ]
  }
}

PS. Once again I am not sure about performance of this query since it will use scan and not index.