We want to use Elasticsearch to find similar objects.

Let's say I have an object with 4 fields: product_name, seller_name, seller_phone, platform_id.

Similar products can have different product names and seller names across platforms, so those fields need a fuzzy match.

The phone number, on the other hand, is strict: a single differing digit might yield the wrong record, so it needs an exact match.

What we're trying to create is a query that will:

  1. Take into account all the fields we have for the current record and OR between them.
  2. Require platform_id to be the specific one I want to look at (AND).
  3. Fuzzy match product_name and seller_name.
  4. Strictly match the phone number, or ignore it in the OR between the fields.

In pseudocode, it would look something like:

((product_name like 'some_product_name') OR (seller_name like 'some_seller_name') OR (seller_phone = 'some_phone')) AND (platform_id = 123)

We're using the Searchkick gem, so any solution either using it or directly querying ES would be great for us :) - Dan Benjamin
I use chewy (github.com/toptal/chewy), an Elasticsearch client where I can pass the exact Elasticsearch query as a hash. I am not sure how to achieve this with Searchkick. - user3775217

1 Answer

To do an exact match on seller_phone, I index this field without the ngram analyzer, and I use a fuzzy query for product_name and seller_name.

Mapping

PUT index111
{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_n_gram_analyzer": {
          "tokenizer": "whitespace",
          "filter" : ["lowercase", "edge_gram_filter"]
        }
      },
      "filter": {
      "edge_gram_filter" : {
        "type" : "ngram",
        "min_gram" : 2,
        "max_gram": 10
      }
      }
    }
  },
  "mappings": {
    "document_type" : {
      "properties": {
        "product_name" : {
          "type": "text",
          "analyzer": "edge_n_gram_analyzer"
        },
        "seller_name" : {
          "type": "text",
          "analyzer": "edge_n_gram_analyzer"
        },
        "seller_phone" : {
          "type": "keyword"
        },
        "platform_id" : {
          "type": "keyword"
        }
      }
    }
  }
}
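
To make it clearer what the analyzer above puts in the index, here is a rough plain-Ruby approximation of the ngram filter for a single token. It is only a sketch of the idea, not Elasticsearch's implementation: `ngrams` is a hypothetical helper, and the order in which Elasticsearch emits the grams may differ.

```ruby
# Approximates the "lowercase" + ngram (min_gram 2, max_gram 10) filters
# from the mapping for ONE whitespace-separated token.
# Hypothetical helper for illustration only.
def ngrams(token, min_gram: 2, max_gram: 10)
  token = token.downcase
  grams = []
  (0...token.length).each do |start|
    (min_gram..max_gram).each do |len|
      # Stop once the gram would run past the end of the token.
      break if start + len > token.length
      grams << token[start, len]
    end
  end
  grams
end

if __FILE__ == $PROGRAM_NAME
  puts ngrams("macbok").inspect
end
```

Because the full token "macbok" is one of the indexed grams, a fuzzy query for "macbouk" (one edit away) can match it. Tokens longer than max_gram are only indexed as fragments, so they never appear as a whole term.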

Index documents

POST index111/document_type
{
       "product_name":"macbok",
       "seller_name":"apple",
       "seller_phone":"9988",
       "platform_id":"123"
}

For the following pseudo-SQL query:

((product_name like 'some_product_name') OR (seller_name like 'some_seller_name') OR (seller_phone = 'some_phone')) AND (platform_id = 123)

Elastic Query

POST index111/_search
{
    "query": {
        "bool": {
            "must": [
              {
                "term": {
                  "platform_id": {
                    "value": "123"
                  }
                }
              },
              {
                "bool": {
                    "should": [{
                            "fuzzy": {
                                "product_name": {
                                    "value": "macbouk",
                                    "boost": 1.0,
                                    "fuzziness": 2,
                                    "prefix_length": 0,
                                    "max_expansions": 100
                                }
                            }
                        },
                        {
                            "fuzzy": {
                                "seller_name": {
                                    "value": "apdle",
                                    "boost": 1.0,
                                    "fuzziness": 2,
                                    "prefix_length": 0,
                                    "max_expansions": 100
                                }
                            }
                        },
                        {
                          "term": {
                            "seller_phone": {
                              "value": "9988"
                            }
                          }
                        }
                    ]
                }
            }]
        }
    }
}
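
Since the question mentions Ruby clients, here is a minimal sketch of building the same query as a Ruby hash, leaving out any OR clause whose field is missing (point 4 of the question). The helper names `similar_products_query` and `fuzzy_clause` are hypothetical, and lowercasing the fuzzy value is my assumption: fuzzy queries are not analyzed, while the indexed terms are lowercased by the analyzer.

```ruby
require "json"

# Hypothetical helper: one fuzzy clause with the same options as the query above.
def fuzzy_clause(field, value)
  { fuzzy: { field => { value: value.downcase, boost: 1.0, fuzziness: 2,
                        prefix_length: 0, max_expansions: 100 } } }
end

# Hypothetical helper: platform_id is always required (AND);
# the other fields are ORed together and skipped when nil.
def similar_products_query(platform_id:, product_name: nil, seller_name: nil, seller_phone: nil)
  should = []
  should << fuzzy_clause("product_name", product_name) if product_name
  should << fuzzy_clause("seller_name", seller_name) if seller_name
  should << { term: { seller_phone: { value: seller_phone } } } if seller_phone
  raise ArgumentError, "at least one of the OR fields is required" if should.empty?

  {
    query: {
      bool: {
        must: [
          { term: { platform_id: { value: platform_id.to_s } } },
          { bool: { should: should } }
        ]
      }
    }
  }
end

if __FILE__ == $PROGRAM_NAME
  # Prints JSON that can be pasted after "POST index111/_search".
  puts JSON.pretty_generate(
    similar_products_query(platform_id: 123, product_name: "macbouk", seller_phone: "9988")
  )
end
```

The resulting hash should be usable as a raw request body by any client that accepts one. With Searchkick, I believe this is the `body:` option of `search`, but please check that against the gem's README for your version.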

Hope this helps