
I am relatively new to Elasticsearch and I want to search for products by brand and type name. I have already experimented a bit, but I think I am missing something important for a solid search algorithm. Here is my approach:

A product looks e.g. like this:

{
  brandName: "Samsung",
  typeName: "PS-50Q7HX",
  ...
}

I will have a single input field. The user can search for a brand/type only or for a brand in combination with a type name. E.g.

Samsung | Samsung PS-50Q7HX | PS-50Q7HX

To tolerate typos in the typeName field I use an ngram tokenizer, which works great when I search for types only. But in combination with the brandName field I run into trouble. Something like this does not work well (especially when I use an ngram tokenizer on the brandName field too):

{
  "query" : {
    "multi_match" : {
      "query": "Samsung PS 50Q 7HX",
      "type": "cross_fields", 
      "fields": ["brandName", "typeName"]
    }
  }
}

Of course I know why this does not work well with two ngram tokenizers and a mixed query, but I am not sure what the best way to solve it is.
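To make the overlap concrete, here is a minimal sketch in plain Python (no Elasticsearch involved). It assumes fixed-size trigrams and two made-up documents modelled on the example from the comments; the real ngram tokenizer's gram sizes are configurable, so treat the numbers as illustrative only.

```python
def char_ngrams(text, n=3):
    """Return the set of lowercase character n-grams of `text`."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def field_hits(query, doc, n=3):
    """Count, per field, how many query n-grams also occur in that field."""
    grams = char_ngrams(query, n)
    return {field: len(grams & char_ngrams(value, n))
            for field, value in doc.items()}


# Hypothetical documents; the field values are assumptions for illustration.
samsung_doc = {"brandName": "Samsung", "typeName": "SGH-D500"}
hama_doc = {"brandName": "Hama", "typeName": "SGH-D500/Z300 Samsung"}

for doc in (samsung_doc, hama_doc):
    print(doc["brandName"], field_hits("Samsung SGH", doc))
```

Both documents share six trigrams with the query "Samsung SGH", but for the Hama product all six come from typeName. A query that treats both fields as one pool of terms cannot tell the two cases apart.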

I think the main problem is that I do not know whether the user entered a brand name or not. I thought about using a second index filled with all available brands and running a "pre-search" against it to check whether the query string contains a brand name. If I find a match, I can split the search string into brand and type name and perform a more specific search, like this one:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "brandName": "Samsung" } },
        { "match": { "typeName": "PS-50Q7HX" } }
      ]
    }
  }
}
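The pre-search idea could be sketched roughly as follows. This is only an illustration under several assumptions: the brand list is an in-memory Python list standing in for the second index, brands are single words, and the helper names `split_brand` and `build_query` are made up. When no brand is found it falls back to the cross_fields multi_match shown earlier.

```python
def split_brand(query, known_brands):
    """Split `query` into (brand, remainder) using a list of known brands.

    Returns (None, query) when no token of the query is a known brand.
    Only single-word brands are handled in this sketch.
    """
    tokens = query.split()
    lowered = {b.lower(): b for b in known_brands}
    for i, tok in enumerate(tokens):
        if tok.lower() in lowered:
            rest = " ".join(tokens[:i] + tokens[i + 1:])
            return lowered[tok.lower()], rest
    return None, query


def build_query(query, known_brands):
    """Build a field-specific bool query if a brand was recognised."""
    brand, rest = split_brand(query, known_brands)
    if brand is None:
        # No brand recognised: fall back to the mixed-field query.
        return {"query": {"multi_match": {
            "query": query,
            "type": "cross_fields",
            "fields": ["brandName", "typeName"],
        }}}
    must = [{"match": {"brandName": brand}}]
    if rest:
        must.append({"match": {"typeName": rest}})
    return {"query": {"bool": {"must": must}}}


brands = ["Samsung", "Hama"]  # hypothetical brand list
print(build_query("Samsung PS-50Q7HX", brands))
print(build_query("PS-50Q7HX", brands))
```

With this split, the brand term is only ever matched against brandName, so a type name that happens to contain "Samsung" no longer satisfies the brand part of the query.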

Does this sound like a good approach? Or does anyone see a better way?

Any help is appreciated!

Thank you very much and best regards,

Stefan

What do you mean by "does not work well"? Can you show some sample results you're getting and explain why they are not good enough? It seems your brands and types have pretty distinct lexical structures, so I'm curious what kind of results you're getting with what you've already crafted. - Val
Thank you for your answer. When I use an ngram tokenizer for both fields and search with a multi_match, the result for e.g. "Samsung SGH" is "Hama SGH-D500/Z300 Samsung ...". The brand is completely wrong, because the brand term in the query also matches inside the type name. - StefanO

1 Answer

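  1. To tolerate user typos you used an ngram analyzer, which is costly in terms of index size. You could consider a stemming analyzer instead, although stemming normalizes word forms rather than correcting misspellings; for actual typos, the fuzziness option of a match query is another alternative.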

  2. In my opinion, instead of indexing this in two different fields, you could index it as a single field.

ex:- "FIELD_NAME": "Samsung|PS-50Q7HX"

The brand name and the product name are joined with a delimiter (I used |). Analyze this field with an analyzer that splits on that delimiter, so that your content is indexed as the following tokens:

Samsung

PS-50Q7HX
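One way such a delimiter-splitting analyzer might be configured is sketched below as a Python dict holding the index settings, using Elasticsearch's pattern tokenizer. The names `pipe_tokenizer` and `pipe_analyzer` are made up, and the lowercase filter is an added assumption. The small helper only emulates the analyzer locally to show the expected tokens; it does not talk to a cluster.

```python
import json
import re

PIPE_PATTERN = r"\|"  # regex matching the literal | delimiter

# Hypothetical index settings; analyzer and tokenizer names are assumptions.
index_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "pipe_tokenizer": {"type": "pattern", "pattern": PIPE_PATTERN}
            },
            "analyzer": {
                "pipe_analyzer": {
                    "type": "custom",
                    "tokenizer": "pipe_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}


def emulate_pipe_analyzer(value):
    """Local stand-in for the analyzer: split on | and lowercase."""
    return [tok.lower() for tok in re.split(PIPE_PATTERN, value) if tok]


print(json.dumps(index_settings, indent=2))
print(emulate_pipe_analyzer("Samsung|PS-50Q7HX"))
```

Note that the user's query text contains a space rather than |, so the search side may need an analyzer that splits on whitespace; how query text is split before analysis depends on the query type and Elasticsearch version.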

Then you could search with the following query:

{
    "query": {
        "query_string": {
            "query": "Samsung PS-50Q7HX",
            "default_operator": "or",
            "fields": [
                "FIELD_NAME"
            ]
        }
    }
}

This retrieves the documents from the index whose brand name is samsung or whose product name is PS-50Q7HX. You could also use a prefix search, and if you set default_operator to and, every term has to match, so the results will be more precise.