47
votes

Two entities: collection and product. The collection is the parent of product.

I need to search by product's terms and show collections with 4 products each.

Collections and products can be partially matched, but the best match first. If the match is not full, some terms have priority.

Example: Searching "color:red" and "material:stone" need to show red stones first, any other red next (this is about collections match and product match).

So, all of this solved by request below:

{
  "query": {
    "has_child": {
      "type": "products",
      "query": {
        "bool": {
          "should": [
            {
              "constant_score": {
                "filter": {
                  "match_all": {}
                },
                "boost": 1
              }
            },
            {
              "constant_score": {
                "filter": {
                  "terms": { "_name": "colors", "colors": [5] }
                },
                "boost": 1.2
              }
            },
            {
              "constant_score": {
                "filter": {
                  "terms": { "_name": "materials", "productTypes": [6] }
                },
                "boost": 1
              }
            }
          ]
        }
      },
      "score_mode": "max",
      "inner_hits": {
        "size": 4,
        "sort": [
          "_score"
        ]
      }
    }
  },
  "sort": [
    "_score"
  ]
}

Ok, now the trouble.

Need to sort by price. As ASC, as DESC. Price is the property of the product.

Need to sort by the price of matched products, so can't move price to the collection. Need to sort by price as a collection as products. Collections sorted by minimal (or maximal) price of matched products.

Need to sort by price only 100% matched products (well, partially matched can be sorted too, but after). I mean, sort must be like ORDER BY _score, price

Example, that I want to get, sort by price asc, [nn] means partially matched product:

Collection1
100 - 200 - 800 - [99]
Collection2
300 - 500 - [10] - [20]
Collection3
400 - 450 - 500 - [100]

I found that sort by the child is not supported. And the suggestion to recalculate score. But I'm using the score for sort by match. My try was

{
  "query": {
    "has_child": {
      "type": "products",
      "query": {
        "function_score": {
          "query": {
            "bool": {
              "should": [
                ... same query as above ...
              ]
            }
          },
          "functions": [
            {
              "script_score": {
                "script": "ceil(_score * 100) * 100000 + (99999 - doc['price'].value/100)",
                "lang": "expression"
              }
            }
          ]
        }
      },
      "score_mode": "max",
      "inner_hits": {
        "size": 4,
        "sort": [
          "_score",
          {
            "price": {
              "order": "desc"
            }
          }
        ]
      }
    }
  },
  "sort": [
    "_score"
  ]
}

But I'm really confused with results to score that I can see in answer. Asking for help :) Or, maybe, drop this and create a nested index?

UPD: Found that was wrong with score. By default, elastic combine score and result of script_score. So score was ceil(_score * 100) * 100000 + (99999 - doc['price'].value/100) * _score - that can broke idea, but easy to fix with boost_mode parameter of function_score. Result query:

{
  "query": {
    "has_child": {
      "type": "products",
      "query": {
        "function_score": {
          "query": {
            "bool": {
              "should": [
                ... same query as above ...
              ]
            }
          },
          "functions": [
            {
              "script_score": {
                "script": "ceil((log10(_score)+10) * 100) * 100000 + (99999 - doc['price'].value)",
                "lang": "expression"
              }
            }
          ],
          "boost_mode": "replace"
        }
      },
      "score_mode": "max",
      "inner_hits": {
        "size": 4,
        "sort": [
          "_score",
          {
            "price": {
              "order": "desc"
            }
          }
        ]
      }
    }
  },
  "sort": [
    "_score"
  ]
}

boost_mode == 'replace means "use function result as score". Also, used log10 to be sure how many digits in _score. For sort by price DESC need to change formula to ceil((log10(_score)+10) * 100) * 100000 + (doc['price'].value)

UPD2

Formula ceil((log10(_score)+10) * 100) * 100000 + (99999 - doc['price'].value) returns 100099952 for price 48 and for price 50 (boost == 1, queryNorm == 1) because single precision limitation.

New formula ceil((log10(_score)+5) * 100) * 10000 + (9999 - ceil(log10(doc['price'].value) * 1000)) - reduced number of digits for score and switched from price to lg of price and reduced number of digits too. Feedback welcome.

1
So as the result you will be showing a list of collections. Right?pratikvasa
yes, list of collection with products as inner hitsDmitry MiksIr
We are facing the exact same situation right now. Also there is an added issue of pagination. We came up with this. Don't know whether it's the best. We thought of creating two indexes.. One with the list of products only and each product contains a field collection stating which collection it belongs to. And the second index that is the same as the one you have above. You will have to do 2 searches, the first search will find the distinct collection based on your search criteria and the second search will get the collections that you got from the first search.pratikvasa
This second search will be a very fast one as it is directly searching the collections.. The big disadvantage of this is that two indexes have to be maintained. If you have any better ideas do let me know.pratikvasa
In my case need collection and 4 matched products of each collection. It different from you need. If you don't need products sublist, only collection list, search+group can be fine, just I don't know how sort it.Dmitry MiksIr

1 Answers

1
votes

Thanks for sharing, updated latest formula to ceil((log10(_score+1)+5) * 100) * 10000 + (9999 - ceil(log10(doc['price'].value +1) * 1000)) added +1 to score because in some cases it returns errors like this:

 "function score query returned an invalid score: -Infinity for doc: 4580"

Update: Got another error:

 "function score query returned an invalid score: NaN for doc: 1739"

changed formula to ceil((log10(_score+1)+5) * 100) * 10000 + (9999 - ceil(log10(doc['price'].value +1) * 1000)) added +1 to doc value to fix this

Update 2: Got another error:

 "function score query returned an invalid score: NaN for doc: 1739"

changed formula to ceil((log10(_score+1)+5) * 100) * 10000 + (9999 - ceil(log10(doc['price'].value > 0 ? doc['price'].value : 1) * 1000)) replaced +1 with expression

Update 3: Got another error:

doesn't have an error message anymore it's hard to find now, but it was similar to the previous :(

changed formula to ceil(_score+1) + ceil((doc['price'].value > 0 ? doc['price'].value : 1) * 100) simplified formula, so I can understand and it still working today :)