2
votes

I'm new to elasticSearch and need some help...

I've an index called sales having these type of records :

{product.id : 12 , sales.datetime : October 12th 2019},
{product.id : 13 , sales.datetime : October 12th 2019},
{product.id : 14 , sales.datetime : October 13th 2019},
{product.id : 14 , sales.datetime : October 14th 2019},
{product.id : 14 , sales.datetime : October 14th 2019},
{product.id : 13 , sales.datetime : October 18th 2019},
                             ...

I would like to retrieve products unsold from the October 12th 2019,

I tried to filter on the sales.datetime with :

"range" => [
             "datetime" => [
                    "lt" => "October 12th 2019"
                ]
            ] 

But obviously, this type of query will return unexpected values, for exemple with this case, it'll retrieve product.id : [12,13] but has you can see, the product.id has been sell on October 18th 2019...

2

2 Answers

1
votes

Let us take a step back and understand what we are trying to do.

First we want to filter the documents. Secondly we want the filter logic to be dependent on the other document's value, which is nothing but a kind of self-join kinda scenario. I don't think that is possible with the way your documents are ingested. You can read more on join here.

As a simplest solution, in order to achieve what you are looking for, you would need to change the document structure to something like below. Of course you can also choose to use nested type but I think below would be much simpler solution.

Mapping:

PUT sales
{
  "mappings": {
    "properties": {
      "id":{
        "type": "keyword"
      },
      "date":{
        "type": "date",
        "format": "MM-dd-yyyy" 
      }
    }
  }
}

Sample Documents:

POST sales/_doc/1
{ 
  "id" : 12 , 
  "date" : ["10-12-2019", "10-18-2019"]       <--- Note this
}

POST sales/_doc/2
{ 
  "id" : 13 , 
  "date" : ["10-12-2019"]
}

POST sales/_doc/3
{ 
  "id" : 14 , 
  "date" : ["10-10-2019", "10-14-2019"]
}

Query:

POST sales/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "date": {
              "lte": "10-12-2019"
            }
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "date": {
              "gt": "10-12-2019"
            }
          }
        }
      ]
    }
  }
}

Note how the query has been constructed in a very simplistic fashion using must and must_not clauses.

Response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "sales",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "id" : 13,
          "date" : [
            "10-12-2019"
          ]
        }
      }
    ]
  }
}

Note that in the response, you get only the document you are looking for.

Hope that helps!

1
votes

Mappings:

PUT sales
{
  "mappings": {
    "properties": {
      "id":{
        "type": "keyword"
      },
      "date":{
        "type": "date",
        "format": "MM-dd-yyyy" 
      }
    }
  }
}

Data:

 [
      {
        "_index" : "sales",
        "_type" : "_doc",
        "_id" : "I-6Y7W0B_-hMjUaqpQH4",
        "_score" : 1.0,
        "_source" : {
          "id" : 12,
          "date" : "10-12-2019"
        }
      },
      {
        "_index" : "sales",
        "_type" : "_doc",
        "_id" : "JO6Y7W0B_-hMjUaqtwEL",
        "_score" : 1.0,
        "_source" : {
          "id" : 13,
          "date" : "10-12-2019"
        }
      },
      {
        "_index" : "sales",
        "_type" : "_doc",
        "_id" : "Je6Y7W0B_-hMjUaqxgEH",
        "_score" : 1.0,
        "_source" : {
          "id" : 13,
          "date" : "10-18-2019"
        }
      }
    ]

Query:- get max date for all terms, get max date where date range less than 2019-10-12 if both are same return bucket

GET sales/_search
{
  "size": 0,
  "aggs": {
    "transactionId": {
      "terms": {
        "field": "id",
        "size": 10000
      },
      "aggs": {
        "maxDate": {
          "max": {
            "field": "date"
          }
        },
        "pending_status": {
          "filter": {
            "range": {
              "date": {
                "lte": "10-12-2019"
              }
            }
          },
          "aggs": {
            "filtered_maxdate": {
              "max": {
                "field": "date"   
              }
            }
          }
        },
        "buckets_latest_status_pending": {
          "bucket_selector": {
            "buckets_path": {
              "filtereddate": "pending_status>filtered_maxdate",
              "maxDate": "maxDate"
            },
            "script": "params.filtereddate==params.maxDate"
          }
        }
      }
    }
  }
}

Response:

 [
        {
          "key" : "12",
          "doc_count" : 1,
          "pending_status" : {
            "doc_count" : 1,
            "filtered_maxdate" : {
              "value" : 1.5708384E12,
              "value_as_string" : "10-12-2019"
            }
          },
          "maxDate" : {
            "value" : 1.5708384E12,
            "value_as_string" : "10-12-2019"
          }
        }
      ]

EDIT 1:

You can use (top_hits)[https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html] aggregation to get all documents in a bucket

GET sales/_search
{
  "size": 0,
  "aggs": {
    "transactionId": {
      "terms": {
        "field": "id",
        "size": 10000
      },
      "aggs": {
        "maxDate": {
          "max": {
            "field": "date"
          }
        },
        "pending_status": {
          "filter": {
            "range": {
              "date": {
                "lte": "10-12-2019"
              }
            }
          },
          "aggs": {
            "filtered_maxdate": {
              "max": {
                "field": "date"   
              }
            }
          }
        },
        "buckets_latest_status_pending": {
          "bucket_selector": {
            "buckets_path": {
              "filtereddate": "pending_status>filtered_maxdate",
              "maxDate": "maxDate"
            },
            "script": "params.filtereddate==params.maxDate"
          }
        },
        "top_hits":{   ---> top hits to get all documents under a bucket
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}

For pagination you can use composite aggregation/ include partitions