How do I always return the documents with the lowest value in the "url_length" field, regardless of the "from" value I send in the search?

In the query below, I request documents that contain the word "netflix", whose "pgrk" field is between 9 and 10, and whose "url_length" field is less than 4. When I send ("from": 1, "size": 1), it does not return the doc with _id 15, which contains "netflix" and has pgrk = 10 and url_length = 2. Instead it returns the doc with _id 14, which contains "netflix" and has pgrk = 10 and url_length = 3.

The doc with _id 15 (url_length = 2) is returned only if I start the query from zero ("from": 0, "size": 1).

Why did the search with ("from": 1, "size": 1) not bring back the record with _id 15 ("url_length" = 2), and instead bring back the record with _id 14 ("url_length" = 3)?

{
    "from": 1,
    "size": 1,
    "sort": [
        {
            "pgrk": {
                "order": "desc"
            }
        },
        {
            "url_length": {
                "order": "asc"
            }
        }
    ],
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": "netflix",
                    "type": "cross_fields",
                    "fields": [
                        "tittle",
                        "description",
                        "url"
                    ],
                    "operator": "and"
                }
            },
            "filter": [
                {
                    "range": {
                        "pgrk": {
                            "gte": 9,
                            "lte": 10
                        }
                    }
                },
                {
                    "range": {
                        "url_length": {
                            "lt": 4
                        }
                    }
                }
            ]
        }
    }
}

With ("from": 1, "size": 1), it does not return the record with _id 15 (url_length = 2); it returns the doc with _id 14 (url_length = 3), as shown below:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "teste",
        "_type": "_doc",
        "_id": "14",
        "_score": null,
        "_source": {
          "url": "www.333.com",
          "title": "netflix netflix netflix netflix netflix netflix netflix netflix netflix netflix",
          "description": "tudo sobre netflix netflix netflix netflix netflix netflix",
          "pgrk": "10",
          "url_length": "3"
        },
        "sort": [
          10,
          3
        ]
      }
    ]
  }
}

With ("from": 0, "size": 1), it returns the record with _id 15 (url_length = 2):

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "teste",
        "_type": "_doc",
        "_id": "15",
        "_score": null,
        "_source": {
          "url": "www.netflix.yahoo.com",
          "title": "melhor filme",
          "description": "tudo sobre series",
          "pgrk": "10",
          "url_length": "2"
        },
        "sort": [
          10,
          2
        ]
      }
    ]
  }
}

How do I always return the documents with the lowest value in the "url_length" field, regardless of the "from" value I send in the search?
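
One way to read the two responses above: "from" is an offset into the already sorted hit list, so ("from": 1) skips the first hit rather than changing the ordering. The Python sketch below is only a simulation of that behaviour, not Elasticsearch itself. The `paginate` helper is hypothetical, and the two documents and their values are copied from the responses shown above.

```python
# Simulation only: mimics how a sorted hit list is sliced by from/size.
# `paginate` is a hypothetical helper, not an Elasticsearch API.

# The two matching hits, with values taken from the responses above.
hits = [
    {"_id": "14", "pgrk": 10, "url_length": 3},
    {"_id": "15", "pgrk": 10, "url_length": 2},
]


def paginate(docs, from_, size):
    """Sort by pgrk desc, then url_length asc, then skip `from_` hits."""
    ordered = sorted(docs, key=lambda d: (-d["pgrk"], d["url_length"]))
    return ordered[from_:from_ + size]


first_page = paginate(hits, from_=0, size=1)
second_page = paginate(hits, from_=1, size=1)

print([d["_id"] for d in first_page])   # ['15']
print([d["_id"] for d in second_page])  # ['14']
```

Under this reading, the document with the lowest "url_length" is always the first hit of the sorted list, so it only appears in a page that includes offset 0. A request with ("from": 0) and a larger "size" would contain it together with the hits that follow.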

@Lupanoide What should I change in my mapping? I will post my mapping for you to look at. - Jean
@Lupanoide }, "pgrk": { "type": "integer" }, "url_length": { "type": "integer" - Jean
@Lupanoide I verified that the doc with _id 14 contains the word "netflix" several times in the title and description fields, and it seems that this is why _id 14 was returned instead of _id 15. Is there any way to disable the scoring that Elasticsearch applies when the searched word occurs several times in the same doc? Despite the filters I use, it seems that the number of matching words in the doc is giving it more relevance. - Jean

2 Answers

Here is my mapping:

{
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "0",
      "analysis": {
        "filter": {
          "stemmer_plural_portugues": {
            "name": "minimal_portuguese",
            "stopwords" : ["http", "https", "ftp", "www"],
            "type": "stemmer"
          }
        },
        "analyzer": {
          "analyzer_customizado": {
            "filter": [
              "lowercase",
              "stemmer_plural_portugues",
              "asciifolding"
            ],
            "tokenizer": "lowercase"
          }
        }

      }
    }
  },
  "mappings": {
      "properties": {
        "q": {
          "type": "text",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        },
        "id": {
          "type": "long"
        },
        "@timestamp": {
          "type": "date"
        },
        "data": {
          "type": "date"
        },
        "@version": {
          "type": "text",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        },
        "quebrado": {
          "type": "byte"
        },
        "pgrk": {
          "type": "integer"
        },
        "url_length": {
          "type": "integer"
        },
        "term": {
          "type": "text",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        },
        "titulo": {
          "analyzer": "analyzer_customizado",
          "type": "text",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        },
        "descricao": {
          "analyzer": "analyzer_customizado",
          "type": "text",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        },
        "url": {
          "analyzer": "analyzer_customizado",
          "type": "text",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        }
      }
  }
}

I verified that the doc with _id 14 contains the word "netflix" several times in the title and description fields, and it seems that this is why _id 14 was returned instead of _id 15.

Is there any way to disable the scoring that Elasticsearch applies when the searched word occurs several times in the same doc? Despite the filters I use, it seems that the number of matching words in the doc is giving it more relevance.
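
Both responses in the question report "max_score": null and "_score": null, which is what Elasticsearch returns by default when an explicit "sort" is given: the hits are then ordered by the "sort" values alone. The sketch below is a plain Python illustration under that assumption, not Elasticsearch code. `count_term` and `sort_key` are hypothetical helpers, and the field values are copied from the two "_source" objects in the question.

```python
# Illustration only: ordering by the "sort" values from the responses,
# ignoring how often the search term occurs. Helper names are hypothetical.

docs = [
    {
        "_id": "14",
        "url": "www.333.com",
        "title": " ".join(["netflix"] * 10),
        "description": "tudo sobre " + " ".join(["netflix"] * 6),
        "sort": [10, 3],
    },
    {
        "_id": "15",
        "url": "www.netflix.yahoo.com",
        "title": "melhor filme",
        "description": "tudo sobre series",
        "sort": [10, 2],
    },
]


def count_term(doc, term):
    """Count occurrences of `term` across the three searched fields."""
    text = " ".join(doc[f] for f in ("title", "description", "url"))
    return text.count(term)


def sort_key(doc):
    """pgrk descending, then url_length ascending, as in the query's sort."""
    pgrk, url_length = doc["sort"]
    return (-pgrk, url_length)


ordered = sorted(docs, key=sort_key)

print({d["_id"]: count_term(d, "netflix") for d in docs})  # {'14': 16, '15': 1}
print([d["_id"] for d in ordered])                         # ['15', '14']
```

In this sketch the doc with _id 14 contains the term far more often, yet _id 15 still sorts first because only pgrk and url_length enter the sort key. That is consistent with _id 15 being the hit at ("from": 0) and _id 14 the hit at ("from": 1).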