I just discovered the "more_like_this" query type and tried to use it with my nested objects. Unfortunately, it seems this query is not able to search inside nested objects. Here is my mapping:

"Presentation": {
    "properties": {
      "id": {
        "include_in_all": false,
        "type": "string"
      },
      "title": {
        "include_in_all": true,
        "type": "string"
      },
      "description": {
        "include_in_all": true,
        "type": "string"
      },
      "categories": {
        "properties": {
          "id": {
            "include_in_all": false,
            "type": "string"
          },
          "category": {
            "include_in_all": true,
            "type": "string"
          },
          "category_suggest": {
            "properties": {
              "input": {
                "type": "string"
              },
              "payload": {
                "properties": {
                  "id": {
                    "type": "long"
                  }
                }
              }
            }
          }
        },
        "type": "nested"
      }
    }
  }

My goal is to find all presentations related to the one with id "96", boosting those that share a category with it. But when I execute the query below, Elasticsearch only calculates the score on the "title" and "description" fields and ignores "category".

{
  "size": 4,
  "query": {
    "more_like_this": {
      "like": [
        {
          "_index": "client",
          "_type": "Presentation",
          "_id": "96"
        }
      ],
      "min_term_freq": 1,
      "max_query_terms": 35,
      "min_word_length": 3,
      "minimum_should_match": "1%"
    }
  }
} 

I tried to force the query onto the nested field too, but that does not work either:

{
  "size": 4,
  "query": {
    "bool": {
      "should": [
        {
          "more_like_this": {
            "like": [
              {
                "_index": "client",
                "_type": "Presentation",
                "_id": "96"
              }
            ],
            "min_term_freq": 1,
            "max_query_terms": 35,
            "min_word_length": 3,
            "minimum_should_match": "1%"                   
          }
        },
        {
            "nested" : {
                "path":"categories",
                "query" : {
                    "more_like_this": {
                        "like": [
                          {
                            "_index": "client",
                            "_type": "Presentation",
                            "_id": "96"
                          }
                        ],
                        "min_term_freq": 1,
                        "max_query_terms": 35,
                        "min_word_length": 3,
                        "minimum_should_match": "1%"
                    }
                }
            }
        }
      ]
    }
  }
}

I found someone with the same issue on an older version of Elasticsearch: "ElasticSearch More_Like_This API and Nested Object Properties". Unfortunately, none of the answers there works with ES 2.x, except flattening the entire index, which I can't do.

Does anyone have any idea about this (strange) issue? Thanks :)

3 Answers

0
votes

I believe you can specify which fields you want to search over. You could try pointing directly to the nested fields. Something like this:

{
  "size": 4,
  "query": {
    "more_like_this": {
      "fields": ["id", "title", "description", "categories.id", "categories.category"],
      "like": [
        {
          "_index": "client",
          "_type": "Presentation",
          "_id": "96"
        }
      ],
      "min_term_freq": 1,
      "max_query_terms": 35,
      "min_word_length": 3,
      "minimum_should_match": "1%"
    }
  }
}
0
votes

I'm on ES 5.3 with the same issue (I want MLT to be calculated from the document as well as nested documents).

Your bool/should solution was very helpful: I was trying to do the joining inside one MLT query and couldn't figure out how to do so.

I was able to get this to work (or at least it seems to be working fine) by specifying fields within the nested MLT query. So for your case you would add:

"fields": ["categories.*"]

to the nested MLT query. Not sure if this will work with 2.x, but I thought it would be worth mentioning.
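
For reference, here is a rough sketch of what the combined request body could look like, built in Python with only the standard library so it is easy to tweak. The helper name build_mlt_query is made up for illustration, the index, type, id and MLT parameters are copied from the question, and the "categories.*" wildcard is the part I only tried on 5.3, so treat it as an assumption for other versions.

```python
import json


def build_mlt_query(index, doc_type, doc_id, nested_path="categories", size=4):
    """Build a bool/should body: one top-level MLT clause plus one nested MLT clause.

    build_mlt_query is a hypothetical helper, not part of any Elasticsearch client.
    """
    # MLT parameters taken from the question.
    common = {
        "like": [{"_index": index, "_type": doc_type, "_id": doc_id}],
        "min_term_freq": 1,
        "max_query_terms": 35,
        "min_word_length": 3,
        "minimum_should_match": "1%",
    }
    # Top-level clause: no explicit fields, as in the original query.
    top_level = {"more_like_this": dict(common)}
    # Nested clause: restrict MLT to fields under the nested path
    # (assumption: the wildcard is accepted by your ES version).
    nested_mlt = dict(common, fields=[nested_path + ".*"])
    nested = {
        "nested": {
            "path": nested_path,
            "query": {"more_like_this": nested_mlt},
        }
    }
    return {"size": size, "query": {"bool": {"should": [top_level, nested]}}}


body = build_mlt_query("client", "Presentation", "96")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the search body. If the wildcard is not accepted on your version, listing the field explicitly (for example "categories.category") would be the thing to try next.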

0
votes

Try adding the "term_vector": "yes" property to the relevant fields in your mapping.

As per the documentation,

The fields on which to perform MLT must be indexed and of type string. Additionally, when using like with documents, either _source must be enabled or the fields must be stored or store term_vector. In order to speed up analysis, it could help to store term vectors at index time.
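
To make that concrete, here is a small sketch in plain Python (standard library only) that adds the setting to the "category" field of the nested "categories" properties from the question. The helper with_term_vectors is a made-up name, and only the two string fields are reproduced here. As far as I know, term_vector cannot be changed on a field that already exists, so you would probably need to create a new index with this mapping and reindex; I have not verified that this alone fixes the nested case.

```python
import copy
import json


def with_term_vectors(properties, field_names, setting="yes"):
    """Return a copy of a mapping 'properties' dict with term_vector set on the given fields.

    with_term_vectors is a hypothetical helper used for illustration only.
    """
    updated = copy.deepcopy(properties)  # leave the caller's mapping untouched
    for name in field_names:
        updated[name]["term_vector"] = setting
    return updated


# The two string fields of the nested "categories" object from the question.
categories_props = {
    "id": {"include_in_all": False, "type": "string"},
    "category": {"include_in_all": True, "type": "string"},
}

new_props = with_term_vectors(categories_props, ["category"])
print(json.dumps(new_props, indent=2))
```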