Elasticsearch: Nested query under a boolean 'should' not returning results

Question

I'm running the following query (it has been shortened for clarity):

body : {
    query : {
        bool : {
            must : [
                { match : { active : 1 } }
            ],
            should : [
                { term : { apply : '2' } },
                {
                    nested : {
                        path : 'items',
                        query : {
                            terms : { 'items.product' : ["1", "2"] }
                        }
                    }
                }
            ],
            minimum_should_match : 1
        }
    }
}

When I run this query, I don't get back the documents that match the nested query in the should clause; I only get documents matching the first condition. What am I doing wrong? Why can't the terms query test the field against an array of input values and return results?

When I change the nested query to a match_all, or match the items.product field against a single exact value, I do get results.

Replacing the current nested query with the following (while everything else stays the same) gives me no results either.

nested : {
    path : 'items',
    query : {
        bool : {
            must : [
                {
                    terms : {
                        'items.product' : ["1", "2"],
                        minimum_should_match : 1
                    }
                }
            ]
        }
    }
}

Any help would be greatly appreciated - this has been driving me crazy for a couple days now!

Is there anything in your query that is performing any filtering based on score? — rchang
Nope, that's the entire query. That's what I thought at one point as well but it's just not the case. — aamirl
Does the items.product field contain only a single integer (in string representation), or is there other character data in the field as well? — rchang

rchang · Accepted Answer · 2015-02-08T22:07:39

EDITED to include discussion of the index mapping

Given that the terms query expects a non-analyzed field (per the documentation), I would recommend verifying that your index mapping explicitly declares items.product as not analyzed. For instance:

{"mappings" : {
  "your_doc_type" : {
    "properties" : {
      "items" : {
        "type" : "nested",
        "properties" : {
          "product" : {"type" : "string", "index" : "not_analyzed"},
          ...
          ... Other properties of the nested object
          ...
        }
      },
      ...
      ... Mappings for the other fields in your document type
      ...
    }
  }
}}

With that mapping in place, the terms query should be able to compare items.product against the exact values you supply.
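
To illustrate why the mapping matters, here is a rough JavaScript sketch of the difference. The roughAnalyze function is a hypothetical stand-in that only approximates what an analyzer does (lowercasing and splitting on non-alphanumeric characters); it is not Elasticsearch code, and the SKU-1 value is an invented example. The point it tries to show is that a terms query compares its input against the indexed tokens verbatim, without analyzing the input first.

```javascript
// Hypothetical, simplified stand-in for an analyzer: lowercase the value
// and split it on anything that is not a letter or digit.
function roughAnalyze(value) {
  return value.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// An analyzed field indexes the tokens; a not_analyzed field indexes
// the original value as a single token.
function indexedTokens(value, analyzed) {
  return analyzed ? roughAnalyze(value) : [value];
}

// A terms query matches when any input term equals an indexed token exactly.
function termsMatches(tokens, inputTerms) {
  return inputTerms.some((term) => tokens.includes(term));
}

const analyzedTokens = indexedTokens("SKU-1", true);   // ["sku", "1"]
const exactTokens = indexedTokens("SKU-1", false);     // ["SKU-1"]

console.log(termsMatches(analyzedTokens, ["SKU-1"])); // false
console.log(termsMatches(exactTokens, ["SKU-1"]));    // true
console.log(termsMatches(indexedTokens("1", true), ["1", "2"])); // true
```

Under these assumptions a purely numeric value such as "1" survives analysis unchanged, which is why it is worth checking whether items.product holds anything other than plain digits.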

My earlier suspicion was that something else in your query (min_score, perhaps) was filtering results by score, and that this threshold was weeding out the documents that match the items.product condition but not the apply condition, because of how the underlying Lucene scoring model ranks them. In other words, all other things being equal, among documents that satisfy only one clause of the should, those meeting the "apply":"2" condition score higher than those for which items.product is 1 or 2. This is what I observed empirically when running your query against a trivially small test data set.

Test data set:

{"active":1, "apply":"2", "items" : [{"product": "3"}]}
{"active":0, "apply":"2", "items" : [{"product": "3"}]}
{"active":1, "apply":"3", "items" : [{"product": "3"}]}
{"active":1, "apply":"3", "items" : [{"product": "1"}]}
{"active":1, "apply":"3", "items" : [{"product": "2"}]}

Based on the conditions in your query, we should see three documents returned - the first, fourth, and fifth documents.

"hits" : [ {
  "_index" : "test",
  "_type" : "test",
  "_id" : "AUtrND1rIJ0nixSnh_cG",
  "_score" : 0.731233,
  "_source":{"active":1, "apply":"2", "items" : [{"product": "3"}]}
}, {
  "_index" : "test",
  "_type" : "test",
  "_id" : "AUtrND1sIJ0nixSnh_cK",
  "_score" : 0.4601705,
  "_source":{"active":1, "apply":"3", "items" : [{"product": "2"}]}
}, {
  "_index" : "test",
  "_type" : "test",
  "_id" : "AUtrND1sIJ0nixSnh_cJ",
  "_score" : 0.35959372,
  "_source":{"active":1, "apply":"3", "items" : [{"product": "1"}]}
} ]

The expected documents came back, but you can see that the first document (for which apply is 2, meeting the first criterion of the should query) scored much higher.
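
The expectation that only the first, fourth, and fifth documents qualify can also be checked by hand with a small JavaScript sketch that applies the same boolean logic to the five test documents. The matchesQuery helper is hypothetical and only mirrors the must / should / minimum_should_match structure of the query; it does not model analysis or scoring.

```javascript
const docs = [
  { active: 1, apply: "2", items: [{ product: "3" }] },
  { active: 0, apply: "2", items: [{ product: "3" }] },
  { active: 1, apply: "3", items: [{ product: "3" }] },
  { active: 1, apply: "3", items: [{ product: "1" }] },
  { active: 1, apply: "3", items: [{ product: "2" }] },
];

// Hypothetical helper mirroring the query: the must clause is required,
// and at least one should clause has to match (minimum_should_match : 1).
function matchesQuery(doc) {
  const must = doc.active === 1;
  const shouldApply = doc.apply === "2";
  const shouldNested = doc.items.some((item) => ["1", "2"].includes(item.product));
  const shouldCount = [shouldApply, shouldNested].filter(Boolean).length;
  return must && shouldCount >= 1;
}

// 1-based positions of the matching documents in the test set.
const matching = docs
  .map((doc, i) => (matchesQuery(doc) ? i + 1 : null))
  .filter((position) => position !== null);

console.log(matching); // [ 1, 4, 5 ]
```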

If your intent is for these conditions to not affect the scoring of the documents but to use them instead as simple inclusion/exclusion criteria, you may want to switch to a filtered query. Something like:

{
  "query" : {"filtered" : {
    "query" : {"match_all" : {}},
    "filter" : {"bool" : {
      "must" : [
        {"term" : {"active" : 1}}
      ],
      "should" : [
        {"term" : {"apply" : "2"}},
        {"nested" : {
          "path": "items",
          "query" : {
            "terms" : {"items.product" : ["1", "2"]}
          }
        }}
      ]
    }}
  }}
}

Since you are now specifying a filter instead, these conditions should not impact the scoring of the returned documents but instead only determine whether a document qualifies at all for the result set (the scores are then calculated independently of the conditions above). Using this filtered query, the results from my dumb data set are:

"hits" : [ {
  "_index" : "test",
  "_type" : "test",
  "_id" : "AUtrND1rIJ0nixSnh_cG",
  "_score" : 1.0,
  "_source":{"active":1, "apply":"2", "items" : [{"product": "3"}]}
}, {
  "_index" : "test",
  "_type" : "test",
  "_id" : "AUtrND1sIJ0nixSnh_cK",
  "_score" : 1.0,
  "_source":{"active":1, "apply":"3", "items" : [{"product": "2"}]}
}, {
  "_index" : "test",
  "_type" : "test",
  "_id" : "AUtrND1sIJ0nixSnh_cJ",
  "_score" : 1.0,
  "_source":{"active":1, "apply":"3", "items" : [{"product": "1"}]}
} ]

The scores are now identical for all returned documents, without regard for which part of the should was satisfied.

Note that the query property above is match_all - if you had other conditions in your query that are not represented in the original question, then you would need to modify this accordingly.
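
For readers on later Elasticsearch versions, where the filtered query was deprecated (2.0) and then removed (5.0), roughly the same effect is usually achieved with the filter clause of a bool query. The following JavaScript sketch builds such a request body. The buildFilteredBody helper and its scoringQuery parameter are hypothetical names, and the translation is an untested assumption rather than something run against the data set above. The should clauses are wrapped in their own bool with minimum_should_match so that at least one of them stays mandatory.

```javascript
// Hypothetical helper: wraps the inclusion criteria in the non-scoring
// filter clause of a bool query. scoringQuery defaults to match_all and
// is the place for any conditions that should still influence the score.
function buildFilteredBody(scoringQuery = { match_all: {} }) {
  return {
    query: {
      bool: {
        must: [scoringQuery],
        filter: [
          { term: { active: 1 } },
          {
            bool: {
              should: [
                { term: { apply: "2" } },
                {
                  nested: {
                    path: "items",
                    query: { terms: { "items.product": ["1", "2"] } },
                  },
                },
              ],
              minimum_should_match: 1,
            },
          },
        ],
      },
    },
  };
}

console.log(JSON.stringify(buildFilteredBody(), null, 2));
```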
