I have an Elasticsearch index with the following mapping:

{
"index_one": {
    "mappings": {
        "uidMapping": {
            "_all": {
                "enabled": false
            },
            "_source": {
                "enabled": false
            },
            "properties": {
                "age": {
                    "type": "keyword"
                },
                "clean_url": {
                    "type": "keyword",
                    "index": false,
                    "fields": {
                        "hash": {
                            "type": "murmur3"
                        }
                    }
                },
                "gender": {
                    "type": "keyword"
                },
                "segment_aggregate": {
                    "properties": {
                        "segment_name": {
                            "type": "keyword"
                        },
                        "segment_value": {
                            "type": "keyword"
                        }
                    }
                },
                "url_md5": {
                    "type": "keyword",
                    "index": false
                },
                "url_page_views": {
                    "type": "integer",
                    "index": false
                }
            }
        }
    }
}
}

I am trying to run queries with an AND operation on the segment_aggregate fields, i.e., results should be returned only when both conditions are met. So far, using BoolQueryBuilder, I have tried match queries and terms queries in must clauses, but I always seem to get results as if there were an OR between segment_name and segment_value.

BoolQueryBuilder queryTest = new BoolQueryBuilder();
queryTest.must(QueryBuilders.matchQuery("segment_aggregate.segment_name", "AnyValue").operator(Operator.AND));
queryTest.must(QueryBuilders.matchQuery("segment_aggregate.segment_value", "A").operator(Operator.AND));

parentQuery.must(queryTest);

This returns an OR result for the two fields, basically the larger subset. I also tried:

mustQuery.must(QueryBuilders.termsQuery("segment_aggregate.segment_name", "SegmentName"));
mustQuery.must(QueryBuilders.termsQuery("segment_aggregate.segment_value", "SegmentValue"));

This doesn't yield the desired results either. Wrapping the subqueries with must clauses in another query and adding that to the parent query didn't work either.

Any ideas as to where I am going wrong?

1 Answer

The issue you are seeing is probably because segment_aggregate is not mapped as a nested type.

By default, every field is indexed independently. Although the JSON structure looks as if each inner object in segment_aggregate ties a name to a value, Elasticsearch flattens the inner objects and builds one index of values for segment_aggregate.segment_name and a separate one for segment_aggregate.segment_value.
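To make the flattening concrete, here is a toy model in plain Java. It is not Elasticsearch code: the class and method names are hypothetical, and a document is reduced to an array of {segment_name, segment_value} pairs. It only illustrates why pooled per-field values can match even when no single inner object satisfies both conditions.

```java
// Toy model of "object" versus "nested" matching. Not Elasticsearch code;
// all names are hypothetical and exist only for illustration.
class SegmentMatchDemo {

    // "object" behaviour: each field's values are pooled per document,
    // so the pairing between segment_name and segment_value is lost.
    static boolean matchesAsObject(String[][] segments, String name, String value) {
        boolean nameSeen = false;
        boolean valueSeen = false;
        for (String[] segment : segments) {
            if (name.equals(segment[0])) nameSeen = true;    // any segment_name
            if (value.equals(segment[1])) valueSeen = true;  // any segment_value
        }
        return nameSeen && valueSeen;
    }

    // "nested" behaviour: both conditions must hold in the same inner object.
    static boolean matchesAsNested(String[][] segments, String name, String value) {
        for (String[] segment : segments) {
            if (name.equals(segment[0]) && value.equals(segment[1])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // One document with two inner objects: color=red and size=green.
        String[][] doc = { {"color", "red"}, {"size", "green"} };
        System.out.println(matchesAsObject(doc, "color", "green")); // prints true
        System.out.println(matchesAsNested(doc, "color", "green")); // prints false
    }
}
```

The document has no inner object with color=green, yet the pooled check still succeeds because "color" and "green" each appear somewhere in the document.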

Consequently, when you run a search like this (assuming query-string syntax):

segment_aggregate.segment_name:color AND segment_aggregate.segment_value:green 

what Elasticsearch really does is look for documents where ANY of the values in segment_aggregate.segment_name is "color" and ANY of the values in segment_aggregate.segment_value is "green", whether or not the two come from the same inner object. To make Elasticsearch honor the association between the fields of an inner object, map segment_aggregate with type "nested" instead of the default "object". You will also need to use the dedicated nested query and nested aggregation in the query DSL.
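As a rough sketch of what that could look like, the plain-Java helper below assembles the two JSON bodies involved: a mapping fragment with "type": "nested" added to segment_aggregate, and a nested query that requires both terms inside the same inner object. The class and method names are hypothetical, the values are concatenated without JSON escaping, and the strings are meant to be sent through whatever client you use. With the Java client from the question, the same query shape is normally built with QueryBuilders.nestedQuery, whose exact signature depends on the client version.

```java
// Sketch only: builds request bodies as strings, with no Elasticsearch dependency.
// Class and method names are hypothetical.
class NestedQuerySketch {

    // Assumed replacement for the segment_aggregate entry under "properties":
    // same sub-fields as in the question, plus "type": "nested".
    static final String NESTED_MAPPING_FRAGMENT =
        "\"segment_aggregate\":{\"type\":\"nested\",\"properties\":{"
        + "\"segment_name\":{\"type\":\"keyword\"},"
        + "\"segment_value\":{\"type\":\"keyword\"}}}";

    // Nested query: both term clauses are evaluated against one inner object
    // at a time. Note: name and value are inserted without JSON escaping.
    static String nestedAndQuery(String name, String value) {
        return "{\"query\":{\"nested\":{"
            + "\"path\":\"segment_aggregate\","
            + "\"query\":{\"bool\":{\"must\":["
            + "{\"term\":{\"segment_aggregate.segment_name\":\"" + name + "\"}},"
            + "{\"term\":{\"segment_aggregate.segment_value\":\"" + value + "\"}}"
            + "]}}"
            + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(NESTED_MAPPING_FRAGMENT);
        System.out.println(nestedAndQuery("color", "green"));
    }
}
```

Keep in mind that the type of an existing field cannot be switched from object to nested in place, so applying this mapping means creating a new index and indexing the documents again.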

More details can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html