
I'm having trouble understanding the difference between the Bool filter and the And filter in Elasticsearch.

Context: say my documents have fields: X, Y, Z.

Each field can have multiple values.

Goal:

I want to send a query to Elasticsearch with the following semantics: (X=valueX1 OR X=valueX2 OR ...) AND (Y=valueY1 OR Y=valueY2 OR ...) AND (Z=valueZ1 OR Z=valueZ2 OR ...).
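To pin down the intended semantics independently of Elasticsearch, here is a minimal plain-Java sketch that evaluates the goal against one in-memory document. It assumes a hypothetical model in which a document is a `Map<String, List<String>>` from field name to its values: a field clause matches when the document has at least one of the wanted values, and the document matches only when every field clause does.

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model: a document maps each field name to its (possibly multiple) values.
class GoalSemantics {

    // (F=v1 OR F=v2 OR ...): true if the document has at least one of the wanted values in the field.
    static boolean anyOf(Map<String, List<String>> doc, String field, List<String> wanted) {
        List<String> values = doc.getOrDefault(field, List.of());
        for (String w : wanted) {
            if (values.contains(w)) {
                return true;
            }
        }
        return false;
    }

    // (X clause) AND (Y clause) AND (Z clause): every field clause must match.
    static boolean matches(Map<String, List<String>> doc, Map<String, List<String>> criteria) {
        for (Map.Entry<String, List<String>> clause : criteria.entrySet()) {
            if (!anyOf(doc, clause.getKey(), clause.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> criteria = Map.of(
                "X", List.of("valueX1", "valueX2"),
                "Y", List.of("valueY1", "valueY2"),
                "Z", List.of("valueZ1", "valueZ2"));
        Map<String, List<String>> doc = Map.of(
                "X", List.of("valueX2", "other"),
                "Y", List.of("valueY1"),
                "Z", List.of("valueZ9", "valueZ1"));
        System.out.println(matches(doc, criteria)); // true
    }
}
```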

Attempt:

This is how I'm doing that:

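import org.elasticsearch.index.query.BoolFilterBuilder;
import org.elasticsearch.index.query.FilterBuilder;
import org.elasticsearch.index.query.FilterBuilders;

BoolFilterBuilder mainClaus = FilterBuilders.boolFilter();
FilterBuilder filterBuilder = mainClaus;

BoolFilterBuilder xClaus = FilterBuilders.boolFilter();
BoolFilterBuilder yClaus = FilterBuilders.boolFilter();
BoolFilterBuilder zClaus = FilterBuilders.boolFilter();

//A document must match all three clauses.
mainClaus.must(xClaus);
mainClaus.must(yClaus);
mainClaus.must(zClaus);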

//Return a document if it has at least one of those values.
xClaus.should( FilterBuilders.termFilter("X", "valueX1") );
xClaus.should( FilterBuilders.termFilter("X", "valueX2") );
xClaus.should( FilterBuilders.termFilter("X", "valueX3") );

//Return a document if it has at least one of those values.
yClaus.should( FilterBuilders.termFilter("Y", "valueY1") );
yClaus.should( FilterBuilders.termFilter("Y", "valueY2") );
yClaus.should( FilterBuilders.termFilter("Y", "valueY3") );

//Return a document if it has at least one of those values.
zClaus.should( FilterBuilders.termFilter("Z", "valueZ1") );
zClaus.should( FilterBuilders.termFilter("Z", "valueZ2") );
zClaus.should( FilterBuilders.termFilter("Z", "valueZ3") );

Questions:

  • Is my approach correct?
  • What's the difference between the Bool filter and the And filter?

2 Answers


The main difference lies in how they are executed, and the key concept here is bitsets. Simply put, bool filters leverage bitsets, while and filters do not.

When bool filters are used, bitsets are created and then AND/OR'ed together in order to figure out the matching documents.
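The bitset strategy can be sketched in plain Java with `java.util.BitSet`. This is a toy model, not Elasticsearch's actual implementation: it assumes each term filter has already been turned into a bitset whose bit i is set when document i matches. The should clauses of one field are OR'ed together, and the resulting per-field bitsets are AND'ed.

```java
import java.util.BitSet;
import java.util.List;

// Toy model of bitset-based filtering; each BitSet marks the doc ids matching one term filter.
class BitsetFiltering {

    // should: a document matches if it is set in at least one term bitset.
    static BitSet orAll(List<BitSet> termBitsets) {
        BitSet result = new BitSet();
        for (BitSet b : termBitsets) {
            result.or(b);
        }
        return result;
    }

    // must: a document matches only if it is set in every clause bitset (assumes at least one clause).
    static BitSet andAll(List<BitSet> clauseBitsets) {
        BitSet result = (BitSet) clauseBitsets.get(0).clone(); // clone so the inputs are not modified
        for (BitSet b : clauseBitsets.subList(1, clauseBitsets.size())) {
            result.and(b);
        }
        return result;
    }

    static BitSet of(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        // Hypothetical doc ids matching each term filter in a 6-document index.
        BitSet x = orAll(List.of(of(0, 1), of(3)));   // X=valueX1 OR X=valueX2 -> {0, 1, 3}
        BitSet y = orAll(List.of(of(1), of(3, 4)));   // Y=valueY1 OR Y=valueY2 -> {1, 3, 4}
        BitSet z = orAll(List.of(of(1, 2), of(5)));   // Z=valueZ1 OR Z=valueZ2 -> {1, 2, 5}
        System.out.println(andAll(List.of(x, y, z))); // {1}
    }
}
```

Once the per-term bitsets exist (and, in ES 1.x, can be cached), combining them is a handful of word-wise operations rather than a pass over every document.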

When and filters are used, ES simply iterates over the documents one by one and includes each document only if it matches the filters.
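By contrast, the document-by-document strategy can be modeled as a loop that evaluates every filter on each document and stops at the first one that rejects it. Again, this is a simplified, hypothetical model: documents are plain single-valued `Map`s and filters are `java.util.function.Predicate`s, not Elasticsearch objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model of doc-by-doc ("and" filter) execution over an in-memory list of documents.
class DocByDocFiltering {

    static List<Integer> matchingIds(List<Map<String, String>> docs,
                                     List<Predicate<Map<String, String>>> filters) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            Map<String, String> doc = docs.get(id);
            boolean keep = true;
            // Evaluate the filters in order; stop at the first one that rejects the document.
            for (Predicate<Map<String, String>> f : filters) {
                if (!f.test(doc)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("X", "valueX1", "Y", "valueY1"),
                Map.of("X", "valueX3", "Y", "valueY1"),
                Map.of("X", "valueX2", "Y", "valueY2"));
        List<Predicate<Map<String, String>>> filters = List.of(
                d -> List.of("valueX1", "valueX2").contains(d.get("X")),
                d -> "valueY1".equals(d.get("Y")));
        System.out.println(matchingIds(docs, filters)); // [0]
    }
}
```

Every document is visited, so the cost grows with the index size regardless of how selective the filters are.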

As a result, a bool filter is usually much faster than an and filter, but not always. There are situations where you should still prefer and over bool: when using geo filters, script filters and numeric range filters. With those filters, ES has to iterate over all documents anyway.

However, all of this only holds for ES versions before 2.0. Starting with 2.0, and/or filters are implemented as bool, and the query DSL has been overhauled so that there is no longer any difference between queries and filters.

For more in-depth information, you can read the nitty-gritty details in this great blog post: "All about ES filter bitsets".

So what you're doing is OK, but a more concise alternative is to put three terms filters in the must clause, like this:

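import org.elasticsearch.index.query.BoolFilterBuilder;
import org.elasticsearch.index.query.FilterBuilders;

//Each terms filter matches a document whose field contains at least one of the listed values.
BoolFilterBuilder mainClaus = FilterBuilders.boolFilter();
mainClaus.must(FilterBuilders.termsFilter("X", "valueX1", "valueX2", "valueX3"));
mainClaus.must(FilterBuilders.termsFilter("Y", "valueY1", "valueY2", "valueY3"));
mainClaus.must(FilterBuilders.termsFilter("Z", "valueZ1", "valueZ2", "valueZ3"));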
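To see the structure this produces, here is a plain-Java sketch that assembles the corresponding filter JSON by hand, for example to paste into the filter section of a filtered query when testing with curl. It assumes the ES 1.x filter DSL shape (a `bool` filter whose `must` array holds one `terms` filter per field) and does no JSON escaping; it only illustrates the structure, and the builder's own serialized output may differ in whitespace or ordering.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Builds the JSON for a bool filter with one "terms" filter per field in its must clause.
// Assumes the ES 1.x filter DSL shape; values are assumed not to need JSON escaping.
class TermsFilterJson {

    static String quote(String s) {
        return "\"" + s + "\"";
    }

    static String boolMustTerms(Map<String, List<String>> criteria) {
        String clauses = criteria.entrySet().stream()
                .map(e -> "{\"terms\":{" + quote(e.getKey()) + ":["
                        + e.getValue().stream().map(TermsFilterJson::quote).collect(Collectors.joining(","))
                        + "]}}")
                .collect(Collectors.joining(","));
        return "{\"bool\":{\"must\":[" + clauses + "]}}";
    }

    public static void main(String[] args) {
        Map<String, List<String>> criteria = new LinkedHashMap<>(); // keeps X, Y, Z in insertion order
        criteria.put("X", List.of("valueX1", "valueX2", "valueX3"));
        criteria.put("Y", List.of("valueY1", "valueY2", "valueY3"));
        criteria.put("Z", List.of("valueZ1", "valueZ2", "valueZ3"));
        System.out.println(boolMustTerms(criteria));
    }
}
```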

Instead of using bool filters here, you should go for a multi-match query. Since you are comparing one field, X, with three different values, something like the code below would be a better approach.

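import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

//multiMatchQuery(text, fieldNames...): the query text comes first, then the field(s) to search.
//With the default "or" operator, a document matches if X contains any of these terms.
String params = "valueX1 valueX2 valueX3";

QueryBuilder queryBuilder = QueryBuilders.multiMatchQuery(params, "X");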

This queryBuilder can then be added as one clause of a larger bool query with must clauses, in which all three fields X, Y and Z are compared.

You can read more about multi-match queries here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html