ElasticSearch QueryBuilder must_not clause strange behaviour

Question

According to the documentation:

must_not: The clause (query) must not appear in the matching documents.

I have a query like this:

// Search for URIs that contain "smart" and do not contain "vip.vs.csin.cz"
BoolQueryBuilder builder = QueryBuilders.boolQuery();
builder.must(QueryBuilders.termQuery(URI, "smart"));
builder.mustNot(QueryBuilders.termQuery(URI, "vip.vs.csin.cz"));

There are two URIs in my Elasticsearch repository:

1)

/smart-int-vip.vs.csin.cz:5080/smart/api/runtime/case/SC0000000000558648/record/generate/4327/by/SMOBVA002/as/true?espisRecordForm=ANALOG&accountNumber=2318031033/0800

2)

/smart/api/runtime/case/SC0000000000558648/record/generate/4327/by/SMOBVA002/as/true?espisRecordForm=ANALOG&accountNumber=2318031033/0800

When I execute the query via ElasticsearchTemplate

elasticsearchTemplate.getClient().search(searchRequest);

I get back 0 records. When I execute the same query without the mustNot clause, I get back 2 records.
In Kibana I can write:

uri: "smart" NOT uri: "vip.vs.csin.cz"

and get 1 record, as expected.

I was expecting the same behaviour from the Java Elasticsearch client. How can I filter out records that contain "vip.vs.csin.cz" from Java, and why was the second record filtered out even though it doesn't contain anything from the mustNot clause I specified?

Edit: here's my mapping:

@Document(indexName = "audit-2018", type = "audit")
public class Trace {

    @Id
    private String id;
    @Field(type = FieldType.Text)
    private String uri;

    // more columns, getter & setters
}

It would be much more helpful if you also shared the mapping of the index. - gaurav9620

etarhan · Accepted Answer · 2018-09-16T01:40:49

The Java code you've provided builds a bool query with must and must_not clauses, each wrapping a term query. The thing about term queries is that the search term itself is not analyzed, whereas the values of a text field (which is the data type of your uri field) are run through an analyzer at index time. The standard analyzer, the default for text fields, lowercases the value and splits it into tokens at word boundaries, dropping most punctuation. Your URI is therefore not stored as one string, and an exact term such as vip.vs.csin.cz may not correspond to any single token in the index. You can check which tokens are actually produced for a value with the _analyze API. The text field type should be reserved for full-text search only; in your case I would go for the keyword field type. The reason your Kibana query works as expected is that it is not actually doing a term query, but rather a query_string query containing a Lucene query, whose input is analyzed in the same way as the field: uri: "smart" NOT uri: "vip.vs.csin.cz".
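
To make the difference concrete, here is a small plain-Java sketch with no Elasticsearch dependency. It models a term query as an exact lookup among indexed terms, once for a text-style field and once for a keyword-style field. The class, the method names, and the tokenizer are all hypothetical: the tokenizer simply lowercases and splits on every non-alphanumeric character, which is cruder than Lucene's real standard tokenizer, so the tokens in your index may differ. The URI is a shortened form of the first one in the question.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class TermMatchSketch {

    // Hypothetical stand-in for an analyzer: lowercase, then split on every
    // non-alphanumeric character. Lucene's standard tokenizer follows Unicode
    // word-break rules instead, so real tokens can look different.
    public static Set<String> analyzeAsText(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // A keyword-style field indexes the whole value as one single term.
    public static Set<String> indexAsKeyword(String value) {
        return Collections.singleton(value);
    }

    // A term query is not analyzed: it is an exact lookup among indexed terms.
    public static boolean termMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        String uri = "/smart-int-vip.vs.csin.cz:5080/smart/api/runtime/case";

        Set<String> textTerms = analyzeAsText(uri);
        Set<String> keywordTerms = indexAsKeyword(uri);

        // Text-style field: only individual tokens can be matched.
        System.out.println("text tokens: " + textTerms);
        System.out.println(termMatches(textTerms, "smart"));          // true
        System.out.println(termMatches(textTerms, "Smart"));          // false, the term is not lowercased
        System.out.println(termMatches(textTerms, "vip.vs.csin.cz")); // false under this simplified tokenizer

        // Keyword-style field: only the complete value can be matched.
        System.out.println(termMatches(keywordTerms, "smart"));       // false
        System.out.println(termMatches(keywordTerms, uri));           // true
    }
}
```

The point of the sketch is only that a term lookup succeeds or fails depending on what was put into the index, not on what the original string looked like. For the real tokens, ask Elasticsearch itself through the _analyze API.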

So you have a couple of options to fix your problem. You could change your term queries to match_phrase queries, which are analyzed and keep the order of your tokenized terms, and will probably give the correct result. An alternative would be to use a query_string query instead of term queries in your Java code, since you have already determined that this gives you the correct result.
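
As a rough sketch of those two options, the following uses the same QueryBuilders API as the question. It assumes the Elasticsearch 6.x client on the classpath and the field name uri from the mapping. The class and method names are made up for illustration, and neither query has been run against the asker's index.

```java
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.QueryStringQueryBuilder;

public class UriQueryOptions {

    // Assumption: the field name from the Trace mapping in the question.
    private static final String URI = "uri";

    // Option 1: match_phrase queries, which analyze the search text
    // with the field's analyzer before matching.
    public static BoolQueryBuilder matchPhraseVariant() {
        return QueryBuilders.boolQuery()
                .must(QueryBuilders.matchPhraseQuery(URI, "smart"))
                .mustNot(QueryBuilders.matchPhraseQuery(URI, "vip.vs.csin.cz"));
    }

    // Option 2: a query_string query carrying the same Lucene
    // expression that worked in Kibana.
    public static QueryStringQueryBuilder queryStringVariant() {
        return QueryBuilders.queryStringQuery(
                "uri: \"smart\" NOT uri: \"vip.vs.csin.cz\"");
    }

    public static void main(String[] args) {
        // Query builders render themselves as query DSL JSON.
        System.out.println(matchPhraseVariant());
        System.out.println(queryStringVariant());
    }
}
```

Either builder can be passed to the search request in place of the original bool query. Printing the builder is a convenient way to compare the generated JSON with what Kibana sends.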

My proposed solution, however, would be to reindex with uri mapped as a keyword field, since this field type does not tokenize your field values into multiple terms. The Elasticsearch documentation for the keyword field type describes how such values are indexed. This will save you headaches in the future, since you know that your queries match your field values exactly "as is".
