Hibernate Search: field indexed with an NGram filter returns incorrect results because the search string is also tokenized at query time

Question

I have an analyzer with this configuration:

searchMapping//
        .analyzerDef(BaseEntity.CUSTOM_SEARCH_INDEX_ANALYZER, WhitespaceTokenizerFactory.class)//
        .filter(LowerCaseFilterFactory.class)//
        .filter(ASCIIFoldingFilterFactory.class)//
        .filter(NGramFilterFactory.class).param("minGramSize", "1").param("maxGramSize", "200");

This is how my entity field is configured:

@Field(analyzer = @Analyzer(definition = CUSTOM_SEARCH_INDEX_ANALYZER))
private String bookName;

This is how I create a search query:

queryBuilder.keyword().onField(prefixedPath).matching(matchingString).createQuery()

I have one entity with bookName="Gulliver" and another entity with bookName="xGulliver".

If I search for bookName = xG, I get both entities, whereas I would expect only the entity with bookName="xGulliver". I also looked at the query produced by hibernate-search:

Executing Lucene query '+(+(+(+( bookName:x bookName:xg bookName:g))))

I guess the Lucene query above is built from BooleanJunction::must conditions, which means a document should have to match all of them. So why does it return both entities? I don't understand this.

I could also override the analyzer at query time, with an analyzer that uses KeywordTokenizer instead of NGramFilterFactory, but I would have to do that for every field before creating the QueryBuilder. That doesn't look good: I have about 100 indexed fields, some of them dynamic, and I create an individual query for each field.

Is there any other way to override the analyzer in version 5.11, or is this handled in an easier way in hibernate-search 6.x?

The Hibernate Search versions I use are:

hibernate-search-elasticsearch, hibernate-search-orm = 5.11.4.Final

yrodiere · Accepted Answer · 2020-02-17T07:53:30

"I guess the Lucene query above is built from BooleanJunction::must conditions, which means a document should have to match all of them. So why does it return both entities? I don't understand this."

When you create a keyword query using Hibernate Search, the string passed to that query is analyzed, and if there are multiple tokens, Hibernate Search creates a boolean query with one "should" clause for each token. You can see it here " bookName:x bookName:xg bookName:g": there is no "+" sign before "bookName", which means those are not "must" clauses, they are "should" clauses.
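
To illustrate the difference, here is a minimal plain-Java sketch. It is not Hibernate Search or Lucene code: it only approximates the analyzer from the question (whitespace split, lowercasing and n-grams, with ASCII folding left out) and compares "should" semantics with "must" semantics. The class and method names are made up for this illustration.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java simulation only; NGramShouldDemo and its methods are hypothetical
// names, not part of the Hibernate Search or Lucene API.
class NGramShouldDemo {

    // Approximates whitespace tokenizer + lowercase filter + n-gram filter.
    // ASCII folding is omitted to keep the sketch short.
    static Set<String> analyze(String text, int minGram, int maxGram) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String word : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            for (int start = 0; start < word.length(); start++) {
                for (int len = minGram; len <= maxGram && start + len <= word.length(); len++) {
                    tokens.add(word.substring(start, start + len));
                }
            }
        }
        return tokens;
    }

    // "should" semantics: at least one query token was indexed for the document.
    static boolean matchesAny(Set<String> indexed, Set<String> query) {
        return query.stream().anyMatch(indexed::contains);
    }

    // "must" semantics: every query token was indexed for the document.
    static boolean matchesAll(Set<String> indexed, Set<String> query) {
        return indexed.containsAll(query);
    }

    public static void main(String[] args) {
        // The search string goes through the same n-gram analysis as the indexed text.
        Set<String> query = analyze("xG", 1, 200);
        for (String bookName : new String[] {"Gulliver", "xGulliver"}) {
            Set<String> indexed = analyze(bookName, 1, 200);
            System.out.println(bookName
                    + " should=" + matchesAny(indexed, query)
                    + " must=" + matchesAll(indexed, query));
        }
    }
}
```

In this simulation "Gulliver" matches under "should" semantics because its n-grams include "g", but it does not match under "must" semantics because none of its n-grams is "x".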

"I could also override the analyzer at query time, with an analyzer that uses KeywordTokenizer instead of NGramFilterFactory, but I would have to do that for every field before creating the QueryBuilder. That doesn't look good: I have about 100 indexed fields, some of them dynamic, and I create an individual query for each field."

True, that's annoying.

"Is there any other way to override the analyzer in version 5.11"

In 5.11, I don't think there is any other way to override analyzers.

If necessary and if you're using the Lucene backend, I believe you should be able to bypass the Hibernate Search DSL just for this specific query:

1. Get the analyzer you want: something like Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("myAnalyzerWithoutNGramTokenFilter").
2. Analyze the search terms: call analyzer.tokenStream(...) and use the TokenStream as appropriate. You'll get a list of tokens.
3. Create the Lucene Query: essentially it will be a boolean query with one TermQuery for each token.
4. Pass the resulting Query to Hibernate Search as usual.

"or is this handled in an easier way in hibernate-search 6.x?"

It's dead simple in Hibernate Search 6.0.0.Beta4. There are two solutions:

Implicitly: in your mapping, you can specify not only an analyzer (using @FullTextField(analyzer = "myAnalyzer")), but also a "search" analyzer using @FullTextField(analyzer = "myAnalyzer", searchAnalyzer = "mySearchAnalyzer"). The "default" analyzer will be used when indexing, while the "search" analyzer will be used when searching (querying).
Explicitly: at query time, you can override the analyzer on a given predicate by calling .analyzer("mySearchAnalyzer") while building the predicate. There is one example in this section of the documentation.
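
To see why a separate search analyzer addresses the original problem, here is a rough plain-Java simulation, not Hibernate Search code. It assumes the index analyzer produces n-grams as in the question, and that the search analyzer only splits on whitespace and lowercases. All class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java simulation only; SearchAnalyzerDemo and its methods are hypothetical
// names, not part of the Hibernate Search or Lucene API.
class SearchAnalyzerDemo {

    // Index-time analysis (assumed): whitespace split + lowercase + n-grams.
    static Set<String> indexAnalyze(String text, int minGram, int maxGram) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String word : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            for (int start = 0; start < word.length(); start++) {
                for (int len = minGram; len <= maxGram && start + len <= word.length(); len++) {
                    tokens.add(word.substring(start, start + len));
                }
            }
        }
        return tokens;
    }

    // Search-time analysis (assumed): whitespace split + lowercase, no n-grams.
    static Set<String> searchAnalyze(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String word : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    // A document matches when at least one search token was indexed for it.
    static boolean matches(Set<String> indexed, Set<String> searchTokens) {
        return searchTokens.stream().anyMatch(indexed::contains);
    }

    public static void main(String[] args) {
        // "xG" stays a single token "xg" because no n-gram filter runs at search time.
        Set<String> searchTokens = searchAnalyze("xG");
        for (String bookName : new String[] {"Gulliver", "xGulliver"}) {
            System.out.println(bookName + " matches="
                    + matches(indexAnalyze(bookName, 1, 200), searchTokens));
        }
    }
}
```

Because the search string is no longer split into n-grams, the only search token is "xg", which appears among the indexed n-grams of "xGulliver" but not among those of "Gulliver".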

Note however that dynamic fields are not supported yet in Hibernate Search 6: HSEARCH-3273.
