Dutch and German allow words to be combined into new words, known as compound words.
For example, "accountmanager" is considered one word, composed of the words "account" and "manager". Our users will use both "accountmanager" and "account manager" in documents and queries, and they expect the same results for both queries.
To decompound (split) such words, Solr has a dictionary filter, which I have configured in the schema:
<filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="../../compound-word-dictionary.txt" minWordSize="8" minSubwordSize="4" maxSubwordSize="15" onlyLongestMatch="true"/>
The compound-word-dictionary.txt file holds the list of words used to split compound words. It contains, for example, the words "account" and "manager".
The decompounding itself looks fine. When I analyze the query "accountmanager" in the Solr debugger, I get the following terms (term text):
- accountmanager
- account
- manager
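To illustrate how such a term list can come about, here is a minimal Python sketch of dictionary-based decompounding. It is a simplified model, not Lucene's actual implementation: the function name decompound and its logic are assumptions for illustration, and only the parameter values and the two dictionary words come from the configuration above.

```python
def decompound(token, dictionary, min_word_size=8, min_subword_size=4,
               max_subword_size=15, only_longest_match=True):
    """Simplified model of dictionary decompounding (not Lucene's real code)."""
    terms = [token]  # the original token is always kept
    if len(token) < min_word_size:
        return terms  # too short to be considered a compound
    lowered = token.lower()
    # Try every start position that still leaves room for a minimal subword.
    for start in range(len(lowered) - min_subword_size + 1):
        matches = []
        for size in range(min_subword_size, max_subword_size + 1):
            if start + size > len(lowered):
                break
            candidate = lowered[start:start + size]
            if candidate in dictionary:
                matches.append(candidate)
        if matches:
            if only_longest_match:
                # Sizes grow, so the last match at this position is the longest.
                terms.append(matches[-1])
            else:
                terms.extend(matches)
    return terms


terms = decompound("accountmanager", {"account", "manager"})
print(terms)
```

Under these assumptions, the sketch keeps the original token and adds one subword per matching start position, which is consistent with the three terms listed above.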
However, these terms are combined as an OR statement, so the query finds all documents that contain at least one of the terms. I want it to behave like an AND statement, so that only documents containing both "account" and "manager" are returned.
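The difference between the two behaviors can be sketched in a few lines of Python. This is not Solr code: the three sample documents and the search helper are invented purely for illustration, and the sketch only looks at the two subwords.

```python
# Toy documents, made up for this illustration only.
docs = {
    1: "our account manager will call you",
    2: "please log in to your account",
    3: "the manager approved the request",
}


def search(terms, docs, require_all):
    """Return ids of documents matching all terms (AND) or any term (OR)."""
    combine = all if require_all else any
    hits = []
    for doc_id, text in docs.items():
        words = set(text.split())
        if combine(term in words for term in terms):
            hits.append(doc_id)
    return hits


subwords = ["account", "manager"]
or_hits = search(subwords, docs, require_all=False)   # current behavior
and_hits = search(subwords, docs, require_all=True)   # desired behavior
print(or_hits, and_hits)
```

With OR semantics every toy document matches, while AND semantics returns only the document that contains both subwords.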
I have tried setting the defaultOperator in the schema to "AND", but this is ignored when using edismax. I then set the suggested minimum-should-match parameter to 100% (mm=100%), again without the desired result. Tweaking the attributes of the dictionary filter in the schema does not change the behavior to "AND" either.
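For reference, a request with these settings could be built roughly as follows. This is a sketch under assumptions: the host, port, and core name (mycore) are hypothetical, and debugQuery=true is added only because it makes Solr return the parsed query, which shows how the decompounded terms end up being combined.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",     # query parser used in the question
    "q": "accountmanager",
    "mm": "100%",             # minimum-should-match, as tried above
    "debugQuery": "true",     # assumption: added to inspect the parsed query
}
query_string = urlencode(params)

# Hypothetical host and core name; adjust to the actual installation.
url = "http://localhost:8983/solr/mycore/select?" + query_string
print(url)
```

Note that the percent sign in mm=100% has to be URL-encoded, which urlencode takes care of.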
Has anybody come across this behavior when using the dictionary compound word token filter factory, and does anyone know how to make it behave like an AND statement?