edismax with multiple words for keyword tokenizer splitting on space

Question

There are two fields in my schema:
field1 uses the keyword tokenizer, which preserves the input as a single token (it does not even split on spaces; I double-checked this in the Analysis tab).
field2 uses WhitespaceTokenizerFactory, which splits the input on spaces, tabs, and other whitespace.

<field name="field1" type="field1_type" indexed="true" stored="false"/>
<field name="field2" type="field2_type" indexed="true" stored="false"/>
<fieldType name="field2_type" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

I am using the edismax parser with the default qf set to: field1 field2
When I query with q=hello world,
the debug output shows that the query is parsed like this:

rawquerystring:hello world

querystring:hello world

parsedquery:(+((DisjunctionMaxQuery((field1:hello | field2:hello)) DisjunctionMaxQuery((field1:world | field2:world)))~1) ())/no_coord

parsedquery_toString:+(((field1:hello | field2:hello) (field1:world | field2:world))~1) ()

What I expected was something like this:

expected:+(((field1:hello world) ((field2:hello) (field2:world))~1) ()

That is, the query should not be split on the space for field1, since it uses the keyword tokenizer, but it should be split on the space for field2. What am I doing wrong?

Jack Krupansky · Accepted Answer · 2015-01-06T11:48:07

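You need to escape the space in your query, either with a backslash or by putting quotes around the term. The query parser does not parse based on the analyzer/tokenizer of each field; it splits the query on whitespace before any field analysis runs, so field1's keyword tokenizer only ever sees one word at a time.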
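To make the two escaping options concrete, here is a minimal client-side sketch in Python. The helper names `escape_whitespace` and `quote_phrase` are hypothetical (they are not part of Solr or any client library), and the sketch only builds the request parameters. It assumes the same `qf` as in the question and does not claim what the resulting parsed query will look like for field2, so check that with debugQuery.

```python
from urllib.parse import urlencode


def escape_whitespace(term: str) -> str:
    """Backslash-escape every whitespace character (hypothetical helper)."""
    return "".join("\\" + ch if ch.isspace() else ch for ch in term)


def quote_phrase(term: str) -> str:
    """Wrap the term in double quotes, escaping embedded backslashes and quotes
    (hypothetical helper)."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


user_input = "hello world"

# Option 1: backslash-escaped space, sent as q=hello\ world
escaped_q = escape_whitespace(user_input)

# Option 2: quoted term, sent as q="hello world"
quoted_q = quote_phrase(user_input)

params = {
    "defType": "edismax",
    "qf": "field1 field2",   # same qf as in the question
    "q": escaped_q,          # or quoted_q
    "debugQuery": "true",    # to inspect parsedquery in the response
}

# URL-encoded query string to append to the select handler URL
query_string = urlencode(params)
print(query_string)
```

With either form, the parser no longer sees a bare space between the two words, so the whole term reaches each field's analyzer as one unit.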
