SOLR - how to highlight exact phrases for wildcard searching results

Question

This is my field type declared in schema:

<fieldType name="c_string" class="solr.TextField">
 <analyzer type="index">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.ReversedWildcardFilterFactory" />
 </analyzer>
 <analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory" />
 </analyzer>
</fieldType>

I can search using wildcards without any problems. But I have some problems with highlight feature. Solr highlights entire and not only matched phrase. For example my search query is title:Keyword*. So solr will only display documents matching wilcard. But highlight is:

"title": [
        "<em>Keyword and the rest of title</em>"

but I want:

"title": [
        "<em>Keyword</em> and the rest of title"

This works as I want if I use solr.EdgeNGramFilterFactory like this:

<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
   </analyzer>
   <analyzer type="query">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
   </analyzer>
</fieldType>

If I use it, highlight is ok, but wildcards are ignored. Solr always searches like with wildcards, title:Keyword title:Keyword* works the same - obviously title:Keyword should not match anything.

Do you have any tips?

[added] Example query:

select?q=text_dsc%3A*dobry*&rows=200&wt=json&indent=true&hl=true&hl.fl=text_dsc&hl.simple.pre=<em>&hl.simple.post=<%2Fem>

Example highlight result:

  "highlighting":{
    "25352":{
      "text_dsc":["<em>14276|\nDzień dobry -  dokument testowy. \n\n \n\nTEST. \n\n\n</em>"]},
    "25353":{
      "text_dsc":["<em>14276|\nDzień dobry -  dokument testowy. \n\n \n\nTEST. \n\n\n</em>"]},
    "26693":{
      "text_dsc":["<em>14276|\nDzień dobry -  dokument testowy. \n\n \n\nTEST. \n\n\n</em>"]}}}

As you can see, query string is dobry, but entire field is highlighted. Why? If I use solr.EdgeNGramFilterFactory as mentioned above, with the same query highlight is correct but searching is incorrect (always wildcard)

Can you please post an example query, especially the highlighting parameters? — lxg
Question updated. Query is generated by solr webadmin interface. — user1209216

Alok Chaudhary Alok Chaudhary · Accepted Answer · 2014-09-15T13:31:44

Use StandardTokenizerFactory and you will get the desired output:

<fieldType name="c_string" class="solr.TextField">
 <analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.ReversedWildcardFilterFactory" />
 </analyzer>
 <analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory" />
 </analyzer>
</fieldType>

The difference between the StandardTokenizerFactory and KeywordTokenizerFactory factory is very well explained in this question: difference between StandardTokenizerFactory and KeywordTokenizerFactory in SoLR

UPDATE

Index text_dsc in two different fields like

   <fieldType name="text_dsc" class="solr.TextField">
 <analyzer type="index">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.ReversedWildcardFilterFactory" />
 </analyzer>
 <analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory" />
 </analyzer>
</fieldType>



<fieldType name="text_dsc_standard" class="solr.TextField">
 <analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.ReversedWildcardFilterFactory" />
 </analyzer>
 <analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory" />
 </analyzer>
</fieldType>

And in your search query set hl.fl=text_dsc_standard.

SOLR - how to highlight exact phrases for wildcard searching results

1 Answers