1
votes

I'm very new to Solr so this might be a stupid question.

The requirement is that a query should return results with an (intelligent) summary containing highlighted words that match the query text. From what I have read, the highlighted text will effectively be the document summary. I managed to get highlighting working; however, Solr doesn't provide highlighting for some documents. So my thought was that if there is a document for which Solr can't provide highlighted text (i.e. a summary), I will ask Solr for: a) a general document summary (regardless of what the search term is), or b) the top n terms.

But I haven't been able to make progress on either of the two items.

The underlying question is: why is Solr not generating a highlighted summary for certain documents? I know for a fact that the documents contain the term I'm searching for.

Any insights into this will be much appreciated. Thank you.

Edit1:

Query: /select/?q=agents&start=0&fl=full_path,author,title,content-type,score&hl=true&hl.snippets=5.

The document is a PDF document, the word 'agents' occurs once. Here is the text snippet from the PDF which contains the word. "The Omega 3 & 6 fatty acids (eicosapentaenoic acid) and DHA (docasahexaenoic acid) are constituents of fish oils that act as anti-inflammatory agents. (Usually, these products are sold separately in health food stores as salmon oil or under other names.)"

Edit 2:

The default field (df) is set to text in solrconfig.xml. I copy all the fields into a field called text, which is defined as text_general. Looking at the text_general field type, the only tokenizer specified is solr.StandardTokenizerFactory. I should reiterate that highlighting does work; the issue is that it doesn't work on some documents. Edit 1 contains the document text which I believe should be highlighted with the given query.

2
Very interesting question. Can you give a bit more detail on which documents Solr failed to highlight? What was the query, and what would an example of the summary you prefer look like? - Arun
I have edited the question and added the requested information. Thank you. - Harinder
What is the tokenizer used here? It looks like the dot after the word "agents" is creating the problem. Also try adding the hl.q parameter to the query. - sidgate
@sidgate Adding the hl.q parameter did not help. I'm not sure how to find out which tokenizer is being used. As you can see from the query, I'm using the select requestHandler. Looking at it in solrconfig.xml, it does not contain any mention of highlighting, so I'm assuming it's using the default one. - Harinder
What is the default field against which the query is fired? What is the field type? What are the tokenizers defined in schema.xml for that field type? - sidgate

2 Answers

0
votes

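The Unified Highlighter offers such an option: pass hl.defaultSummary=true, and it will fall back to the leading portion of the field text as the snippet when no highlighted snippet can be generated. The Unified Highlighter is selected with hl.method=unified.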
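As a rough sketch of what such a request could look like, the snippet below builds the query string from the question and adds the Unified Highlighter parameters. The host, the core name `mycore`, and the choice of `text` as the highlight field are assumptions for illustration only; adjust them to your setup.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, query, field="text", snippets=5):
    """Build a /select URL that asks for unified highlighting with a fallback summary."""
    params = {
        "q": query,
        "fl": "full_path,author,title,content-type,score",
        "hl": "true",
        "hl.method": "unified",       # select the Unified Highlighter
        "hl.fl": field,               # assumed highlight field; must hold the original text
        "hl.snippets": snippets,
        "hl.defaultSummary": "true",  # fall back to leading text when nothing matches
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name, used only for illustration.
url = build_highlight_query("http://localhost:8983/solr/mycore", "agents")
print(url)
```

Note that the fallback summary is just the beginning of the field, not a query-independent "intelligent" summary.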

-1
votes

Make sure all the fields in which you expect text to be highlighted have stored="true". To highlight, Solr needs access to the original text of a field, and for that the field needs to be stored. So verify the definition of the field that holds the text mentioned above. Example: .
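One way to verify this, sketched below, is to inspect the field definition as returned by the Schema API (for example from a request such as `/schema/fields/text`). The sample response here is hypothetical and only modeled on that shape; it is not taken from the asker's schema.

```python
import json


def stored_flag(schema_field_response):
    """Return the 'stored' setting of a field definition, or None if it is not listed."""
    field = schema_field_response.get("field", {})
    return field.get("stored")


# Hypothetical response body, for illustration only.
sample = json.loads(
    '{"field": {"name": "text", "type": "text_general", '
    '"indexed": true, "stored": false, "multiValued": true}}'
)

flag = stored_flag(sample)
print(flag)
```

If the flag is false for the field being highlighted, Solr has no original text to build snippets from for that field.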