
I want to index PDF and Word documents using Solr. The whole content of each Word/PDF document comes back in the search response, along with the highlighted fragment. The content is quite long, and I want to keep it out of the search response because of its length.

Is it possible to get only the highlighted fragment of the content field?

Here is the search query:

http://localhost:8080/solr4x/collection1/select?q=Scripting&wt=xml&hl=true&hl.fl=content

Here is the schema:

<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>

<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

 <copyField source="content" dest="text"/>

I am using Solr 4.3.


2 Answers


I would suggest adding &hl.fragsize=100 (the fragment size, in characters) to your query. The default should already be 100, so I am not sure why it is pulling the full content for you; I would have to look at your solrconfig.xml to tell.

Try changing your search query to:

http://localhost:8080/solr4x/collection1/select?q=Scripting&wt=xml&hl=true&hl.fl=content&hl.fragsize=100

Here is the documentation on hl.fragsize: http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize
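If you build the request from code rather than typing it into a browser, a minimal Python sketch (standard library only) could assemble the same URL. The function name build_highlight_query is hypothetical, and the base URL is simply the one from the question; adjust host, port and core name for your setup.

```python
from urllib.parse import urlencode

# Base URL copied from the question; change host, port and core as needed.
SOLR_SELECT = "http://localhost:8080/solr4x/collection1/select"


def build_highlight_query(q, hl_field="content", fragsize=100, wt="xml"):
    """Return a select URL asking Solr to highlight hl_field.

    hl.fragsize is the approximate snippet size in characters.
    (build_highlight_query is a hypothetical helper, not a Solr API.)
    """
    params = [
        ("q", q),
        ("wt", wt),
        ("hl", "true"),
        ("hl.fl", hl_field),
        ("hl.fragsize", str(fragsize)),
    ]
    # urlencode keeps the parameter order given in the list.
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_highlight_query("Scripting"))
```

Run as-is, this prints the same URL as the suggested query above.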


You can specify in your request URL which fields you want returned, using the fl (field list) parameter:

http://localhost:8080/solr4x/collection1/select?q=Scripting&wt=xml&hl=true&hl.fl=content&fl=text

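See the Solr documentation for the fl parameter. Note that the text field is declared with stored="false", so it cannot be returned in the results; list the stored fields you actually need instead, for example fl=id if your uniqueKey field is id.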
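To see why fl is enough, note that Solr returns highlighting in a separate section of the response, keyed by each document's uniqueKey value, so the documents themselves only need to carry that key. The Python sketch below assumes the uniqueKey field is called id (true for the default collection1 schema, but check yours) and requests wt=json so the response is easy to parse; build_query and snippets_by_id are hypothetical helper names, not part of any Solr client library.

```python
import json
from urllib.parse import urlencode

# Base URL copied from the question; change host, port and core as needed.
SOLR_SELECT = "http://localhost:8080/solr4x/collection1/select"


def build_query(q, fl="id", hl_field="content"):
    # fl limits the stored fields returned per document; the highlighted
    # snippets come back separately under the "highlighting" key.
    params = [
        ("q", q),
        ("wt", "json"),
        ("fl", fl),
        ("hl", "true"),
        ("hl.fl", hl_field),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


def snippets_by_id(response_text, hl_field="content", key_field="id"):
    """Map each returned document's key to its list of snippets.

    Assumes the uniqueKey field is key_field ("id" by default).
    """
    data = json.loads(response_text)
    highlighting = data.get("highlighting", {})
    result = {}
    for doc in data["response"]["docs"]:
        # Highlighting keys are strings, so normalise the document key.
        doc_id = str(doc[key_field])
        result[doc_id] = highlighting.get(doc_id, {}).get(hl_field, [])
    return result


if __name__ == "__main__":
    # A hand-written sample shaped like a Solr JSON response (not real output).
    sample = (
        '{"response": {"numFound": 1, "start": 0, "docs": [{"id": "doc1"}]},'
        ' "highlighting": {"doc1": {"content": ["... <em>Scripting</em> ..."]}}}'
    )
    print(build_query("Scripting"))
    print(snippets_by_id(sample))
```

A document that matched without producing a snippet simply maps to an empty list.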

Alternatively, you could stop storing the content field (although a field that is neither stored nor indexed is of doubtful use, and highlighting on content needs the stored text):

<field name="content" type="text_general" indexed="false" stored="false" multiValued="true"/>