Solr query in a pdf file, is not returning highlighting content

Question

I set up Solr 6.5.1 on my Debian server today, but I am having trouble getting the PDF text content back. Searching works: the document is returned when I query for, say, my name, "juan". However, the highlighting section does not contain a str snippet for each match, as it is supposed to.

This is the example query:

http://localhost:8983/solr/ex/select?q=juan&fl=title&wt=xml&hl=true&hl.snippets=20&hl.fl=content&hl.usePhraseHighlighter=true

And this is the result:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
            <str name="hl.snippets">20</str>
            <str name="q">juan</str>
            <str name="hl">true</str>
            <str name="fl">title</str>
            <str name="hl.usePhraseHighlighter">true</str>
            <str name="hl.fl">content</str>
            <str name="wt">xml</str>
        </lst>
    </lst>
    <result name="response" numFound="1" start="0">
        <doc>
            <arr name="title">
                <str>CV_Juan_Jara_ultimo</str>
            </arr>
        </doc>
    </result>
    <lst name="highlighting">
        <lst name="/solr-6.5.1/mydocs/CV_Juan_Jara_ultimo.pdf"/>
    </lst>
</response>

Additionally, the log is showing all the pdf text, so I assume it was correctly indexed (I indexed the pdf using the command: bin/post -c ex mydocs/CV_Juan_Jara_ultimo.pdf).

I added the "content" field to the schema, using curl:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field" : {
     "name":"text",
     "type":"text_general",
     "indexed":"true",
     "stored":"false",
     "multiValued":"true"
     }
}' localhost:8983/solr/ex/schema

Do you know what could be wrong?

All that I want to do is search a topic in my pdf and then get all results highlighted like this:

http://www.codewrecks.com/blog/index.php/2013/05/27/hilight-matched-text-inside-documents-indexed-with-solr-plus-tika/

Alessandro Benedetti · Accepted Answer · 2017-05-11T16:05:21

It is a very common and simple mistake:

"stored":"false" should be "stored":"true" for the 'content' field.

Currently, all the highlighters require the field to be stored in order to be used [1].

[1] https://cwiki.apache.org/confluence/display/solr/Highlighting
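
As a rough sketch of the fix, the field definition can be changed through the Schema API and the PDF posted again. This assumes the core is "ex" on localhost:8983 as in the question, that a field named "content" already exists in the schema, and that the extracted PDF text really is mapped to that field. The SOLR_URL and PAYLOAD variables and the two function names are only illustrative.

```shell
# Assumption: core "ex" on localhost:8983, as in the question.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/ex}"

# replace-field redefines an existing field, so the full definition is given.
# The only change from the question's definition is "stored": true
# (and the field name "content", which is what hl.fl asks for).
PAYLOAD='{
  "replace-field": {
    "name": "content",
    "type": "text_general",
    "indexed": true,
    "stored": true,
    "multiValued": true
  }
}'

# Send the new field definition to the Schema API.
update_schema() {
  curl -X POST -H 'Content-type:application/json' \
       --data-binary "$PAYLOAD" "$SOLR_URL/schema"
}

# Post the PDF again: stored values are written at index time,
# so documents indexed before the change have nothing to highlight.
reindex_pdf() {
  bin/post -c ex mydocs/CV_Juan_Jara_ultimo.pdf
}
```

Running `update_schema && reindex_pdf` from the Solr install directory and then repeating the query from the question should show whether the highlighting list is now populated. If "content" does not exist yet, "add-field" would be the command to use instead of "replace-field".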
