We are running a Solr-based search over the content of text files. The requirement is to return every hit of the search term in each document, along with the highlighted text around each hit.
We are able to return the number of documents found, along with the highlighted snippet around the first hit of the search term in each document. But it does not return the list of highlights for all the places in the document where the search term is found. The term frequency is reported as the correct number, but we do not get the snippets around all of these occurrences.
Relevant portion of the Solr schema:
<field name="Content" type="text_general" indexed="false" stored="true" required="true"/>
<field name="ContentSearch" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="Content" dest="ContentSearch"/>
For example, suppose a.txt and b.pdf are indexed, and the search term "case" occurs multiple times in both documents (a.txt - 7 hits, b.pdf - 10 hits). When we search for "case" across both documents, two documents are returned with the correct term frequencies (7 and 9), but the highlight list for each document contains only one entry, which corresponds to the first hit in that file.
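To make the request concrete, here is a minimal sketch of the kind of highlighting query described above. The host, the core name `docs`, and the helper `build_highlight_query` are assumptions for illustration only and are not part of our setup as posted. The field names come from the schema above. `hl.snippets` is the standard Solr parameter that caps the number of snippets returned per field and it defaults to 1; the value 20 used here is arbitrary.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, term, max_snippets=20):
    """Build a Solr /select URL that searches ContentSearch and highlights Content.

    base_url and max_snippets are illustrative assumptions, not values
    taken from the original setup.
    """
    params = {
        "q": f"ContentSearch:{term}",  # indexed, non-stored copy of the text
        "hl": "true",                  # turn highlighting on
        "hl.fl": "Content",            # highlight the stored field
        "hl.snippets": max_snippets,   # max snippets per field (Solr default is 1)
        "hl.fragsize": 100,            # approximate snippet size in characters
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core name "docs" on a local Solr instance.
url = build_highlight_query("http://localhost:8983/solr/docs", "case")
print(url)
```

The sketch only builds the URL, so it can be inspected or pasted into a browser without a running Solr instance. If `hl.snippets` is not set on the request or in the request handler defaults, only one snippet per field would be expected.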
Does this have something to do with using the TermVectorComponent (TVC) for the content field? I have read about it, but could not quite work out how the TVC works or in which situations it is helpful.