
Solr reports term occurrence counts across all documents. I am having trouble writing a query that returns the term occurrence for a specific page, identified by the field documentPageId.

I don't know how to issue a proper Solr query that returns a word count for a term, such as "amplifier", within the paragraph of text stored in a field.

Everything I've tried returns a count of only 1 for the term, even though I can see that it appears in the paragraph more than once.

I've tried faceting on the "contents" field:

http://localhost:8983/solr/select?indent=on&q=*:*&wt=standard&facet=on&facet.field=documentPageId&facet.query=amplifier&facet.sort=lex&facet.missing=on&facet.method=count

<lst name="facet_counts">
  <lst name="facet_queries">
    <int name="amplifier">21</int>
  </lst>
  <lst name="facet_fields">
    <lst name="documentPageId">
      <int name="49667.1">1</int>
      <int name="49667.10">1</int>
      <int name="49667.11">1</int>
      <int name="49667.12">1</int>
      <int name="49667.13">1</int>
      <int name="49667.14">1</int>
      <int name="49667.15">1</int>
      <int>0</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>
</response>

In schema.xml:

In solrconfig.xml:

   <str name="facet.field">filewrapper</str>
   <str name="facet.field">caseNumber</str>
   <str name="facet.field">pageNumber</str>
   <str name="facet.field">documentId</str>
   <str name="facet.field">contents</str>
   <str name="facet.query">documentId</str>
   <str name="facet.query">caseNumber</str>
   <str name="facet.query">pageNumber</str>
   <str name="facet.field">documentPageId</str>
   <str name="facet.query">contents</str>

Thanks in advance,


1 Answer


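You need to use the TermVectorComponent to get the term frequency for a given document. Facets won't get you there: a facet count is the number of documents that match, not the number of times a term occurs inside a document.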

Please read the Solr wiki page on TermVectorComponent.
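To see why the facet counts in the question all come back as 1, here is a small Python sketch of the difference between a document count and a per-document term frequency. It does not talk to Solr at all. The page texts are made up for illustration, and the helper names `document_count` and `term_frequency` are hypothetical.

```python
from collections import Counter

# Hypothetical pages keyed by documentPageId; the text is invented for this sketch.
pages = {
    "49667.1": "the amplifier drives the second amplifier stage",
    "49667.2": "no matching term on this page",
    "49667.3": "an amplifier amplifier amplifier chain",
}

def document_count(term, pages):
    """Facet-style count: how many documents contain the term at least once."""
    return sum(1 for text in pages.values() if term in text.split())

def term_frequency(term, pages):
    """Term-vector-style count: occurrences of the term inside each document."""
    return {page_id: Counter(text.split())[term] for page_id, text in pages.items()}

print(document_count("amplifier", pages))
print(term_frequency("amplifier", pages))
```

Each documentPageId identifies exactly one document, so faceting on that field can never report more than 1 per value, no matter how often the term appears on the page.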

The tv.tf option returns the term frequency for a given field on a per-document basis. Make sure the field you are interested in has term vectors enabled (termVectors="true"):

<field name="pageField" type="text" indexed="true" stored="true" termVectors="true" />
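As a rough sketch of what the request could look like, the Python snippet below builds a term vector query for a single page. It assumes a request handler with the TermVectorComponent is registered at `/tvrh` (as in Solr's example solrconfig.xml; your handler name may differ), that the "contents" field from the question has term vectors enabled, and that the index has been rebuilt since enabling them. The function name `term_vector_url` is hypothetical.

```python
from urllib.parse import urlencode

def term_vector_url(base_url, page_id, field="contents", handler="/tvrh"):
    """Build a term vector request for one page.

    Assumptions: `handler` is a request handler that includes the
    TermVectorComponent, and `field` was indexed with termVectors="true".
    """
    params = {
        "q": f'documentPageId:"{page_id}"',  # restrict the query to one page
        "fl": "documentPageId",              # stored fields to return
        "tv": "true",                        # turn the component on
        "tv.tf": "true",                     # ask for per-document term frequency
        "tv.fl": field,                      # field whose term vectors are returned
    }
    return f"{base_url}{handler}?{urlencode(params)}"

url = term_vector_url("http://localhost:8983/solr", "49667.1")
print(url)
```

The response should then list each term of the "contents" field for that page along with its tf value, and the entry for "amplifier" is the count the question asks for.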

Note: Enabling term vectors increases both the index size and the time required to index, so benchmark before and after enabling them.