6
votes

If I search for a word in a Solr index, I get a count of the documents that contain this word, but if the word occurs several times in one document, that document still counts only once.

I need each returned document to be counted as many times as the searched word occurs in the field.

I read "Word frequency in Solr" and "SOLR term frequency", and I enabled the Term Vector Component, but I still get no term counts.

I configured my field in this way:

<field name="text_text" type="textgen" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />

But if I make the following query:

http://localhost:8888/solr/sources/select?q=text_text%3A%22Peter+Pan%22&fl=text_text&wt=json&indent=true&tv.tf

I don't have any count:

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"text_text",
      "tv.tf":"",
      "indent":"true",
      "q":"text_text:\"Peter Pan\"",
      "wt":"json"}},
  "response":{"numFound":12,"start":0,"docs":[
      {
        "text_text":"Text of the document"},
      {
        "text_text":"Text of the document"},
      {
        "text_text":"Text of the document"},
      {
        "text_text":"Text of the document"},
      {
        "text_text":"Text of the document"},
      {
        "text_text":"Text of the document"},
      {
        "text_text":"Text of the document"},
      {
        "text_text":"Text of the document"}]
  }}

I see a "numFound" value of 12, but "Peter Pan" occurs 20 times in total across those 12 documents.

Could you help me find where I'm going wrong, please?

Thank you very much!

2
The parameter tv.tf is present, but its value is an empty string, which may be evaluated as boolean false. Try adding these parameters to your query: tv=true&tv.tf=true. - EricLavault
@Mat: Did you find an answer? I have the same problem. Could you help me, please? - iNikkz
@iNikkz: sorry, I don't remember where I was using this feature, but I have a vague memory that I did not solve it and I counted the term frequency in another way, not directly from Solr. Sorry. - Mat
@Mat: OK, thanks. I found a solution, try it: (I) total term frequency => http://localhost:8983/solr/collection1/spell?q=theq&wt=json&indent=true&fl=ttf(term,the) and (II) per-document term frequency => http://localhost:8983/solr/collection1/spell?q=gram:%22ago%22&rows=100&fl=gram,termfreq(gram,ago) - iNikkz

2 Answers

0
votes

I think first off your example won't work because "Peter Pan" is not a word or term - it's a phrase. A good discussion of the challenge of finding phrase frequency is here:

termfreq for a phrase

I would re-try your example with a single word not a phrase and see if it works for you.
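As a rough illustration of that single-word retry, here is a minimal Python sketch that builds the request URL with the Term Vector parameters set explicitly to true (as suggested in the comments) instead of leaving tv.tf empty. The helper name build_term_vector_url is hypothetical, and the lowercase term "peter" assumes the field's analyzer lowercases tokens. It also assumes the Term Vector Component is registered on the request handler being queried, which may not be the case for /select in every configuration.

```python
from urllib.parse import urlencode


def build_term_vector_url(base_url, field, term):
    """Build a Solr URL that queries one term and asks for term vector frequencies.

    build_term_vector_url is a hypothetical helper, not part of any Solr client.
    """
    params = {
        "q": f"{field}:{term}",  # single term, not a phrase
        "fl": field,
        "wt": "json",
        "indent": "true",
        "tv": "true",      # switch the Term Vector Component on
        "tv.tf": "true",   # explicit value instead of an empty string
        "tv.fl": field,    # restrict term vectors to this field
    }
    return f"{base_url}?{urlencode(params)}"


# Base URL taken from the question; the handler must include the component.
url = build_term_vector_url(
    "http://localhost:8888/solr/sources/select", "text_text", "peter"
)
print(url)
```

If the response still contains no term vector section, the handler configuration in solrconfig.xml would be the next thing to check.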

0
votes

Try requesting the term frequency as a function in the fl parameter, so that it is returned with each document in the response:

http://localhost:8983/solr/core/select?indent=on&q=solr&fl=field,termfreq("field","term")&wt=json
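To get the overall count the question asks for, the per-document values can be summed on the client side. The following is only a sketch: it assumes the function was requested with an alias, for example fl=tf:termfreq(text_text,'peter'), so that each document carries a "tf" key, and that rows is large enough to return every matching document. The function name total_term_frequency and the sample response are made up for illustration.

```python
def total_term_frequency(response, key="tf"):
    """Sum the per-document term frequency values in a parsed Solr JSON response.

    `key` is the alias given to termfreq(...) in the fl parameter (an assumption).
    Documents without the key count as 0.
    """
    docs = response.get("response", {}).get("docs", [])
    return sum(int(doc.get(key, 0)) for doc in docs)


# Hypothetical parsed response, not real Solr output.
sample = {
    "response": {
        "numFound": 3,
        "start": 0,
        "docs": [{"tf": 2}, {"tf": 1}, {"tf": 4}],
    }
}

print(total_term_frequency(sample))  # 7
```

Note that termfreq counts a single indexed term, so this does not by itself give the frequency of a phrase such as "Peter Pan".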