Using dismax to search for multiword indexed terms

Question

My Solr schema is the following (only the important parts):

<fieldType name="bagofwords_expertfinding" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- strip HTML markup before tokenizing -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9-/_,\.]+$" replacement="" replace="all"/>
    <!-- blank out tokens containing a letter repeated more than two times in a row -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^.*(([a-zA-Z])\2)\2+.*$" replacement=""/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="3" max="100"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9-/_,\.]+$" replacement="" replace="all"/> 
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="3" max="100"/>
  </analyzer>
</fieldType>
<fieldType name="namedentities_expertfinding" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- strip whitespace around commas, then split on commas so each entity becomes one token -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s," replacement=","/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern=",\s" replacement=","/>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="," />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9-/_,\.]+$" replacement="" replace="all"/> 
    <filter class="solr.LengthFilterFactory" min="3" max="100"/>
  </analyzer>
</fieldType>

In namedentities I have indexed multiword terms such as "diego alberto milito" and "diego armando maradona". I'm trying to search both fields with a dismax query, boosting them differently.

But when I try this query: localhost:8080/solr/select/?q="diego armando maradona"&defType=dismax&qf=namedentities^100 bagofwords^1&fl=*,score&debugQuery=true&mm=0

Solr finds nothing. Maybe I don't understand the correct use of the " symbol.

There is also something I don't understand in this passage from the Solr wiki:

"In Solr 1.4 and prior, you should basically set mm=0 if you want the equivalent of q.op=OR, and mm=100% if you want the equivalent of q.op=AND. In 3.x and trunk the default value of mm is dictated by the q.op param (q.op=AND => mm=100%; q.op=OR => mm=0%). Keep in mind the default operator is affected by your schema.xml entry. In older versions of Solr the default value is 100% (all clauses must match)"

Given that the defaultOperator in my schema is OR, why do I get a default mm value of 100% unless I set mm=0?

Thanks in advance!

The output of the debug version of the parsed query would also be useful. I suspect that since you tokenize the field, your exact search won't match - as neither of the entries are the string you're searching for when you enclose it in quotes. — MatsLindh
Thanks. I finally discovered that the quotes don't mean an exact match but a phrase search, that is, terms at consecutive positions, so I changed my schema analyzer. There doesn't seem to be a way to deal with multiword tokens, though, so I'm indexing single words and searching for phrases. — Tywnil
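
One possible alternative to indexing single words is to leave the index analyzer of namedentities_expertfinding as it is and change only its query analyzer so that a quoted phrase reaches the index as a single term. The fragment below is a sketch under that assumption, not the configuration the asker ended up with; it relies on the query parser handing the whole quoted string to the field's analyzer, and it only helps for queries that are wrapped in quotes.

```xml
<!-- Hypothetical replacement for the query analyzer of namedentities_expertfinding -->
<analyzer type="query">
  <!-- emit the whole input as one token instead of splitting on whitespace -->
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <!-- match the lowercasing applied at index time -->
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

With this analyzer, a quoted query such as "diego armando maradona" would be looked up as the single term diego armando maradona, which is the form the comma-based index tokenizer stores.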

Simon Simon · Accepted Answer · 2014-03-24T22:54:17

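The quotes around the query string force a phrase query, so a document matches only if the analyzed query terms occur at consecutive positions in the field. Remove the quotes (replacing them with parentheses if you need grouping) and experiment with the pf parameter, and with pf2 and pf3 (available in the edismax parser), to boost documents in which longer phrases match.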
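
As a rough illustration of the phrase-boost parameters, the defaults could be set in solrconfig.xml along the following lines. This is a sketch only: the handler name and all boost values are assumptions chosen for the example, and the phrase boosts target bagofwords because that field is indexed word by word.

```xml
<!-- Sketch only: handler name and boost values are illustrative assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- pf2 and pf3 require the extended dismax parser -->
    <str name="defType">edismax</str>
    <str name="qf">namedentities^100 bagofwords^1</str>
    <!-- boost documents where all query terms appear as one phrase -->
    <str name="pf">bagofwords^10</str>
    <!-- boost documents matching word pairs and word triples from the query -->
    <str name="pf2">bagofwords^5</str>
    <str name="pf3">bagofwords^7</str>
    <!-- set mm explicitly instead of relying on the default -->
    <str name="mm">0%</str>
  </lst>
</requestHandler>
```

The query itself would then be sent without quotes, for example q=diego armando maradona, so that the individual terms are matched through qf and the phrase parameters only affect scoring.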
