How to use wildcards and fuzzy search with Solr?

3

votes

I use Solr to search my data, and I have just realized that some features of the Solr query language do not work for me. These are the capabilities I am missing:

fuzzy search
wildcards * ? - I do not have stemming set up yet, so these would be useful for searching in the meantime
field specification - currently I cannot restrict a search to a field, e.g. title:Blabla

As far as I know these things should come by default in Solr, but I obviously don't have them. I use Solr 1.4. Here you can find my schema. Thanks for your help.

lucene solr

5

votes

I googled for "solr fuzzy search" and found your question here. Version 4.0 of Solr supports fuzzy search with an easy query syntax.

For example, you can search for name:peter as a strict match, or append the tilde symbol, name:peter~, for a fuzzy search. To restrict the fuzziness a little, add a value between 0 and 1, as in name:peter~0.7, which means you want to search for peter with a "sharpness" (minimum similarity) of 70%.
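As a rough sketch of how these query strings might be sent to Solr, the snippet below URL-encodes them into select requests. The host, port, and path in SOLR_SELECT_URL and the helper name build_select_url are assumptions for illustration, not taken from the question; the wildcard queries simply follow the * and ? syntax the question mentions.

```python
from urllib.parse import urlencode

# Assumed for illustration only: a local Solr instance and a "name" field.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query, rows=10):
    """Return a select URL with the query string safely URL-encoded."""
    params = {"q": query, "rows": rows}
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Query strings in the syntax described above.
queries = {
    "strict": "name:peter",
    "fuzzy": "name:peter~",
    "fuzzy with threshold": "name:peter~0.7",
    "wildcard, any number of characters": "name:pet*",
    "wildcard, single character": "name:pet?r",
}

for label, query in queries.items():
    print(label, "->", build_select_url(query))
```

Encoding matters here because characters such as : and ? have a meaning of their own inside a URL.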

4

votes

Your fieldType name="text" is missing a lot of filters. For reference, here's the text fieldType from the default schema.xml:

<!-- A text field that uses WordDelimiterFilter to enable splitting and matching of
    words on case-change, alpha numeric boundaries, and non-alphanumeric chars,
    so that a query of "wifi" or "wi fi" could match a document containing "Wi-Fi".
    Synonyms and stopwords are customized by external files, and stemming is enabled.
    -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

For example, the SnowballPorterFilterFactory is the one that enables stemming.

I recommend building your schema based on the default schema.xml, tweaking and modifying as necessary (as opposed to starting from scratch).
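One way to see which filters a fieldType lacks is to compare it against the filter classes in the default definition quoted above. The following is only a sketch: SCHEMA_FRAGMENT is a made-up, abbreviated stand-in (not the asker's actual schema), and the helper names filters_by_analyzer and missing_filters are hypothetical. EXPECTED_FILTERS lists the filters that appear in both analyzers of the default text type.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated fieldType; a real one would be read from schema.xml.
SCHEMA_FRAGMENT = """
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>
"""

# Filters shared by the index and query analyzers of the default "text" type.
EXPECTED_FILTERS = [
    "solr.StopFilterFactory",
    "solr.WordDelimiterFilterFactory",
    "solr.LowerCaseFilterFactory",
    "solr.SnowballPorterFilterFactory",
]


def filters_by_analyzer(field_type_xml):
    """Map each analyzer type to the list of filter classes it declares."""
    root = ET.fromstring(field_type_xml)
    result = {}
    for analyzer in root.findall("analyzer"):
        # An analyzer without a type attribute applies to both index and query.
        kind = analyzer.get("type", "index+query")
        result[kind] = [f.get("class") for f in analyzer.findall("filter")]
    return result


def missing_filters(field_type_xml, expected=EXPECTED_FILTERS):
    """Map each analyzer type to the expected filters it does not declare."""
    present = filters_by_analyzer(field_type_xml)
    return {
        kind: [name for name in expected if name not in classes]
        for kind, classes in present.items()
    }


for kind, missing in missing_filters(SCHEMA_FRAGMENT).items():
    print(kind, "is missing:", ", ".join(missing) or "nothing")
```

A missing filter is not necessarily a mistake, so the output is best read as a checklist of differences to review rather than a list of errors.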

Here's the reference for analyzers, tokenizers and filters.
